- Engineered for the edge: Small enough to run on edge devices like the NVIDIA Jetson Nano and mobile phones, the model uses Gemma’s 256k-token vocabulary to efficiently tokenize JSON and multilingual inputs.
- Broad ecosystem support: The model is supported by popular tools across the entire workflow: fine-tune it with Hugging Face Transformers, Unsloth, Keras, or NVIDIA NeMo, and deploy it with LiteRT-LM, vLLM, MLX, Llama.cpp, Ollama, Vertex AI, or LM Studio.