We are excited to announce the release of the Llama 4 models, Scout and Maverick, on the Oracle Cloud Infrastructure (OCI) Generative AI service. These models employ a Mixture of Experts (MoE) architecture, enabling efficient and powerful processing capabilities. The models are optimized for multimodal understanding, multilingual tasks, coding, tool-calling, and powering agentic systems.
Regions supported at GA:
- On-demand: ORD
- Dedicated AI Clusters: ORD, GRU, LHR, KIK
Key Features of the Llama 4 Series
- Multimodal capabilities: Both models are natively multimodal, capable of processing and integrating various data types, including text and images.
- Multilingual support: Trained on data encompassing 200 languages, with fine-tuning support for 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Image understanding is English-only.
- Efficient Development: Llama 4 Scout is designed for accessibility, fitting on a smaller GPU footprint.
- Open-weight models: Both models are released under the custom Llama 4 Community License Agreement, allowing developers to fine-tune and deploy them while adhering to specific licensing terms.
- Knowledge Cutoff: August 2024
Note: The Llama 4 Acceptable Use Policy restricts use of the models in the European Union (EU).
Llama 4 Scout
- Architecture: Features 17 billion active parameters within a total of approximately 109 billion parameters, utilizing 16 experts.
- Context Window: Supports a context length of up to 10 million tokens (requires multiple GPUs). At GA, the service will support a context length of 192k tokens.
- Deployment: Designed to operate efficiently on a small GPU footprint.
- Performance: Outperforms models like Google’s Gemma 3 and Mistral 3.1 across multiple benchmarks.
Llama 4 Maverick
- Architecture: Features 17 billion active parameters within a larger framework totaling around 400 billion parameters, utilizing 128 experts.
- Context Window: Supports a context length of up to 1 million tokens. At GA, the service will support a context length of 512k tokens.
- Deployment: Requires a larger footprint than Scout: approximately 2x Scout's footprint for the OCI deployment at GA.
- Performance: Demonstrates performance comparable to OpenAI’s GPT-4o and DeepSeek-V3 in coding and reasoning tasks.
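The active-versus-total parameter split described above follows from MoE routing: each token is processed by the shared layers plus only a subset of the experts. The toy sketch below illustrates the arithmetic; the parameter sizes are illustrative values chosen to roughly reproduce Scout's 17B-active / 109B-total split, not Llama 4's actual layer dimensions.

```python
# Toy illustration of MoE active vs. total parameter counts.
# "shared" and "per_expert" sizes below are made-up numbers picked to
# roughly match Scout's published 17B-active / 109B-total split.

def moe_param_counts(shared, per_expert, n_experts, k):
    """Return (total, active) parameter counts for a toy MoE layer stack."""
    total = shared + n_experts * per_expert    # all experts exist in memory
    active = shared + k * per_expert           # only k routed experts run per token
    return total, active

total, active = moe_param_counts(shared=11e9, per_expert=6.125e9, n_experts=16, k=1)
print(f"total ≈ {total / 1e9:.0f}B, active ≈ {active / 1e9:.0f}B")
# → total ≈ 109B, active ≈ 17B
```

This is why Maverick's 400B total parameters still yield only 17B active parameters per token: the router selects a small, fixed number of its 128 experts for each token.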
These advancements position the Llama 4 series as a significant step forward in AI model development, offering enhanced performance, versatility, and accessibility for a wide range of applications.
OCI customers can use these models without managing the underlying infrastructure. Access is available through chat interfaces, APIs, or dedicated endpoints.
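As a sketch of programmatic access, a chat request body might be assembled as below. The field names and the placeholder model ID are assumptions based on the service's generic chat format, not the authoritative schema; consult the Generative AI API reference for the exact request shape.

```python
import json

# Sketch of a chat request body for the OCI Generative AI inference API.
# Field names here are assumptions modeled on the service's generic chat
# format; the compartment OCID and model ID are placeholders.

def build_chat_request(compartment_id, model_id, prompt, max_tokens=512):
    return {
        "compartmentId": compartment_id,
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "chatRequest": {
            "apiFormat": "GENERIC",
            "messages": [
                {"role": "USER", "content": [{"type": "TEXT", "text": prompt}]}
            ],
            "maxTokens": max_tokens,
        },
    }

body = build_chat_request(
    "ocid1.compartment.oc1..example",       # placeholder compartment OCID
    "meta.llama-4-scout-example",           # placeholder model ID
    "Summarize MoE routing in two sentences.",
)
print(json.dumps(body, indent=2))
```

In practice you would send this body (or its SDK equivalent) to the service's chat endpoint with signed OCI credentials rather than printing it.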
For integration details and pricing information, please consult our Generative AI service documentation or contact your Oracle representative.
For more information, please refer to the following resources.
- Generative AI service documentation
- OCI Generative AI service
- OCI Generative AI Agents
- OCI AI Service
- What is Generative AI?
- Why generative AI with Oracle?
- What is retrieval-augmented generation (RAG)?