We are excited to announce the release of the Llama 4 models, Scout and Maverick, on the Oracle Cloud Infrastructure (OCI) Generative AI service. These models employ a Mixture of Experts (MoE) architecture, enabling efficient and powerful processing capabilities. The models are optimized for multimodal understanding, multilingual tasks, coding, tool-calling, and powering agentic systems.
Regions supported at GA:
- On-demand: ORD
- Dedicated AI Clusters: ORD, GRU, LHR, KIK
Key Features of the Llama 4 Series
- Multimodal capabilities: Both models are natively multimodal, capable of processing and integrating various data types, including text and images.
- Multilingual support: Trained on data encompassing 200 languages, with fine-tuning support for 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Image understanding is English-only.
- Efficient Development: Llama 4 Scout is designed for accessibility, fitting on a smaller GPU footprint.
- Open-weight models: Both models are released under the custom Llama 4 Community License Agreement, allowing developers to fine-tune and deploy them while adhering to specific licensing terms.
- Knowledge Cutoff: August 2024
Note: The Llama 4 Acceptable Use Policy restricts use of the models in the European Union (EU).
Llama 4 Scout
- Architecture: Features 17 billion active parameters within a total of approximately 109 billion parameters, utilizing 16 experts.
- Context Window: Supports a context length of up to 10 million tokens (requires multiple GPUs). At GA, the service will support a context length of 192k tokens.
- Deployment: Designed to operate efficiently on a small GPU footprint.
- Performance: Outperforms models like Google’s Gemma 3 and Mistral 3.1 across multiple benchmarks.
Llama 4 Maverick
- Architecture: Features 17 billion active parameters within a larger framework totaling around 400 billion parameters, utilizing 128 experts.
- Context Window: Supports a context length of up to 1 million tokens. At GA, the service will support a context length of 512k tokens.
- Deployment: Requires a larger footprint than Scout: approximately 2x Scout's footprint for the OCI deployment at GA.
- Performance: Demonstrates performance comparable to OpenAI’s GPT-4o and DeepSeek-V3 in coding and reasoning tasks.
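The active-versus-total parameter split described above follows from MoE routing: each token is processed by the shared layers plus only a subset of the experts. The toy sketch below illustrates the arithmetic; the parameter sizes are illustrative values chosen to roughly reproduce Scout's 17B-active / 109B-total split, not Llama 4's actual layer dimensions.

```python
# Toy illustration of MoE active vs. total parameter counts.
# "shared" and "per_expert" sizes below are made-up numbers picked to
# roughly match Scout's published 17B-active / 109B-total split.

def moe_param_counts(shared, per_expert, n_experts, k):
    """Return (total, active) parameter counts for a toy MoE layer stack."""
    total = shared + n_experts * per_expert    # all experts exist in memory
    active = shared + k * per_expert           # only k routed experts run per token
    return total, active

total, active = moe_param_counts(shared=11e9, per_expert=6.125e9, n_experts=16, k=1)
print(f"total ≈ {total / 1e9:.0f}B, active ≈ {active / 1e9:.0f}B")
# → total ≈ 109B, active ≈ 17B
```

This is why Maverick's 400B total parameters still yield only 17B active parameters per token: the router selects a small, fixed number of its 128 experts for each token.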
These advancements position the Llama 4 series as a significant step forward in AI model development, offering enhanced performance, versatility, and accessibility for a wide range of applications.
OCI customers can use these models without managing the underlying infrastructure. Access is available through chat interfaces, APIs, or dedicated endpoints.
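As a sketch of programmatic access, a chat request body might be assembled as below. The field names and the placeholder model ID are assumptions based on the service's generic chat format, not the authoritative schema; consult the Generative AI API reference for the exact request shape.

```python
import json

# Sketch of a chat request body for the OCI Generative AI inference API.
# Field names here are assumptions modeled on the service's generic chat
# format; the compartment OCID and model ID are placeholders.

def build_chat_request(compartment_id, model_id, prompt, max_tokens=512):
    return {
        "compartmentId": compartment_id,
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "chatRequest": {
            "apiFormat": "GENERIC",
            "messages": [
                {"role": "USER", "content": [{"type": "TEXT", "text": prompt}]}
            ],
            "maxTokens": max_tokens,
        },
    }

body = build_chat_request(
    "ocid1.compartment.oc1..example",       # placeholder compartment OCID
    "meta.llama-4-scout-example",           # placeholder model ID
    "Summarize MoE routing in two sentences.",
)
print(json.dumps(body, indent=2))
```

In practice you would send this body (or its SDK equivalent) to the service's chat endpoint with signed OCI credentials rather than printing it.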
For integration details and pricing information, please consult our Generative AI service documentation or contact your Oracle representative.
For more information, please refer to the following resources.
- Generative AI service documentation
- OCI Generative AI service
- OCI Generative AI Agents
- OCI AI Service
- What is Generative AI?
- Why generative AI with Oracle?
- What is retrieval-augmented generation (RAG)?