Meta’s Llama 4 models – Llama 4 Scout and Llama 4 Maverick – are here! These models enable people to build more personalized multimodal experiences. They offer significant improvements in image and text understanding and instruction following, and accommodate a wide range of use cases and developer needs. Whether you’re building apps for reasoning, summarization, or conversational AI, Llama 4 Scout and Maverick deliver powerful performance with open access. In a previous blog, we showed that you can use a Bring Your Own Container approach to deploy and fine-tune Llama 4 models in Oracle Cloud Infrastructure (OCI) Data Science. Now that OCI Data Science AI Quick Actions supports the newly released vLLM 0.8.3, users can deploy and fine-tune Llama 4 models in a no-code environment. This broadens the range of personas able to leverage these models and simplifies the process of working with OCI’s infrastructure and tools.

What are Llama 4’s Improvements?

Meta’s Llama 4 family includes:

Llama 4 Scout: A powerful multimodal model with 17B active parameters, 16 experts, and 109B total parameters. It supports a context window of up to 10M tokens and can fit on a single H100 GPU (with Int4 quantization).

Llama 4 Maverick: A 17B active parameter model with 128 experts and 400B total parameters, delivering a strong performance-to-cost ratio for reasoning and coding while remaining open-weight and customizable. It can fit on a single H100 host.
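The sizing claims above are easy to sanity-check with back-of-envelope arithmetic. The sketch below estimates weight memory only (KV cache and activation overhead are ignored), so it is a rough lower bound rather than a deployment guide:

```python
# Back-of-envelope weight-memory estimate at different precisions.
# Assumption: model weights only; real deployments also need room for
# the KV cache, activations, and framework overhead.

def weight_memory_gb(total_params_b: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB for a model with
    `total_params_b` billion parameters at `bits_per_param` precision."""
    bytes_total = total_params_b * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

# Llama 4 Scout: 109B total parameters.
print(f"Scout @ BF16: {weight_memory_gb(109, 16):.1f} GB")    # ~218 GB
print(f"Scout @ Int4: {weight_memory_gb(109, 4):.1f} GB")     # ~54.5 GB, under an H100's 80 GB

# Llama 4 Maverick: 400B total parameters -- larger than one 80 GB GPU
# even at Int4, hence the single H100 host (multi-GPU) sizing.
print(f"Maverick @ Int4: {weight_memory_gb(400, 4):.1f} GB")  # ~200 GB
```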

The new Llama 4 models use a mixture of experts (MoE) architecture. In MoE models, a single token activates only a fraction of the total parameters. MoE architectures are more compute efficient for model training and inference and, given a fixed training FLOPs budget, deliver higher quality models compared to dense architectures. Llama 4 models are designed with native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone.
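The routing idea behind MoE can be sketched in a few lines. This is a toy illustration, not Llama 4's actual implementation: the expert and router functions below are made-up stand-ins, while real MoE layers use learned gating inside each transformer block. The point it demonstrates is that only the top-k experts run per token, so active parameters stay far below total parameters:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, experts, router, k=2):
    """Toy mixture-of-experts layer: the router scores every expert,
    only the top-k experts actually run, and their outputs are mixed
    by renormalized router probabilities. All other experts stay idle."""
    probs = softmax([g(x) for g in router])
    top_k = sorted(range(len(experts)), key=probs.__getitem__, reverse=True)[:k]
    norm = sum(probs[i] for i in top_k)
    return sum((probs[i] / norm) * experts[i](x) for i in top_k)

# 16 experts (like Scout), but each token activates only k of them.
experts = [lambda x, w=i: w * x for i in range(16)]
router = [lambda x, b=i: b * 0.1 for i in range(16)]  # fixed scores for the demo
y = moe_layer(1.0, experts, router, k=2)  # only experts 15 and 14 run
```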

Download Llama 4 Scout and Llama 4 Maverick today from Meta’s website, llama.com, or from Hugging Face, an online model repository. Oracle Cloud Infrastructure (OCI) Data Science is a platform for data scientists and developers to work with open source models, powered by OCI’s compute infrastructure and with features that support the entire machine learning lifecycle. You can bring Llama 4 models in from Hugging Face or Meta and use them inside OCI Data Science AI Quick Actions, a no-code solution for customers to seamlessly manage, deploy, fine-tune, and evaluate large language models inside our platform.

What’s New in vLLM 0.8.3?

vLLM is an open source library for inference and serving of large language models. vLLM 0.8.3 and later versions support both the Llama 4 Scout and Maverick models.

Llama 4 + AI Quick Actions from OCI Data Science 

With AI Quick Actions, OCI Data Science offers a no-code solution that makes it easier to work with LLMs without setting up a serving container. You can:

  • Register models from Hugging Face or OCI Object Storage
  • Deploy models with our service-managed containers using vLLM 0.8.3 or later
  • Fine-tune and evaluate models using an intuitive UI

How to Bring in Llama 4:

  • Access the model: Accept Meta’s license agreement on Hugging Face, then generate an access token and use it to log in to the Hugging Face CLI from the terminal of the notebook session you use with AI Quick Actions.
    • Command to log in:
huggingface-cli login --token <your-hugging-face-token>
  • Register your model:
    • Use the “Register new model” option in AI Quick Actions as shown in Fig 1
    • Choose to pull from Hugging Face or bring in your model artifacts via Object Storage.  Fig 2 shows the option to pull model from Hugging Face.
  • Deploy: Use a service-managed container with vLLM 0.8.3 or later to deploy Llama 4 and start generating responses.
  • Evaluate models with no additional setup
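Once a deployment is active, you can call it over HTTPS. As a sketch, the helper below builds an OpenAI-compatible chat request body, the format vLLM-served deployments typically accept. The model identifier here is a placeholder; copy the real endpoint URL and model name from your deployment’s details page in the console:

```python
import json

def build_chat_payload(prompt: str, max_tokens: int = 256) -> str:
    """Build an OpenAI-compatible chat-completions request body
    (illustrative only -- field values are placeholders)."""
    body = {
        "model": "odsc-llm",  # placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }
    return json.dumps(body)

payload = build_chat_payload("Summarize what a mixture-of-experts model is.")
```

Send the payload as a signed HTTP POST to your deployment endpoint using the OCI Python SDK’s request signer, or any HTTP client that supports OCI request signing.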

GPU Requirement: One H100 is required to run Llama 4 models on AI Quick Actions. Working with an H100 in OCI Data Science requires a reservation for the shape. You can request one by submitting a service request that specifies the shape and the region where you plan to use it. For more information on working with GPUs in OCI Data Science, see Requesting a GPU.

Data Science AI Quick Actions model explorer
Fig 1: Option to register a new model in AI Quick Actions

Model registration panel in AI Quick Actions
Fig 2: Register a model from Hugging Face

Coming Soon: More Llama 4 Variants on AI Quick Actions. Meta has hinted at additional Llama 4 variants and multimodal extensions. These models will be added to AI Quick Actions soon, further expanding the suite of open-access models you can run on OCI. Stay tuned for even more powerful options to build and scale GenAI solutions directly in the OCI Data Science platform. 

Why Choose OCI Data Science AI Quick Actions for Llama 4?

OCI Data Science enables you to stay up to date with AI developments effortlessly. Through our partnership with Meta, the availability of Llama 4 models in OCI Data Science represents a step forward for anyone looking to build, deploy, and refine AI solutions, and with AI Quick Actions, fine-tuning and deploying those models are even easier.

  • Managed infrastructure: Focus on the model, not on setup.
  • Enterprise-grade GPUs: Run large models like Llama 4 Scout and Maverick smoothly.
  • Minimal setup: Leverage Data Science’s service-managed containers for deployment and fine-tuning.
  • Full lifecycle support: From development to deployment and monitoring.

Start Building with Llama 4 on OCI Today! The combination of Meta’s Llama 4 multimodal models and OCI’s AI infrastructure makes it easier than ever to go from prototype to production.
