Generative AI (Gen AI) and large language models (LLMs) are revolutionizing personal and professional lives. From digital assistants that manage email to chatbots that can communicate with enterprise data across industries, languages, and specialties, these technologies are driving a new era of convenience, productivity, and connectivity. LLMs can generate human-like text and are approaching human-level accuracy for tasks such as translation, summarization, content generation, question answering, and many other real-world applications.
Oracle Cloud Infrastructure (OCI) provides comprehensive solutions, including high-performance compute infrastructure, databases, data science, and Gen AI services, to build, train, and deploy LLMs at scale.
In this post, we describe how you can deploy an LLM on Oracle Cloud Infrastructure using OCI AI and Data Science services.
You can download a Hugging Face model and save it to disk using a serialization library such as joblib, pickle, cloudpickle, or ONNX. You also need to create a score.py file that hosts the load_model() and predict() functions. The load_model() function reads the model saved on disk and returns an estimator object. The predict() function takes a payload in the form of a JSON object, along with the model returned by load_model(), converts the JSON payload into the model's input format, and returns the prediction. You can then call score.predict() to test the model locally.
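A minimal score.py following this contract might look like the sketch below; the artifact file name and payload shape are illustrative assumptions, not service requirements.

# score.py: a minimal sketch of the load_model()/predict() scoring contract.
import os
import joblib

MODEL_FILE_NAME = "model.joblib"  # assumed artifact file name

def load_model(model_file_name=MODEL_FILE_NAME):
    """Read the serialized model from disk and return an estimator object."""
    model_dir = os.path.dirname(os.path.realpath(__file__))
    return joblib.load(os.path.join(model_dir, model_file_name))

def predict(data, model=load_model()):
    """Convert a JSON payload into the model's input format and predict.

    data is expected to be a JSON object such as {"inputs": [...]}.
    """
    inputs = data.get("inputs", data)  # tolerate a bare list payload
    return {"prediction": list(model.predict(inputs))}

In a notebook session, you can import this file and call score.predict({"inputs": [...]}) to verify the round trip before deploying.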
OCI Data Science supports NVIDIA Triton Inference Server as a special container: the service maps its mandated inference and health HTTP/REST endpoints to Triton's own endpoints, so you don't have to modify the Triton container. To enable this behavior, set the required environment variables when creating the model deployment. After the model is built, you can download the NVIDIA Triton Inference Server image and store it in Oracle Container Registry. The Data Science notebook provides a simple API or software development kit (SDK) to compress model artifacts using Python, and Oracle provides sample Python code for registering model artifacts and storing Triton Inference Server images for deploying LLMs.
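As an illustration, creating a Triton-backed model deployment from a notebook with the oracle-ads SDK might look like the following sketch. The shape, ports, image path, and OCIDs are placeholders, and the CONTAINER_TYPE environment variable reflects the service's documented Triton convention; verify it against the current OCI documentation.

import ads
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
)

ads.set_auth("resource_principal")  # typical auth inside a notebook session

infrastructure = (
    ModelDeploymentInfrastructure()
    .with_compartment_id("<compartment_ocid>")
    .with_project_id("<project_ocid>")
    .with_shape_name("VM.GPU.A10.1")  # assumed GPU shape
    .with_replica(1)
)

runtime = (
    ModelDeploymentContainerRuntime()
    .with_image("<region>.ocir.io/<tenancy>/triton:<tag>")  # Triton image in Container Registry
    .with_server_port(8000)       # Triton's default HTTP port
    .with_health_check_port(8000)
    .with_env({"CONTAINER_TYPE": "TRITON"})  # tells the service to apply its Triton endpoint mapping
    .with_deployed_model_id("<model_ocid>")  # model artifact registered in the model catalog
)

deployment = (
    ModelDeployment()
    .with_display_name("triton-llm-deployment")
    .with_infrastructure(infrastructure)
    .with_runtime(runtime)
).deploy()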
You can package the model artifacts into a .zip archive and register the model, while the Triton container image is stored in Container Registry, an Always Free service included with your OCI subscription; you pay only for the resources you consume. You can build the NVIDIA Triton Inference Server image with the docker build command, push it to Container Registry, and then create a model deployment with an HTTP endpoint that references the registered Triton container image.
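A sketch of the packaging-and-registration step with the OCI Python SDK follows; the OCIDs, display name, and local directory layout are assumptions for illustration.

import shutil
import oci

# Zip the local Triton model repository
# (layout: model_repository/<model_name>/<version>/... plus config.pbtxt).
archive_path = shutil.make_archive("model_artifact", "zip", root_dir="model_repository")

config = oci.config.from_file()  # uses ~/.oci/config
ds_client = oci.data_science.DataScienceClient(config)

# Create a model record in the model catalog.
model = ds_client.create_model(
    oci.data_science.models.CreateModelDetails(
        compartment_id="<compartment_ocid>",
        project_id="<project_ocid>",
        display_name="triton-llm-model",
    )
).data

# Upload the zipped model repository as the model artifact.
with open(archive_path, "rb") as artifact:
    ds_client.create_model_artifact(
        model_id=model.id,
        model_artifact=artifact,
        content_disposition="attachment; filename=model_artifact.zip",
    )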
OCI Data Science model deployment supports zero-downtime updates of individual models as long as the version structure is unchanged. For Triton-based model deployments in particular, when you run the update_zdt command, the version structure of the underlying model repository must remain the same; otherwise, the update can result in downtime.
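For illustration, a model swap through the generic update API in the OCI Python SDK might look like the following sketch (shown here in place of any dedicated update_zdt helper); the OCIDs are placeholders, and the new model artifact is assumed to preserve the Triton repository's version structure.

import oci

config = oci.config.from_file()  # or resource principal inside OCI
ds_client = oci.data_science.DataScienceClient(config)

# Point the active deployment at a new model catalog entry.
update_details = oci.data_science.models.UpdateModelDeploymentDetails(
    model_deployment_configuration_details=(
        oci.data_science.models.UpdateSingleModelDeploymentConfigurationDetails(
            model_configuration_details=(
                oci.data_science.models.UpdateModelConfigurationDetails(
                    model_id="<new_model_ocid>",  # new artifact; keep the Triton version layout identical
                )
            )
        )
    ),
)

ds_client.update_model_deployment(
    model_deployment_id="<model_deployment_ocid>",
    update_model_deployment_details=update_details,
)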
OCI provides a convenient and flexible way to deploy and scale LLMs. Oracle offers various ways to apply AI to your business applications and empower innovation through Oracle's Software-as-a-Service (SaaS) solutions and its data and AI platform. We offer cost-effective, high-performance compute, storage, and network infrastructure for building, testing, deploying, and running AI applications. To try it yourself, see the AI sample code in the Oracle Data Science Repository.
If you're new to Oracle Cloud Infrastructure, you can try this solution for free with Oracle Cloud Free Tier, which provides US$300 in free trial credits for a 30-day period. Free Tier also includes several Always Free services that remain available for an unlimited time, even after your free credits expire.
For more information, see the following resources:
https://www.oracle.com/artificial-intelligence/
https://www.oracle.com/artificial-intelligence/data-science/
https://www.oracle.com/artificial-intelligence/host-llms-with-nvidia-gpus-in-oci/
I am a lead Cloud Native Solution Architect in Cloud Engineering, specializing in modern data platforms, analytics, and AI/ML. I design best-in-class solution architectures for OCI cloud native services, including Big Data Service, OKE, Data Science, AI services, and Gen AI. My focus areas include OCI AI services such as Computer Vision, Anomaly Detection (based on MSET2), and Natural Language Understanding (NLU), including LLMs. I also lead public cloud migration programs for AWS, Azure, and GCP takeouts.