Coco Liu

Principal Product Manager, AI Incubation/Strategic Initiatives

Recent Blogs

Serving smaller Llama LLM models in a cost-efficient way with Ampere ...

In this blog, we recap the main takeaways from the panel session and showcase a retrieval-augmented generation (RAG) inference use case leveraging the Oracle Database 23ai vector database with a Llama 3 8B model running on an Ampere A1 instance on Oracle Cloud Infrastructure (OCI).
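A minimal, self-contained sketch of the RAG flow described above, assuming a toy in-memory vector store: the embedding, retrieval, and generate() steps stand in for Oracle Database 23ai vector search and a Llama 3 8B model served on an Ampere A1 instance, and all names here are illustrative, not OCI or Oracle APIs.

```python
# Illustrative RAG flow: embed the query, retrieve similar documents,
# augment the prompt with them, then generate. Everything below is a
# stand-in for the real Oracle 23ai + Llama 3 stack from the blog.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real deployment would use a
    # sentence-embedding model and store vectors in Oracle Database 23ai.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Ampere A1 instances offer Arm-based CPU inference on OCI.",
    "Oracle Database 23ai supports vector search for RAG pipelines.",
    "Llama 3 8B can serve chat completions on CPU-only hardware.",
]
index = [(doc, embed(doc)) for doc in documents]  # in-memory "vector store"

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(prompt: str) -> str:
    # Placeholder for a call to a Llama 3 8B inference endpoint.
    return f"[LLM response to prompt]\n{prompt}"

question = "How can I run Llama 3 on CPUs in OCI?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(generate(prompt))
```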

From inference to RAG: Choosing CPUs for efficient generative AI ...

Ampere Arm64 CPUs are very cost-efficient for serving LLM-based generative AI agents that augment responses with your business data. This blog details the performance achievements and end-to-end examples for deploying container applications on OCI, along with firsthand customer experience working with this deployment stack.

Introducing Meta Llama 3 on OCI Ampere A1: A testament to CPU-Based ...

Meta Llama 3, the latest advancement in open-source large language models (LLMs), is now available for inference workloads on Ampere Altra Arm-based CPUs on Oracle Cloud Infrastructure (OCI). Released by Meta on April 18, Llama 3 models have been hailed as “the most capable openly available LLM to date,” offering unprecedented performance and flexibility for language processing tasks.

Democratizing Generative AI with CPU-based Inference

The generative AI market faces a significant challenge with hardware availability worldwide. Much of the expensive GPU capacity is consumed by large language model (LLM) training, creating an availability crunch for users who want to deploy, evaluate, and fine-tune foundation models in their own cloud tenancies or subscriptions for inference. CPUs are a viable choice for many of these workloads. Below is our experience working with CPUs, including performance test results.
