Oracle Generative AI documentation | LiteLLM OCI provider docs

Why this matters

One OpenAI-compatible gateway can route to every model family hosted on Oracle Generative AI Infrastructure. OCI Signature v1 signing is handled inside LiteLLM, including the Instance Principal and OKE Workload Identity paths. Teams get production controls such as virtual keys, budgets, routing, fallbacks, caching, guardrails, audit logging, and cost tracking.

LiteLLM now treats Oracle Generative AI Infrastructure as a first-class provider. Developers can route requests to Meta Llama, xAI Grok, Cohere Command, Cohere Embed, Google Gemini, and OpenAI gpt-5 through a single OpenAI-compatible endpoint, while LiteLLM handles OCI Signature v1 request signing for every supported authentication path.

That matters because modern AI systems rarely use only one model. A production agent may call a fast model for routing, a long-context model for retrieval, a reasoning model for planning, a vision model for document understanding, and an embedding model for memory. Without a gateway, each model family can bring a different SDK, authentication scheme, request shape, and rate-limit policy.

LiteLLM removes that complexity. Applications call the familiar OpenAI Chat Completions or embeddings interface. The gateway resolves credentials, chooses the right vendor adapter, transforms the request into the shape Oracle Generative AI Infrastructure expects, signs it, and normalizes the response before it returns to the application. Cohere-specific fields, generic model formats, reasoning controls, and streaming response buffering stay behind the gateway boundary.

Figure 1. LiteLLM sits between the application and the OCI tenancy, exposing one OpenAI-compatible API while forwarding signed requests to Oracle Generative AI Infrastructure.

What changed

The new provider guide and implementation bring the integration to parity with other major cloud inference platforms. Previous support focused on an early community contribution for Cohere chat with manual request signing. The current work covers proxy configuration, tool calling, vision input, reasoning parameters, environment-based authentication, and the current OCI model catalog.

All OCI-hosted models are addressable as oci/<model-name>. Application code does not need to branch for Cohere versus generic model families, and existing tools that already speak the OpenAI API can target a LiteLLM proxy with minimal or no code changes.
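As a rough illustration of why application code does not need to branch, the sketch below splits an `oci/<model-name>` identifier and picks a request format from the vendor prefix. It is not LiteLLM's implementation: `OciModelRef`, `parse_oci_model`, the `COHERE`/`GENERIC` labels, and the assumption that the vendor is the text before the first dot are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OciModelRef:
    vendor: str      # text before the first dot, e.g. "cohere" or "xai" (assumed layout)
    model_name: str  # everything after "oci/"
    api_format: str  # "COHERE" or "GENERIC" (labels assumed for illustration)


def parse_oci_model(model: str) -> OciModelRef:
    """Split 'oci/<vendor>.<rest>' and choose a request format by vendor."""
    provider, sep, name = model.partition("/")
    if provider != "oci" or not sep or not name:
        raise ValueError(f"expected 'oci/<model-name>', got {model!r}")
    vendor = name.split(".", 1)[0]
    # Only Cohere models get the Cohere-specific shape; everything else is generic.
    api_format = "COHERE" if vendor == "cohere" else "GENERIC"
    return OciModelRef(vendor=vendor, model_name=name, api_format=api_format)
```

The point of the sketch is that the branch lives in one place behind the gateway, so callers only ever pass a model string.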

Example: call OCI through the LiteLLM SDK

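from litellm import completion

# Assumes OCI credentials are available to LiteLLM (for example via the
# OCI_* environment variables); replace the model placeholder before running.
response = completion(
    model="oci/xai.<grok-chat-model>",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
    tool_choice="auto",
)
# tool_calls is None when the model answers without calling a tool.
print(response.choices[0].message.tool_calls)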
Figure 2. A single request flows from an OpenAI-shaped call, through credential resolution, request transformation, OCI Signature v1 signing, and normalized response handling.

Enterprise gateway capabilities

For production teams, the gateway is useful because it centralizes the controls that customers would otherwise need to build for each application:

  • Virtual API keys with per-key budgets, RPM and TPM limits, model allowlists, expiry dates, and team or user attribution.
  • Cost tracking with request-level attribution to key, team, user, model, or tag.
  • Routing and fallback across OCI regions or across providers on rate-limit or 5xx errors.
  • Caching through in-memory, Redis, S3, and Qdrant back ends in semantic or exact-match modes.
  • Guardrails and audit logging that apply uniformly across all providers, including Oracle Generative AI Infrastructure.
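The fallback behavior in the list above can be pictured with a small, self-contained sketch: try deployments in order and move on only when the failure is a rate limit or a 5xx error. This illustrates the idea rather than the gateway's actual routing code; `UpstreamError`, `is_retryable`, `call_with_fallback`, and the deployment names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


class UpstreamError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"upstream returned {status_code}")
        self.status_code = status_code


def is_retryable(status_code: int) -> bool:
    # Fall back on rate limiting (429) or any 5xx server error.
    return status_code == 429 or 500 <= status_code <= 599


def call_with_fallback(
    deployments: Sequence[str],
    send: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each deployment in order; return (deployment, response)."""
    last_error: Optional[UpstreamError] = None
    for deployment in deployments:
        try:
            return deployment, send(deployment)
        except UpstreamError as err:
            if not is_retryable(err.status_code):
                raise  # e.g. 400 or 401: another deployment would not help
            last_error = err
    if last_error is None:
        raise ValueError("no deployments configured")
    raise last_error  # every deployment failed with a retryable error
```

Here `send` stands in for whatever actually issues the request, so the ordering logic can be exercised without a network call.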

Deployment note: LiteLLM can be deployed entirely within a customer-managed OCI environment, helping organizations keep prompts, credentials, and application data within their tenancy boundaries.

Capability | Coverage
--- | ---
Chat, synchronous and streaming | All vendor families
Function and tool calling | Cohere plus generic model families
Vision and multimodal input | Meta Llama vision variants, Cohere Command vision variants, Google Gemini 2.5
Reasoning controls | Google Gemini 2.5, OpenAI gpt-5, and latest xAI Grok reasoning variants
Embeddings | Cohere Embed, single and batch requests up to 96 documents
Authentication | Manual credentials, OCI_* environment variables, OCI SDK Signer, Instance Principal, and OKE Workload Identity
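Because embedding requests are limited to 96 documents per batch, a larger corpus has to be split on the client side. The helper below is a minimal sketch of that chunking; `embed_in_batches` and the `embed_batch` callable are hypothetical names, and the callable is injected so the example does not assume any particular client.

```python
from typing import Callable, List, Sequence

MAX_BATCH = 96  # per-request document limit stated in the coverage table


def embed_in_batches(
    documents: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = MAX_BATCH,
) -> List[List[float]]:
    """Embed documents in order, never sending more than batch_size at once."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    vectors: List[List[float]] = []
    for start in range(0, len(documents), batch_size):
        batch = list(documents[start:start + batch_size])
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError("embedding backend returned a mismatched count")
        vectors.extend(result)
    return vectors
```

In practice `embed_batch` could wrap a call such as `litellm.embedding(model="oci/cohere.<embed-model>", input=batch)` and extract the vectors from the response; that wiring is left out here because the exact model name and response handling depend on the deployment.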

Example: run a LiteLLM proxy in front of OCI

# config.yaml
model_list:
  - model_name: oci-grok
    litellm_params:
      model: oci/xai.<grok-chat-model>
      oci_region: os.environ/OCI_REGION
      oci_user: os.environ/OCI_USER
      oci_fingerprint: os.environ/OCI_FINGERPRINT
      oci_tenancy: os.environ/OCI_TENANCY
      oci_key_file: os.environ/OCI_KEY_FILE
      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID

# Start the proxy with this configuration (shell command)
litellm --config config.yaml
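Once the proxy is running, any HTTP client can reach it with an OpenAI-shaped request. The sketch below builds such a request using only the standard library. It assumes the proxy listens on `http://localhost:4000`, that `<virtual-key>` is a key issued by the gateway, and that `oci-grok` is the alias from the configuration above; `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, virtual_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the gateway."""
    payload = {
        "model": model,  # a model_name alias from config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (needs a live proxy)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

A call would look like `send_chat_request(build_chat_request("http://localhost:4000", "<virtual-key>", "oci-grok", "Hello"))`. Note that the client never sees OCI credentials; it only holds the gateway-issued key.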

The agentic layer

LiteLLM gives applications one contract for every model on Oracle Generative AI Infrastructure. The natural next layer is the OpenAI Agents SDK, OpenAI’s open-source framework for building agentic applications. Agents SDK agents can plan, call tools, hand off work to other agents, enforce guardrails, and stream events back to a UI.

With LiteLLM in front, the Agents SDK can use its built-in OpenAI-compatible model class. The gateway holds the OCI signing credentials and enforces platform controls, while the agent carries only a virtual key issued by the gateway. That keeps model governance, cost attribution, and identity management in one place.

Example: OpenAI Agents SDK on top of the LiteLLM AI Gateway

from agents import Agent, OpenAIChatCompletionsModel, Runner, set_tracing_disabled
from openai import AsyncOpenAI

set_tracing_disabled(True)  # tracing would need an OpenAI platform key

client = AsyncOpenAI(
    api_key="<virtual-key>",            # key issued by the gateway
    base_url="http://litellm-gateway:4000",
)
agent = Agent(
    name="Research assistant",
    instructions="You are a concise research assistant.",
    # The model name must match a model_name alias defined in the gateway's config.yaml.
    model=OpenAIChatCompletionsModel(model="oci-cohere-command", openai_client=client),
)
result = Runner.run_sync(agent, "Summarise the latest news on ...")
print(result.final_output)
 
Figure 3. The OpenAI Agents SDK consumes the LiteLLM AI Gateway through its OpenAI-compatible model class, while the gateway carries budget, routing, observability, guardrail, and signing responsibilities.

What customers can build

  • Multi-model agents that keep planning, tool execution, memory, and vision inside the same OCI tenancy and compartment.
  • OpenAI-compatible applications that can be repointed to Oracle Generative AI Infrastructure without an SDK swap.
  • Document and image pipelines that use the same image_url content block already supported by OpenAI-compatible vision APIs.
  • Hybrid routing setups where LiteLLM fails over from Oracle Generative AI Infrastructure to another provider, or vice versa, without application code changes.
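For the document and image pipelines mentioned above, the request uses the standard OpenAI-style `image_url` content block. The sketch below builds such a message from raw image bytes by embedding them as a base64 data URL. `image_message` is a hypothetical helper, and whether a given OCI-hosted model accepts data URLs, remote URLs, or both is an assumption to verify against the provider guide.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style user message with one text part and one image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The resulting dictionary can be placed in the `messages` list of a chat completion call in the same way as a plain text message.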

Conclusion

This release turns LiteLLM into a practical enterprise gateway for Oracle Generative AI Infrastructure. Together with the OpenAI Agents SDK, the combination helps Oracle customers move from a few API calls to governed, observed, multi-tenant agent systems with the routing, spend, caching, guardrail, and audit surface required for production AI.

Next steps