Global enterprises often receive and maintain a large volume of documents and data in multiple languages, creating challenges for users who cannot read all those languages. Content in multiple languages can also create challenges for AI models trained on content in a limited set of languages, resulting in reduced response quality for non-English prompts and content. This applies to both LLMs generating responses and transformers generating vector embeddings. In such a scenario, translation services can make content more accessible to a wider set of users and improve AI model effectiveness. Oracle Autonomous Database Select AI now enables text translation directly from your database using OCI translation services (OCI Language), simplifying application development by supporting AI-based translation within SQL queries. 

Additionally, the ability to summarize various data – documents, logs, operational data, etc. – can turn a vast amount of information into something actionable and digestible. Large Language Models (LLMs) can help maintain content accuracy, handle context, and deal with domain-specific content – providing automated text summarization to filter, condense, and restructure information, whether in an extractive or abstractive manner. Select AI now offers a text summarization capability using the LLM specified in your AI profile.

By combining these two features – translation and summarization – in the same SQL queries, you can easily develop solutions for use cases such as translating global news articles and summarizing them by topic or translating social media content and summarizing feedback for common threads or issues.

Using Select AI for Text Translation

In general, Select AI capabilities like NL2SQL, RAG, and chat rely on the LLM itself to detect the language used in the prompt. However, this is not always reliable and may not generate the result in the desired language. To exercise greater control, Select AI’s ‘translate’ action uses dedicated AI provider translation services to translate your prompt and response using the language(s) you specify. 

As noted above, the translate capability can be used in combination with summarize, but it can also be used in conjunction with other Select AI features such as retrieval augmented generation (RAG). For example, you may want to translate multilingual documents into a common language before vectorizing the resulting text. This can improve vector quality when using Select AI RAG and allows the use of a wider range of transformers, such as those that are not multilingual. 

Initially, Select AI’s translate capability uses OCI’s translation services, which currently support 30 languages. You can select source and target languages based on those supported by the service and specify your language choices either in your AI profile or as parameters to the TRANSLATE function. To view the list of supported languages by the OCI service, use the AI_TRANSLATION_LANGUAGES view.

There are three ways to use the translate capability:

  1. On the SQL command line using “select AI translate <text to be translated>”
  2. Using the function DBMS_CLOUD_AI.TRANSLATE
  3. Using DBMS_CLOUD_AI.generate with the ‘translate’ action

When specifying a language, you can either specify the language name or its code, e.g., “french” or “fr”.

Examples

In the following example, we create an AI profile “oci_translate” that specifies both source (“en” for English) and target (“fr” for French) languages. After setting the profile, we use the SQL command line to translate the string “I need to translate this” into French. 

BEGIN
  DBMS_CLOUD_AI.create_profile(
      'oci_translate',
      '{"provider": "oci",
         "credential_name": "new_CRED",
         "source_language": "en",
         "target_language": "fr"
        }');
END;

SQL> exec dbms_cloud_ai.set_profile('oci_translate');
PL/SQL procedure successfully completed.

SQL> select ai translate I need to translate this;
RESPONSE
---------------------
Je dois traduire ceci

In this second example, we’re using the TRANSLATE function from the DBMS_CLOUD_AI package. Although we’re using the same AI profile, we can override the target language explicitly here to translate to Spanish. 

DECLARE
  output clob;
BEGIN
  output := DBMS_CLOUD_AI.translate(
       profile_name      => 'oci_translate',
       text              => 'text to translate',
       target_language   => 'SPANISH'
  );
 
  dbms_output.put_line('translated_text: ' || output);
END;

translated_text: texto a traducir

Using Select AI for Summarization

You can provide content to the Select AI summarization capability either explicitly in the prompt or from a URI. You can control the summary through several options—for example, formatting it as paragraphs or as a list of bullet points—and specify how closely the summary should follow the original wording, i.e., its extractiveness level.

Select AI supports summarizing text up to 1 GB in size using your specified AI provider. As with the translate capability, you use the summarize capability by specifying the ‘summarize’ action at the SQL command line, by using the GENERATE function, or by invoking the SUMMARIZE function. 

There are three ways to use the summarize capability:

  1. On the SQL command line using “select AI summarize <text to be summarized>”
  2. Using DBMS_CLOUD_AI.generate with the ‘summarize’ action
  3. Using the function DBMS_CLOUD_AI.SUMMARIZE

Summarize processing options

You can choose one of two processing approaches for summarization, especially when working with large text: MapReduce and iterative refinement.

MapReduce

The MapReduce approach processes chunks independently and potentially in parallel. Each chunk generates a separate summary, as depicted in Figure 1. These summaries are then aggregated (summarized) into a final summary. This is appropriate for large-scale documents where parallel processing of chunks won’t compromise summary accuracy.
 
Figure 1: MapReduce process for text summarization
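The MapReduce flow depicted in Figure 1 can be sketched in a few lines of Python. This is only a conceptual illustration, not Select AI's actual implementation; the llm_summarize function is a hypothetical stand-in for the LLM call, reduced here to keeping the first sentence so the example runs without any AI service.

```python
# Conceptual sketch of MapReduce-style summarization. llm_summarize is a
# hypothetical stand-in for the real LLM call; it keeps only the first
# sentence so the example is runnable without an AI service.

def llm_summarize(text: str) -> str:
    return text.split(". ")[0].strip()

def split_into_chunks(text: str, size: int) -> list[str]:
    # Break the document into fixed-size chunks that fit the model's token limit.
    return [text[i:i + size] for i in range(0, len(text), size)]

def map_reduce_summary(document: str, chunk_size: int = 1000) -> str:
    # Map step: each chunk is summarized independently, so this loop
    # could run in parallel across chunks.
    partial_summaries = [llm_summarize(c)
                         for c in split_into_chunks(document, chunk_size)]
    # Reduce step: aggregate the partial summaries into one final summary.
    return llm_summarize(" ".join(partial_summaries))
```

Because the map step has no dependencies between chunks, it parallelizes naturally, which is why this approach suits large-scale documents.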

Iterative Refinement

In iterative refinement, Select AI divides the document into multiple chunks and generates an initial summary from the first chunk. As each additional chunk is processed, the summary is progressively updated, resulting in a complete summary over time, as depicted in Figure 2. This approach is applicable to situations where contextual accuracy and coherence are critical, such as when summarizing complex or highly interconnected texts where each part builds on the previous. Iterative refinement can also be particularly useful for processing large files that exceed the token limit of your LLM. Since this approach is inherently sequential, processing time can be longer than with the MapReduce approach. 
 
Figure 2: Iterative Refinement process for text summarization
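The iterative refinement loop in Figure 2 can be sketched similarly. Again, this is a conceptual illustration rather than Select AI's actual implementation; llm_refine is a hypothetical stand-in for the LLM call that folds each new chunk into the running summary, simplified here to concatenation so the sequential data flow is visible.

```python
# Conceptual sketch of iterative-refinement summarization. llm_refine is
# a hypothetical stand-in for the LLM call that updates the running
# summary with each new chunk.

def llm_refine(summary_so_far: str, new_chunk: str) -> str:
    if not summary_so_far:
        return new_chunk.strip()
    return summary_so_far + " + " + new_chunk.strip()

def iterative_summary(document: str, chunk_size: int = 1000) -> str:
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    summary = ""
    # Strictly sequential: each refinement depends on the previous summary,
    # which preserves context but prevents parallelism across chunks.
    for chunk in chunks:
        summary = llm_refine(summary, chunk)
    return summary
```

The carried-forward summary is what lets later chunks be interpreted in the context of earlier ones, at the cost of sequential processing.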

Both MapReduce and iterative refinement can be used to generate summaries for large texts that exceed the token limit of your LLM. In both approaches, the text must be broken into smaller, manageable chunks. However, unlike iterative refinement—which improves the summary step by step—MapReduce processes the chunks independently and in parallel, generating individual summaries for each. These summaries are then combined to form a unified overall summary.

Examples

In the following example, we use the minimally specified AI profile oci_summarize. 

BEGIN
  DBMS_CLOUD_AI.create_profile(
      'oci_summarize',
      '{"provider": "oci",
         "credential_name": "new_CRED"
        }');
END;

SQL> exec dbms_cloud_ai.set_profile('oci_summarize');
PL/SQL procedure successfully completed.

SQL> select ai summarize 
This is the long text to summarize…;

RESPONSE
---------------------
<the summarized text>

 

The next example uses the GENERATE function with the ‘summarize’ action. We first retrieve the text from object storage and pass it as the prompt. 

SELECT DBMS_CLOUD_AI.generate(
         prompt       => TO_CLOB(DBMS_CLOUD.get_object(
                          'OBJECT_STORE_CRED',
                          'https:// …/bucket…/o/select_ai/summary/test_4000_words.txt')),
         profile_name => 'oci_summarize',
         action       => 'SUMMARIZE');

Alternatively, you can use the SUMMARIZE function. In this example, we specify a location URI, a user prompt to guide the LLM’s summary, and parameters that bound the summary length:

  • user_prompt: The summary should start with ‘The summary of the article is: ‘
  • min_words: 50
  • max_words: 100
SELECT DBMS_CLOUD_AI.summarize(
         location_uri    => 'https://…bucket…/o/select_ai/summary/test_4000_words.txt',
         credential_name => 'OBJECT_STORE_CRED',
         profile_name    => 'oci_summarize',
         user_prompt     => 'The summary should start with ''The summary of ' ||
                            'the article is: ''',
         params          => '{"min_words":50,"max_words":100}');

The summary of the article is:
<paragraph text of summary>.

This example asks for the summary as a list using the following parameters:

  • user_prompt: The summary should start with ‘The summary of the article is: ‘
  • max_words: 100
  • summary_style: list

 

DECLARE
    l_text      CLOB;
    l_response  CLOB;
BEGIN
    l_text := TO_CLOB(
                DBMS_CLOUD.get_object(
                  credential_name => 'OBJECT_STORE_CRED',
                  location_uri    => 'https://…bucket…/o/select_ai/summary/dreams.txt'));
    l_response := DBMS_CLOUD_AI.summarize(
        text          =>  l_text,
        profile_name  =>  'oci_summarize',
        user_prompt   =>  'The summary should start with ''The summary of ' ||
                          'the article is: ''',
        params        =>  '{"max_words":100, "summary_style":"list"}');
    DBMS_OUTPUT.PUT_LINE(l_response);
END;
 
The summary of the article is:
- <summary text item 1>
…
- <summary text item n>.

Resources