Modern enterprises handle vast volumes of unstructured documents, where extracting meaningful insights—especially those tied to location data, zip codes with a combination of phone numbers, and a lot of numeric values—can be both time-intensive and error-prone.

Oracle Cloud Infrastructure (OCI) addresses this challenge by bringing together Document AI, Select AI, and Spatial capabilities, enabling developers to convert raw document data into actionable geospatial intelligence.

Thumbnail for this article

Preface: This article demonstrates how to design and implement a spatial intelligence application on OCI by integrating the following:

  • OCI Document Understanding for intelligent document parsing
  • Oracle Autonomous Database for structured data persistence
  • Select AI (NL2SQL) for translating natural language into executable SQL queries
  • Oracle Spatial for advanced geospatial analysis

Together, these services form a cohesive pipeline that bridges unstructured document processing with location-aware analytics.

We’ll go over how to construct a spatial data visualization application in this article using:

  • OCI Document Understanding
  • Oracle Autonomous Database (AI Database)
  • Select AI (Natural Language to SQL)
  • Spatial Analysis using pre-trained models

For visualization, either we can use Oracle APEX or Python Streamlit Map for a flexible, developer-driven interface to render interactive maps and spatial insights.

What is OCI Document Understanding?

OCI Document Understanding is an AI service that enables developers to extract the following:

  • Text
  • Tables
  • Key-value pairs
  • Document classifications
  • Optical Character Recognition (OCR)

It provides pre-trained models for common document processing tasks and supports customization for industry-specific needs. Refer to the oracle document for detailed information

Once a model is selected, documents (PDF/images) can be uploaded via APIs or CLI.
and data from tables can be extracted either as TEXT or as structured TABLE format (rows × columns) with ease.

Solution Overview

For certain use cases where documents contain a large amount of numerical data—such as pincodes, phone numbers, latitude, and longitude—vectorizing digits may not yield optimal results.

In such scenarios, to achieve better accuracy and performance with LLMs, it is more effective to:

  • Store the extracted data in relational format
  • Use NL2SQL (Select AI) to query the data

With this approach, we build a pipeline that:

  1. Extracts structured data from documents
  2. Stores it in Oracle Autonomous Database
  3. Uses natural language queries to retrieve insights
  4. Performs spatial analysis using LLM-generated SQL
  5. Visualizes results on a map

Additionally:

  • Text can be vectorized and embedded to perform similarity searches using natural language

Architecture Overview

High-level flow of the solution:

Architecture diagram that implies process flow

Architecture Workflow

  1. Upload a sample PDF containing address and geolocation data
  2. Extract tabular data and store it in Oracle Autonomous Database
  3. Extract text data and store it as embeddings
  4. Create Select AI profile for NL2SQL
  5. Create another profile for spatial analysis using LLM
  6. Execute queries and visualize results using Streamlit

Prerequisites

Create AI Credential

BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => '<cred_name>',
user_ocid => '<your_ocid1.user>',
tenancy_ocid => '<your_ocid1.tenancy>',
private_key => '-----BEGIN RSA PRIVATE KEY-----
…
       -----END RSA PRIVATE KEY-----',
fingerprint => '');
END;

Ref : https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm#configfile

Create Select AI Profiles

BEGIN
DBMS_CLOUD_AI.create_profile(
profile_name => 'profile_name',
attributes => '{
     "provider": "oci",
     "credential_name": "cred_name",
     "object_list": [{"owner": "<schema_name>", 
                       "name": "<table_name"}]}');
END;

Example:

BEGIN
DBMS_CLOUD_AI.create_profile(
profile_name => 'PROP_DET_G',
attributes => '{
          "provider": "oci",
          "credential_name": "GENAI_CRED",
          "object_list": [
             {"owner": "APPUSER", "name": "SLA_PROPERTY_DETAILS_G"}]
         }');
END;

Create Spatial Analysis Profile

Example :

BEGIN
DBMS_CLOUD_AI.create_profile(
profile_name => 'SPAT_ANALY_MLLAMA4',
attributes => '{
      "provider": "oci",
      "credential_name": "GENAI_CRED",
      "model": "meta.llama-4-maverick-17b-128e-instruct-fp8",
      "region": "us-chicago-1",
      "oci_apiformat": "GENERIC",
      "temperature": 0,
      "object_list": [
                {"owner": "<schema_name>", "name": "PROPERTY_DETAILS_G"},
                {"owner": "<schema_name>", "name": "SINGAPORE_AMENTIES"}]
               }');
END;

Ref: https://docs.oracle.com/en-us/iaas/autonomous-database-serverless/doc/dbms-cloud-ai-package.html

Note: Here we are using the pre-trained model meta.llama-4 , which provides appropriate responses for spatial operators. However, we can use any models based on our requirements. Refer here for the latest LLM model information.

Python Integration

Config – Initialize the global variables in pycode

Example: 
    CONFIG_PATH = r"C:\Users\Sathishkumar.oci\config"
    CONFIG_PROFILE = "DEFAULT"
    COMPARTMENT_ID = "ocid1.compartment.oc1"
    NAMESPACE = "sehubjapaciaas"
    BUCKET_NAME = "DM_bucket"
    PREFIX = "aiop"

Note: Either you can configure OCI CLI and store all OCI resource information in a config file, or you can assign those values as GLOBAL parameters inside the Python script.

Initialize OCI Clients

def init_clients():
    config = oci.config.from_file(CONFIG_PATH, CONFIG_PROFILE)

    ai_client = oci.ai_document.AIServiceDocumentClientCompositeOperations(
        oci.ai_document.AIServiceDocumentClient(config=config)
    )

    object_storage_client = oci.object_storage.ObjectStorageClient(config=config)
    return ai_client, object_storage_client

Data Processing Workflow

  • Upload PDF
  • Extract structured and unstructured data
  • Store tabular data in database
  • Store embeddings for semantic queries

The complete python routines can be referred here

1- Querying with Select AI (NL2SQL)

Set profile:

BEGIN
DBMS_CLOUD_AI.SET_PROFILE(profile_name => 'PROP_DET_G');
END;

Example: Prompts in natural language

  • List all records where pincode = 571100
  • Show properties on street ‘Lush on Holland Hill’

Example:1 Image that describes query in natural language
Example:2 Image that describes query in natural language

Above, output list all the properties based on prompt using Select AI ( NL2SQL)

2 – Spatial Visualization

  • Convert address into geolocation using Geoapify
  • Display Points of Interest (PoI)
  • Visualize using Streamlit map

How it works: Based on the selected record, the corresponding address attributes (such as building name, street name, and pincode) are structured into a JSON payload and sent to the Geoapify Geocoding API. The service resolves the address into precise geographic coordinates (latitude and longitude). Once the location is identified, nearby Points of Interest (PoIs) are retrieved and rendered on the map in the application for spatial visualization.

This is snapshot image showing spatial visualization of address in text box

Spatial Analysis with Select AI

Switch profile:

cursor.execute("""
BEGIN
DBMS_CLOUD_AI.SET_PROFILE(profile_name => 'SPAT_ANALY_MLLAMA4');
END; """)

Example: Prompts

  • Show all amenities near property ID 370 within 3 km
  • Show hospitals near property ID 52 within 2 km

Generate SQL

Sample Spatial Analytical query Generation using Select AI

SELECT DBMS_CLOUD_AI.GENERATE(prompt => 'Show all amenities near property ID 370 within 3 km',
profile_name => 'SPAT_ANALY_MLLAMA4',
action => 'showsql') as query
FROM dual;
SELECT DBMS_CLOUD_AI.GENERATE(prompt => 'Show hospitals near property ID 52 within 2 km',
profile_name => 'SPAT_ANALY_MLLAMA4',
action => 'showsql') as query
FROM dual;
Call the prompts inside routine in python

cursor.execute("""
         SELECT DBMS_CLOUD_AI.GENERATE(
         prompt => :1,
         action => 'runsql'
          ) FROM dual
    """, [system_prompt_spatial])

Spatial Analytics query generated by LLM :

Snapshot image that shows Oracle select AI generates and returns the sql query with appropriate spatial operators

Note: Provide system prompts to guide spatial SQL generation using Oracle Spatial operators. You can refer the complete system prompt here

Snapshot of sql output which generated based on previous query

Spatial Visualization

The results (amenities, locations, and distances) are visualised using the following:

  • Interactive maps
  • Real-time filtering
  • Spatial overlays
Snapshot of spatial visualization of amenities which generated based on previous query

Snapshot of spatial visualization of amenities with spatial analytics which generated based on previous query

Key Takeaways

  • Automate document processing using OCI Document Understanding
  • Query structured data using natural language (Select AI)
  • Generate spatial queries using LLMs
  • Combine relational + vector approaches for better accuracy
  • Build end-to-end applications rapidly with Python

Conclusion

By combining Document AI, Select AI, and Spatial capabilities, developers can build powerful applications that transform unstructured documents into location-aware intelligence systems.

This approach is ideal for:

  • Real estate analytics
  • Smart cities
  • Logistics and supply chain
  • Location intelligence platforms

For complete implementation, refer here: