JSON data is often stored in NoSQL or other special purpose databases. Although these allow the data to be stored and accessed, they do not have a similary strong consistency model, transaction model and other standard functionalities of relational databases. Since Oracle Database 12c, it is possible to access JSON data with standard database tools. Access is then carried out using standard means and SQL/JSON functions. No additional software installation is required; JSON support is immediately available in all cloud and software installations of the database without additional license costs. In addition, all relational database features are available when using JSON, such as indexes, transaction handling, mixed queries, relational views, external tables, etc. JSON data can therefore be accessed like all data in the database via OCI, .NET, REST, JDBC and of course tools such as APEX.
In the following blog, some essential JSON technologies are briefly outlined and demonstrated using short examples. However, some basics of JSON are assumed. The tutorial will walk you through the required steps for loading, accessing, indexing and searching JSON data in Oracle Database. We added notes to draw attention to 23ai features.
JSON Data Type, External Tables and Collection Tables
Before accessing JSON data in the database, consider how JSON can be made available in the database. JSON data can simply be stored textual in columns of data type VARCHAR2 or CLOB/BLOB. With the IS JSON condition the data can be validated – for well-formedness or for type of syntax used (STRICT or LAX, LAX is more tolerant and allows for example that key name are not quoted).
Release 21c introduced a dedicated JSON data type that you can use to store JSON data in an efficient binary format. This improves query performance because textual JSON data no longer needs to be parsed. You can create JSON-type instances from other SQL data, and conversely. There is a JSON constructor function to convert textual JSON data to JSON-type data and a JSON_Serialize SQL operator for the other direction.
Let’s start with a simple example: The JSON data is available in a DMP format (download from here) and can be made available to the database via the external table interface.
Note: In 23ai, JSON data type is supported as a column type in the external table definition. Newline-delimited and JSON-array file options are supported, which facilitates importing JSON data from an external table.
The following code snippet demonstrates access via the external table JSON_DUMP_FILE_CONTENT. There is a logical directory JSON_DIR and we have the privilege to access it.
SQL> connect scott@pdb1
Enter password:
Connected.
-- check access to JSON_DIR or create the directory
SQL> create directory json_dir as 'home/oracle/json_data_dir';
Directory created.
-- create external table with JSON data type.
SQL> create table json_dump_file_content (json_document JSON)
ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY json_dir
ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE
DISABLE_DIRECTORY_LINK_CHECK
BADFILE 'JSONDumpFile.bad'
LOGFILE 'JSONDumpFile.log'
FIELDS (json_document CHAR(5000)))
LOCATION ('PurchaseOrders.dmp'))
REJECT LIMIT UNLIMITED;
Table created.
SQL> select count(*) from json_dump_file_content; COUNT(*) ---------- 10000
To convert JSON type data to textual data, you can use the JSON_SERIALIZE function and pretty-print it.
SQL> select json_serialize(json_document pretty) output
from json_dump_file_content where rownum=1;
OUTPUT
--------------------------------------------------------------------------------
{
"PONumber" : 1,
"Reference" : "MSULLIVA-20141102",
"Requestor" : "Martha Sullivan",
"User" : "MSULLIVA",
"CostCenter" : "A50",
"ShippingInstructions" :
{
"name" : "Martha Sullivan",
"Address" :
{
"street" : "200 Sporting Green",
"city" : "South San Francisco",
"state" : "CA",
"zipCode" : 99236,
"country" : "United States of America"
},
"Phone" :
[
{.....
Let’s create collections tables introduced in 23ai. A JSON collection table stores JSON documents in a fixed shape. It has a single, JSON-type object column named DATA.
If you provide CREATE JSON COLLECTION TABLE with the keywords WITH ETAG then each JSON document contains a document-handling field “_metadata”, whose value is an object with etag as its only field. This is the same as for a JSON duality view.
Let’s create a JSON COLLECTION table without any additional options.
SQL> CREATE JSON COLLECTION TABLE purchaseorders; Table created. SQL> desc purchaseorders; Name Null? Type ----------------------------------------- -------- ---------------------------- DATA JSON
Let’s add ETags. You may even use ETags in SQL Queries, please refer to the posting “ETags with Oracle Database tables” (posting).
SQL> CREATE JSON COLLECTION TABLE purchaseorders1 with etag; Table created. SQL> desc purchaseorders1 Name Null? Type ----------------------------------------- -------- ---------------------------- DATA JSON
When you create a JSON collection table you can also define one or more expression columns for it. These columns are virtual.
SQL> CREATE JSON COLLECTION TABLE purchaseorders2 WITH ETAG
(ref_long as (json_value (DATA,'$.PONumber.number()') || json_value
(DATA,'$.CostCenter.number()')));
Table created.
SQL> desc purchaseorders2
Name Null? Type
----------------------------------------- -------- ----------------------------
DATA JSON
We can also partition a JSON collection table.
SQL> create JSON COLLECTION table purchaseorder_part (po_num NUMBER GENERATED ALWAYS AS (json_value (DATA, '$.PONumber.number()' ERROR ON ERROR))) PARTITION BY RANGE (po_num) (PARTITION p1 VALUES LESS THAN (1000), PARTITION p2 VALUES LESS THAN (2000), PARTITION p3 values less than (maxvalue)); Table created.
To get some more insights, let’s query USER_TAB_COLS and check for virtual and hidden columns.
SQL> select table_name, column_name, hidden_column, data_type from user_tab_cols where table_name like 'PURCHASE%' and virtual_column='YES'; TABLE_NAME COLUMN_NAME HIDDEN_CO DATA_TYPE ------------------------------ ------------------------------------------------- --------- ---------- PURCHASEORDERS SYS_IME_OSON_18A3BF68FAF64FEBBF3DFB5AE7888261 YES RAW PURCHASEORDERS1 SYS_IME_OSON_C1DD49BA923C4F1BBFB54CF71B0BB148 YES RAW PURCHASEORDERS2 REF_LONG YES VARCHAR2 PURCHASEORDERS2 SYS_IME_OSON_4E247E7684CF4F56BFF6A6764C019E0C YES RAW PURCHASEORDER_PART PO_NUM YES NUMBER PURCHASEORDER_PART SYS_IME_OSON_2D825815C8E44FB6BFB95D865D7CA8DD YES RAW
ALL_JSON_COLLECTIONS describes all JSON collections accessible to the current user.
SQL> select * from ALL_JSON_COLLECTIONS; OWNER COLLECTION_NAME COLLECTION_TYPE ------------------------ ------------------------------ ------------------------------------ SCOTT PURCHASEORDERS TABLE SCOTT PURCHASEORDERS1 TABLE SCOTT PURCHASEORDERS2 TABLE SCOTT PURCHASEORDER_PART TABLE SH PURCHASEORDERS1 TABLE SCOTT DEPARTMENT_DV DUALITY VIEW
Accessing JSON data: Functions, Conditions, Dot Notation
Special SQL functions and operators have been introduced to ensure easy use of JSON information in the database. For example, if you want to access the data with a simple dot notation, you must use the IS JSON check constraint or a JSON data type with a table alias in the query.
Let’s insert some data in table purchaseorders1.
SQL> insert into purchaseorders1 select * from json_dump_file_content; 10000 rows created.
Let’s query it. As you see an entry “_id”, a unique identifier, is added automatically at insert time.
SQL> select json_serialize(data pretty) output
from purchaseorders1 where rownum=1;
OUTPUT
----------------------------------------------------------------------------
{
"_id" : "66bc86cd3fe2372edf048fb2",
"_metadata" :
{
"etag" : "1FA1F5048FB32EDFE0633579A50A9B92"
},
"PONumber" : 23,
"Reference" : "SVOLLMAN-20140523",
"Requestor" : "Shanta Vollman",
"User" : "SVOLLMAN",
"CostCenter" : "A50",
"ShippingInstructions" :
{
"name" : "Shanta Vollman",
"Address" :
{
"street" : "200 Sporting Green",
"city" : "South San Francisco",
"state" : "CA",
"zipCode" : 99236,
"country" : "United States of America"
},
"Phone" :
[
{
"type" : "Office",
"number" : "282-555-1511"
}
]
},
...
Now let’s use MIN and MAX function in a query and use the dot notation. You need to use a table alias, ‘j’ in this example.
SQL> select min(j.data."PONumber") min_po, max(j.data."PONumber") max_po from purchaseorders1 j; MIN_PO MAX_PO -------------------- -------------------- 1 10000
Note: In 23ai, JSON data type can be used directly in a WHERE, ORDER BY, and GROUP BY clause. Therefore, you can directly compare or sort values of JSON data type of any kind — whether scalar, object, or array. This means you can use JSON type directly in a WHERE clause, an ORDER BY clause, or a GROUP BY clause. More information can be found in the documentation.
Let’s use it in our example:
SQL> col ponumber format 99999 SQL> col costcenter format a10 SQL> select e.data.PONumber.number() ponumber, e.data."CostCenter" costcenter from purchaseorders1 e where e.data."CostCenter" ='A50' order by e.data.PONumber.number(); PONUMBER COSTCENTER -------- ---------- 1 "A50" 2 "A50" 3 "A50" 4 "A50" 5 "A50" 6 "A50" 7 "A50" 8 "A50" 12 "A50" 13 "A50" 14 "A50" 15 "A50" 16 "A50" 18 "A50" ...
Similar to the XMLTYPE usage, special JSON/SQL functions are available for JSON access.
The following functions and conditions can be used for queries on JSON documents:
- JSON_EXISTS: checks for the existence of a particular value within JSON data.
- JSON_VALUE: selects JSON data and returns a SQL scalar or an instance of a user-defined SQL object type or SQL collection type (varray, nested table).
- JSON_QUERY: selects a JSON fragment.
- JSON_TABLE: projects JSON data into a relational format via a virtual table, similar to a relational inline view.
- JSON_TEXTCONTAINS: can be in WHERE clause to perform a full-text search of JSON data.
Let’s illustrate JSON_VALUE in the following example.
SQL> select json_value(data,'$.ShippingInstructions.Phone[0].type') output from purchaseorders1 where json_value(data,'$.PONumber' returning number)=1000; OUTPUT --------------------------------------------------------------------- Office
There are many more examples in the documentation. See here.
Indexes: Function Based Index, Bitmap Index, Search Index
As you can imagine, the new constructs JSON_VALUE and JSON_EXISTS are well suited to creating a function-based index on a column with JSON content. The corresponding JSON paths are used to generate a scalar value. Bitmap indexes can also be created. The following example demonstrate the use.
SQL> create index ponumber_idx purchaseorders1(json_value(data,'$.PONumber.number()' error on error)); Index created.
SQL> col index_name format a30 SQL> select index_name, index_type, distinct_keys from user_indexes where table_name='PURCHASEORDERS1'; INDEX_NAME INDEX_TYPE DISTINCT_KEYS ------------------------------ ------------------------- ------------- PONUMBER_IDX FUNCTION-BASED NORMAL 10000 SYS_IL0000105542C00001$$ LOB SYS_C0013717 FUNCTION-BASED NORMAL ...
The JSON_VALUE function normally generates a VARCHAR2 value. In our example, we use a RETURNING clause to ensure that the NUMBER data type is used. The “ERROR ON ERROR” clause aborts the index creation with an error if no numeric value or no value at all is recorded in “PONumber”.
Now let’s run some queries to demonstrate the use of the index. By checking the execution plan, we can verify the usage. In the first example, information from the PURCHASEORDER1 table is queried.
SQL> set autotrace traceonly explain
SQL> select * from purchaseorders1
where json_value(data,'$.PONumber' returning number) = 1000;
Execution Plan
----------------------------------------------------------
Plan hash value: 205302180
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4115 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| PURCHASEORDERS1 | 1 | 4115 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | PONUMBER_IDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(JSON_VALUE("DATA" /*+ LOB_BY_VALUE */ FORMAT OSON ,
'$.PONumber.number()'
RETURNING NUMBER ERROR ON ERROR TYPE(LAX) )=1000)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
However, if, for example, you only know the expression and not the corresponding path, you can still reach your goal with the condition JSON_TEXTCONTAINS. You must first create a JSON search index for the JSON column. The JSON search index is an Oracle Text index that is designed specifically for Ad Hoc Queries and Full-Text search using JSON data. It can improve the performance of both (1) ad hoc structural queries, that is, queries that you might not anticipate or use regularly, and (2) full-text search.
The column on which you create a JSON search index can be of data type JSON, VARCHAR2, CLOB, or BLOB. It must be known to contain only well-formed JSON data, which means that it is either of type JSON or it has an IS JSON check constraint.
By default, a JSON search index is maintained asynchronously and is automatically synced in the background. Of course, you can change this behavior. More about Text index maintenance in 23ai can be found in the posting “Text Index with Automatic Maintenance” (posting).
Let’s create a default JSON search index.
SQL> CREATE SEARCH INDEX po_search_idx on purchaseorders1 (data) FOR JSON; Index created.
Let’s try it out with the Oracle Text operator fuzzy. We use the fuzzy operator to expand queries to include words that are spelled similarly to the specified term. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in your document set. In our example we mispelled the requestor “Vollmann”.
SQL> set autotrace on explain
SQL> set linesize 140
SQL> select json_serialize(data pretty) from purchaseorders1 p
WHERE json_textcontains(data, '$.Requestor', 'fuzzy(Volman)') and rownum=1;
JSON_SERIALIZE(DATAPRETTY)
----------------------------------------------------------------------------------------------------
{
"_id" : "66bc86cd3fe2372edf048fb2",
"_metadata" :
{
"etag" : "1FA1F5048FB32EDFE0633579A50A9B92"
},
"PONumber" : 23,
"Reference" : "SVOLLMAN-20140523",
"Requestor" : "Shanta Vollman",
"User" : "SVOLLMAN",
...
Execution Plan
----------------------------------------------------------
Plan hash value: 145592397
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 916 | 1 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| PURCHASEORDERS1 | 1 | 916 | 1 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | PO_SEARCH_IDX | | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
3 - access("CTXSYS"."CONTAINS"("P"."DATA" /*+ LOB_BY_VALUE */ ,'(fuzzy(Volman))
INPATH (/Requestor)')>0)
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
Please consider indexing queries for which you know the query pattern ahead of time it’s generally advisable to use a function-based index that targets such a specific pattern. If both function-based and JSON search indexes are applicable to given a query, the function-based index is used.
Note: Starting with Oracle Database Release 21c, instead of creating an index, you can load the column into memory, using an In-Memory columnar format. This does not require an index, but allows for fast scanning of the text using In-Memory techniques. This is particularly valuable when running queries that combine text searches and structured searches on other In-Memory columns.
But what about Indexing JSON arrays? JSON arrays have always been a bit ‘odd’ in a relational database as they highlight the differences between the data models: JSON is a hierarchical (tree) data model where entities are embedded in each other (using nested arrays or objects) whereas tables are flat with just one value per columns and entities connected by relationships (same column value in different tables). Function based indexes used to be limited to one value per column and therefore could not support JSON arrays (We had good ‘workarounds’ like JSON Search index, materialized views, etc). But in 21c we added a true multi-value functional index allowing to index specific values in JSON arrays. You can get more information on that in the posting from Beda Hammerschmidt “JSON Type and other cool new JSON features in Oracle database release 21c“).
But how can we get the structure and content of JSON data stored in the database? The aggregate function JSON_DATAGUIDE lets you discover information about the structure and content of JSON documents stored in Oracle Database. Here is the example for our JSON data. We use dbms_json.format_hierarchical to receive the hierarchical format (flat format is the default setting) and the option DBMS_JSON.PRETTY to improve readability of the returned data guide with appropriate indentation.
SQL> select JSON_DATAGUIDE(data, dbms_json.format_hierarchical, dbms_json.pretty) dg_doc
from purchaseorders1;
DG_DOC
--------------------------------------------------------------------------------
{
"type" : "object",
"o:length" : 1,
"properties" :
{
"_id" :
{
"type" : "id",
"o:length" : 12,
"o:preferred_column_name" : "_id"
},
"User" :
{
"type" : "string",
"o:length" : 8,
"o:preferred_column_name" : "User"
},
"PONumber" :
{
"type" : "number",
"o:length" : 4,
"o:preferred_column_name" : "PONumber"
},
"LineItems" :
{
"type" : "array",
"o:length" : 1,
"o:preferred_column_name" : "LineItems",
"items" :
{
"properties" :
{
"Part" :
{
"type" : "object",
"o:length" : 1,
"o:preferred_column_name" : "Part",
"properties" :
{
"UPCCode" :
{
"type" : "number",
"o:length" : 8,
"o:preferred_column_name" : "UPCCode"
},
...
Let’s try it out and create a multivalue index on the LineItems array.
SQL> CREATE MULTIVALUE INDEX p_uppcode_idx on purchaseorders1 p (p.data.LineItems.Part.UPCCode.number()); Index created.
The following query will use the multi-value index (see execution plan).
SQL> set autotrace on explain
SQL> set linesize window
SQL> select data
from purchaseorders1 e
where json_exists(data, '$.LineItems.Part?(@.UPCCode == 85391628927)')
and rownum<=2;
DATA
--------------------------------------------------------------------------------
{"_id":"66bc86cd3fe2372edf04a856","_metadata":{"etag":"1FA1F504A8572EDFE0633579A
50A9B92"},"User":"ABULL","CostCenter":"A50","ShippingInstructions":{"name":"Alex
is Bull","Address":{"street":"200 Sporting Green","city":"South San Francisco","
state":"CA","zipCode":99236,"country":"United States of America"},"Phone":[{"typ
e":"Office","number":"909-555-7307"},{"type":"Mobile","number":"415-555-1234"}]}
,"Requestor":"Alexis Bull","Reference":"ABULL-20140421","AllowPartialShipment":t
rue,"Special Instructions":null,"LineItems":[{"ItemNumber":1,"Part":{"Descriptio
n":"One Magic Christmas","UnitPrice":19.95,"UPCCode":13131092899},"Quantity":9},
{"ItemNumber":2,"Part":{"Description":"Lethal Weapon","UnitPrice":19.95,"UPCCode
":85391628927},"Quantity":5}],"PONumber":1600}
{"_id":"66bc86cd3fe2372edf04a04e","_metadata":{"etag":"1FA1F504A04F2EDFE0633579A
50A9B92"},"PONumber":1086,"Reference":"TJOLSON-20141124","Requestor":"TJ Olson",
"User":"TJOLSON","CostCenter":"A50","ShippingInstructions":{"name":"TJ Olson","A
ddress":{"street":"200 Sporting Green","city":"South San Francisco","state":"CA"
,"zipCode":99236,"country":"United States of America"},"Phone":[{"type":"Office"
,"number":"449-555-8784"}]},"Special Instructions":"Courier","LineItems":[{"Item
Number":1,"Part":{"Description":"Bruce Lee","UnitPrice":27.95,"UPCCode":56775022
990},"Quantity":4},{"ItemNumber":2,"Part":{"Description":"Lethal Weapon","UnitPr
ice":19.95,"UPCCode":85391628927},"Quantity":5},{"ItemNumber":3,"Part":{"Descrip
tion":"On Her Majesty's Secret Service","UnitPrice":19.95,"UPCCode":27616812629}
,"Quantity":6},{"ItemNumber":4,"Part":{"Description":"Andy Griffith Show","UnitP
rice":27.95,"UPCCode":56775071295},"Quantity":5}]}
Execution Plan
----------------------------------------------------------
Plan hash value: 2702768220
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 1834 | 7 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| PURCHASEORDERS1 | 2 | 1834 | 7 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN (MULTI VALUE) | P_UPPCODE_IDX | 50 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=2)
3 - access(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */ FORMAT OSON ,
'$.LineItems.Part.UPCCode.number()' RETURNING NUMBER ASIS WITHOUT ARRAY WRAPPER ERROR ON ERROR
NULL ON EMPTY NULL ON MISMATCH TYPE(LAX) MULTIVALUE)=85391628927)
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO (SYSTEM))
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
Conclusion
JSON is a native part of the Oracle database and can be used in all editions of the database, such as RAC, Single Instance, Non CDB, CDB etc. No particular prerequisites or setups are required to use the JSON functionality. If you have already worked with XML in Oracle database, you will notice that operators and functions have a certain similarity to the functions of the XMLDB. Even if it was not explicitly mentioned, there are no restrictions for queries on mixed data, i.e. JSON and relational data. For example, you can perform join operations on structured and JSON data.
JSON data can be accessed like all data in the database via OCI, .NET, REST, JDBC and of course tools such as APEX. Especially the recent APEX version, APEX 24.1, introduces a major change to REST Data Sources: APEX understands now complex and nested REST API responses as well (see also the posting listed in the future readings section).
In this posting we concentrated on the basic features. There are many more topics we could have covered. So that the posting does not become too long, topics such as JSON generation, JSON views especially JSON Duality Views, JSON schema etc. are left out and will be covered in a future posting.
Further readings
- JSON Developer’s Guide
- JSON in Oracle Database Office Hours: Oracle Database 23ai: JSON Collection Tables and Views
- Oracle Database 23ai (Playlist YouTube)
- JSON in Oracle database (YouTube)
- Data Types in 23ai – BOOLEAN, VECTOR, JSON, and XMLTYPE (YouTube)
- JSON in Oracle Database (GitHub)
- APEX 24.1: REST Data Sources and nested JSON responses (posting)
- JSON Type and other cool new JSON features in Oracle database release 21c (posting)
- Text Index with Automatic Maintenance (posting)
- New full-text search in 23ai: Ubiquitous Database Search (posting)
- JSON postings on blogs.oracle.com/coretec (postings)
