Subscribe

Share

Database, SQL and PL/SQL

Run with JSON

Store JSON natively in Oracle Database 12c, and access it via SQL.

By Arup Nanda Oracle ACE Director

January/February 2015

Mark, the chief technology officer at Acme Bank, has several concerned visitors today. Recently Acme has been accepting transactions from business partners such as convenience stores and third-party billing companies—and even partners outside the country. To simplify and expedite the processing of transactions from these external systems, Acme has allowed the transactions to come in a semistructured manner, in the form of JavaScript Object Notation (JSON)—a leading document interchange format. JSON enables any set of data to be transmitted immediately, without a predetermined format expected by a relational database, and this makes it attractive for integrating outside transactions quickly. Acme’s partners can send any pertinent data they want without first waiting for a mutually agreeable format. However, even though Acme allows data to come in as JSON, the data is stored in a structured manner in the database, in a relational format.

Dave, the lead developer, and his team spend a lot of time accepting, parsing, and deciphering the JSON documents before storing them in the database. The JSON format enables any data to be included, but if there is no corresponding database column (or table), the data can’t be accepted into Acme’s systems. This is making life miserable for Debbie, the lead DBA, who has to alter the database structures quickly and often to accommodate the new data. It’s not OK with her that fast and frequent responses are required to make things work, and she wants Dave to immediately stop sending her a continuous stream of data management projects. But Dave has no choice. Acme must accept semistructured data in JSON format to be competitive in the marketplace, and the JSON format enables the company to develop a repository for valuable but loosely structured data such as streams and feeds from various sources such as Facebook and Twitter—content often referred to as big data. This repository must be flexible enough to store any type of data coming in while being easy to query with a language everyone understands: SQL. So, Dave explains, far from going away, JSON is here to stay and its use will be growing exponentially at Acme.

Altering existing database structures takes time, and if a specific data item in an incoming JSON document has no corresponding column in the Acme database, that transaction has to wait until the column is created. This means an interruption of the normal business of the company. Dave wants new columns to somehow be created on the fly, although he is fully aware that it is impossible. In short, everyone is unhappy with the present arrangement. And they all turn to Mark, who brought in JSON as a transaction data exchange format in the first place.

There is a very easy solution, responds Mark: put JSON documents directly in the database without any parsing, preprocessing, or prior alteration of database structures. The solution delivers the flexibility of JSON for semistructured data and the power, reliability, and familiarity of Oracle Database—with no special actions required of the DBAs. The audience, intrigued, presses him to explain.


What Is JSON?

JSON, Mark explains, is a standard for free-format text in which any kind of data can be included, along with a descriptor. The descriptor for the data is called a key, and the actual data is called a value. The collection of related data is put into a single JSON document file. Any type of data can be represented as key/value pairs. Mark shows everyone an example of a bank transaction in a JSON document, shown in Listing 1, and provides a description of key lines in the listing. (The line numbers shown are not part of the document; they are there to aid in the explanation.)

Code Listing 1: A check transaction JSON document

  1  {
2 "TransId" : 1,
3 "TransDate" : "01-JAN-2015",
4 "TransTime" : "11:05:00",
5 "TransType" : "Deposit",
6 "AccountNumber" : 123,
7 "AccountName" : "Smith, John",
8 "TransAmount" : 100.00,
9 "Location" : "ATM",
10 "CashierId" : null,
11 "ATMDetails" : {
12 "ATMId" : 301,
13 "ATMLocation" : "123 Some St, Danbury CT 06810"
14 },
15 "WebDetails" : {
16 "URL" : null
17 },
18 "Source" : "Check",
19 "CheckDetails" : [
20 {
21 "CheckNumber" : 101,
22 "CheckAmount" : 50.00,
23 "AcmeBankFlag" : true,
24 "Endorsed" : true
25 },
26 {
27 "CheckNumber" : 102,
28 "CheckAmount" : 50.00,
29 "AcmeBankFlag" : false,
30 "Endorsed" : true
31 }
32 ]
33 }


Line Explanation
1 and 33 JSON keys and values are enclosed in curly braces ({ and }).
2 The TransId key is a unique identifier for the transaction. Key names are enclosed in double quotation marks, and the value for a key is shown after the colon. Because this TransId value is a number datatype, quotation marks are not needed.
3 The value for TransDate is a date format, so it is enclosed in quotation marks. (Characters, time stamps, and so on are also enclosed in quotation marks.)
10 The CashierId value is null, which means it is not known. Because this is an ATM transaction, a cashier is irrelevant. The JSON document could have omitted the CashierId key and value, but it included the key for completeness and assigned a null value.
11–14 ATMDetails embeds a key (which includes the ATMId and ATMLocation keys and values).
19–32 CheckDetails introduces embedded keys and values, but instead of introducing values for just one activity, such as an ATM transaction (as in the case of ATMDetails), CheckDetails introduces multiple values for multiple activities—one set of subkeys and values for each check deposited. There are two checks in this array, and arrays are presented in square brackets ([ and ]).
23 In JSON “true” is a Boolean datatype (used without quotation marks).

Acme receives JSON documents like the one in Listing 1 every day as transactions, Dave confirms, and specially written programs read the documents and store the data in appropriate tables and columns. Well, no more, says Mark.

Mark creates a TRANSACTIONS table to hold the JSON document in Listing 1.

create table transactions (
id number not null primary key,
trans_msg clob,
constraint
check_json check (trans_msg is json)
)

Mark explains that he will store the JSON document in the TRANS_MSG column. But, exclaims Dave, this is just a simple character large object (CLOB) column, not a special JSON datatype column. That is precisely the point, confirms Mark. Oracle Database 12c can store JSON data in ordinary CLOB columns without a special datatype. However, to make sure the users load JSON data and not just any bunch of characters, Mark adds a check constraint on the TRANS_MSG column (TRANS_MSG IS JSON) that requires the data to be in JSON format. The other column in the table, ID, is used only as an identifier for that record. The TRANSACTIONS table doesn’t have columns for all the attributes of a transaction, such as the bank information, check number, and so on. All the data about a transaction is inside the JSON document.

After creating the TRANSACTIONS table, Mark stores the JSON document in the table by using the following SQL statement:


insert into transactions
values (
1
'{ complete text
. (without line numbers)
. of JSON document from
. Listing 1 here }'
)
/

Mark includes the text of the JSON document shown in Listing 1 between the single quotes in the above SQL statement. He then loads another JSON document, shown in Listing 2, into the TRANSACTIONS table. Listing 2 shows a transfer transaction (shown as a value for Source) that transfers money from an external account to account number 125, which is owned by Jane Smith. Because this is a transfer, check details are irrelevant, so there is no CheckDetails section in this JSON document. Similarly, because it’s a transfer, it has some new keys, such as information on the source bank the money is coming from, shown under TransferDetails. Mark asks his visitors to compare the two documents shown in Listings 1 and 2. Although they both convey the same broad type of information (account transactions), they are for different types of transactions, so the document contents are different. When Acme receives a new type of transaction, all the relevant details of that new type of transaction can be represented in a JSON document, even if no room has been made for the data in the database tables. That, Mark explains, is the advantage of storing JSON data in the database, where the entire contents of transactions can be put into a single column: TRANS_MSG. And there is no need to alter the table.

Code Listing 2: Inserting a JSON document for a transfer transaction

insert into transactions
values
(
sys_guid(),
systimestamp,
'{
"TransId" : 3,
"TransDate" : "01-JAN-2015",
"TransTime" : "10:05:00",
"TransType" : "Deposit",
"AccountNumber" : 125,
"AccountName" : "Smith, Jane",
"TransAmount" : 300.00,
"Location" : "website",
"CashierId" : null,
"ATMDetails" : null,
"WebDetails" : {
"URL" : "www.proligence.com/acme/dep.htm"
},
"Source" : "Transfer",
"TransferDetails" :
{
"FromBankRouting" : "012345678",
"FromAccountNo" : "1234567890",
"FromAccountType" : "Checking"
}
}'
)
/

Dave presents another problem. Acme receives data in JSON format in two ways—single transactions and groups of transactions as JSON document files—and the groups typically include several hundred thousand transactions. Using an INSERT statement works for single transactions, he explains, but it becomes impractical for inserting hundreds of thousands of JSON files. Is there a quick-loading mechanism?

Code Listing 3: SQL*Loader control file and data file

File: trans.ctl
load data into table transactions
fields terminated by ','
(
trans_id sequence(max,1),
fname filler char(80),
trans_body lobfile(fname) terminated by EOF
)
File: trans.data
C:\acme\json_landing\trans1.txt
C:\acme\json_landing\trans2.txt
C:\acme\json_landing\trans3.txt

There is, assures Mark, with the SQL*Loader feature of Oracle Database. He creates a SQL*Loader control file named trans.ctl, shown in Listing 3. All the JSON files to be loaded are located in a landing area in the C:\acme\json_landing directory. Mark creates another file named trans.data containing the names of the JSON files to be loaded, also shown in Listing 3. Then he calls SQL*Loader:

sqlldr control=trans.ctl data=trans.data

This command loads all the JSON files listed in the trans.data file to the TRANSACTIONS table—without individual INSERT statements.


Querying

Dave looks skeptical. All Mark did was load the JSON documents into a column of the CLOB datatype, he points out, just as you would for any other text that is meaningless to the database. Will the applications have to extract the contents of this CLOB and then process the data via special programs, as they do now?

Not at all, Mark says with a smile. The database knows that the data in that CLOB column is in JSON format, and it is not just meaningless text. Users can query that data directly, using a JSON extension of the SQL language. To demonstrate, he issues the SQL statement shown in Listing 4 to retrieve the data from the TRANS_MSG column in the TRANSACTIONS table. Pointing to the output, Mark shows how he extracted precisely the items he wanted and retrieved them formatted—as expected—and not as globs of text. The output even took care of multiple child items in an array. For example, in TRANS_ID 1, there are two checks, 101 and 102, in amounts of $50.00 each. In TRANS_ID 2, there is just one check, 151, for $200.00. TRANS_ID 3 is a transfer; so there is no information for the CheckDetails key in the JSON document, and the values for CheckDetails.CheckNumber and CheckDetails.CheckAmount appear as ? to indicate null values. (Null values appear as ? because of SET NULL ? at the beginning of the SQL statement.) Mark cautions everyone that the JSON key names used in the query, including t.Trans_Msg.TransId, are case-sensitive.

Code Listing 4: Retrieving JSON (data method 1)

Set null ?
select t.Trans_Msg.TransId trans_id,
t.Trans_Msg.Source source,
t.Trans_Msg.CheckDetails.CheckNumber check_num,
t.Trans_Msg.CheckDetails.CheckAmount check_amount,
t.Trans_Msg.CheckDetails.AcmeBankFlag Acme_Bank_Flag
from transactions t;
TRANS_ID SOURCE CHECK_NUM CHECK_AMOUNT ACME_BANK_FLAG
———————— ————————— —————————— ——————————————— ——————————————
1 Check [101,102] [50.00,50.00] [,]
2 Check 151 200.00 ?
3 Transfer ? ? ?

A second way to query the data, Mark continues, is to use the JSON functions built into Oracle Database 12c, most notably JSON_VALUE, which extracts the value of a specific key from a JSON document. Listing 5 shows the query he uses to extract the details for check 151. The highest key in a JSON document, Mark explains, is represented by $. Subsequent subkeys are placed in hierarchical order, separated by a period. For example, by writing $.CheckDetails.CheckNumber, Mark instructs the SQL to search starting from the top and going to the CheckDetails key and further down to CheckNumber. He directs everyone’s attention to the output of $.CheckDetails.AcmeBankFlag, which is “true” instead of null, as was the case in Listing 4. The reason is simple: “true” is a valid JSON datatype, and the JSON_VALUE function correctly parsed it. (Oracle recommends JSON_VALUE for querying JSON data stored in Oracle Database 12c.)

Code Listing 5: Retrieving JSON (data method 2)

select
json_value(trans_msg,'$.TransId') trans_id,
json_value(trans_msg,'$.Source') Source,
json_value(trans_msg,'$.CheckDetails.CheckNumber') check_num,
json_value(trans_msg,'$.CheckDetails.CheckAmount') check_amount,
json_value(trans_msg,'$.CheckDetails.AcmeBankFlag') Acme_Bank_Flag
from transactions
where json_value(trans_msg,'$.CheckDetails.CheckNumber') = 151
/
TRANS_ID SOURCE CHECK_NUM CHECK_AMOUNT ACME_BANK_FLAG
———————— ————————— —————————— ———————————— ——————————————
2 Check 151 200.00 true

Valerie, another developer, raises a concern. TRANS_ID 1 includes two checks. How do you display, instead of as an array [101,102], only the first one? That’s simple, assures Mark. Use the array counter. The counter starts at 0, so for the first subkey, Mark demonstrates the approach by executing the following to extract the first check number:

json_value(trans_msg,
'$.CheckDetails[0].CheckNumber')

The second value will be represented as [1], and so on. To show all the values, Mark uses CheckDetails[*]. (He can use specific numbers to pull specific items—for example, CheckDetails[0,2] to pull the first and third items.)


Performance

Directing his attention to Debbie, Mark asks if she has a question. Yes, she answers. What about the performance of queries using the JSON_VALUE function? Because it’s essentially a full-text search, won’t it tax the database?

That’s a valid concern, Mark concedes, but Oracle Database 12c knows there is JSON data in the column, and it can access the data with the JSON_VALUE function, so indexes can be created on those most used keys—just as with normal columns in database tables. Mark demonstrates this by creating a unique index on TransID inside the JSON document:

create unique index in_trans_trans_id
on transactions
(json_value(trans_msg,'$.TransId'));

Bitmap indexes are particularly useful in high-volume environments where the frequency of the values is low. For instance, the Acme Bank Flag key has just two possible values—true and false—to show whether a check is from Acme or not. It’s perfect for a bitmap index. Mark creates the index:

create bitmap index in_trans_acmeflag
on transactions
(json_value(trans_msg,
'$.CheckDetails.AcmeBankFlag'));

Code Listing 6: Execution plan for a query on an indexed value

SQL> explain plan for
2 select * from transactions
3 where json_value(trans_msg,'$.CheckDetails.AcmeBankFlag') = 'true';
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————————
Plan hash value: 159327478
——————————————————————————————————————————————————————————————————————————————
|Id | Operation | Name | Rows | Bytes |
——————————————————————————————————————————————————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 1235 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TRANSACTIONS | 1 | 1235 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | |
|*3 | BITMAP INDEX SINGLE VALUE | IN_TRANS_ACMEFLAG | | |
——————————————————————————————————————————————————————————————————————————————
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————————
Predicate Information (identified by operation id):
——————————————————————————————————————————————————————————————————————————————
3 - access(JSON_VALUE("TRANS_MSG" FORMAT JSON ,
'$.CheckDetails.AcmeBankFlag' RETURNING
VARCHAR2(4000) NULL ON ERROR)='true')


Where’s JSON?
To see which columns include JSON data, query the USER_JSON_COLUMNS view:
SQL> select * from user_json_columns;
TABLE_NAME COLUMN_NAME FORMAT DATA_TYPE
------------- ------------- --------- ----------
TRANSACTIONS TRANS_MSG TEXT CLOB
To demonstrate how the index is actually used, Mark shows the group Listing 6, the execution plan output of a query on that key. Citing the output, Mark shows everyone that the query did indeed use the IN_TRANS_ACMEFLAG index. Debbie is relieved to hear that, but she brings up another point. Sometimes users perform pattern-matching searches—for example, looking to see if an ATM in Stamford was used in any transaction. In such a case, they look for the word
Stamford in the ATMLocation subkey within the ATMDetails key; they cannot perform an exact match. Here is what the query looks like:
select
json_value(trans_msg,
'$.TransId') trans_id,
json_value(trans_msg,
'$.Source') Source
where json_textcontains (
trans_msg,
'$.ATMDetails.ATMLocation',
'Stamford'
);

Debbie is concerned that this query, being a free-format search, will adversely affect the database performance. Not at all, reassures Mark. He creates a special text index on the JSON data, using the following SQL:

create index ind_trans_01 
on transactions (trans_msg)
indextype is
ctxsys.contextparameters ('section group
CTXSYS.JSON_SECTION_GROUP
sync (on commit)');

The parameter section contains a special clause that instructs Oracle Database to index the columns as JSON sections, which facilitates free-format searches.


Conclusion

JSON is an internet communication standard because of its easy integration with JavaScript and flexible format, but it requires coding to access the data inside a JSON document. A relational database such as Oracle Database offers reliability and the power of SQL, but it demands structure. With Oracle Database 12c, there is no need to choose between JSON flexibility and database structure; you can choose both by using JSON in the database.

Next Steps

LEARN more about JSON
 json.org
bit.ly/jsonoracle

 DOWNLOAD Oracle Database 12c

 READ more Nanda



Photography by Scott Webb, Unsplash