Some comments on streaming evaluation, error handling and duplicate key names.

When
evaluating a JSON path expression (e.g. $.address[1 to 3]. zip) the query engine reads the JSON data in a streaming
manner. This means we do not have to load the entire data into memory first but
can start evaluating the path expression right away. It also means that if we
found what we’re looking for we can early terminate the evaluations For
instance if we use JSON_EXISTS to check for the presence of a field called ‘id’
we can return TRUE the moment we encountered one, we do not have to read the
rest of the JSON data.

This
implies a couple things

· It
is beneficial to place fields that are queried often to the beginning of the
JSON data since early terminating will make those queries faster

· A
syntax error in the JSON data may not be detected if early termination kicks in
since the engine may not have to read the entire JSON data. Use ‘IS JSON’ (with a
check constraint or separately in the WHERE clause) to validate your JSON data if required. Or use ‘IS NOT
JSON’ to find invalid rows and mark or remove them. If you can rely on your application to only insert syntactically correct JSON then this is not an issue for you.

· Values
from duplicate keys may get ignored.

This
brings us to the issue of duplicate key names – duplicate keys inside the same
object. Like ‘{a:true, a:false}’.

Unfortunately
the JSON specification makes no statement about those – implying that they are
not illegal. This is why IS JSON by default will not reject rows which have
duplicate keys. When I looked at other
JSON utilities (e.g. gson or http://jsonlint.com,
which is great btw) I found that if JSON data with duplicate key names is
provided then all but one key are silently dropped. This is explainable if you
imagine a JSON object being backed by some hashtable/map implementation. This
implies to me that duplicate keys are rather nasty, of no use and one should
rather stay away from them. I invite you to correct me if you disagree! So far I have not encountered any JSON in the wild that made use of duplicate keys and I would not understand the semantics if I would find one.

In order
to avoid duplicate key name we have added an option to IS JSON called ‘WITH
UNIQUE KEYS’.

SELECT
CASE WHEN (‘{a:true, a:false}’ IS JSON) THEN ‘valid’ ELSE ‘invalid’ END FROM
dual;

——-
valid

whereas

SELECT
CASE WHEN (‘{a:true, a:false}’ IS JSON WITH
UNIQUE KEYS) THEN ‘valid’ ELSE ‘invalid’ END FROM dual;

——-
Invalid

You can
add the ‘WITH UNIQUE KEYS’ option to your IS JSON check constraint but it will
add computation cost because for every object we have to keep track of the key
names we have encountered so far in order to reject a key we have seen already.
Most JSON generating application and tools may never
generate JSON with duplicate key names so this may be a rather academic
problem. Anyways, we wanted to give you a feature to check for them but it is
not active by default because it makes IS JSON slower.

Some comments on streaming evaluation, error handling and duplicate key names.

Beda Hammerschmidt

Senior Director of Development

The new SQL/JSON Query operators (Part5: JSON_TABLE, Nested Path, Ordinality Column)

Oracle Database Cloud - Exadata Service

Some comments on streaming evaluation, error handling and duplicate key names.

Authors

Beda Hammerschmidt

Senior Director of Development

The new SQL/JSON Query operators (Part5: JSON_TABLE, Nested Path, Ordinality Column)

Oracle Database Cloud - Exadata Service