X
  • Monday, May 18, 2015

The new SQL/JSON Query operators (Part5: JSON_TABLE, Nested Path, Ordinality Column)

By: Beda Hammerschmidt | Consulting (Coding) Member of Technical Staff

The examples in the previous posting showed how JSON_TABLE iterated
over a single array and returned JSON values as column values. This raises one
question: What if arrays are nested?

The sample JSON data (pulled from the Facebook Graph API) contains and multiple
postings (array) and each posting can have multiple comments or ‘likes’ (both arrays)
.

Other data may have more levels of nesting since JSON allows arbitrary nesting
of array.

So how do we drill down into nested array? The answer is to
add a NESTED PATH with a new COLUMNS clause to the JSON_TABLE:

SELECT jt.*
   FROM fb_tab,
        JSON_TABLE(col, '$.data[*]' COLUMNS (
          "Message" PATH '$.message',
          "Type" VARCHAR2(20) PATH '$.type',
          "ShareCount" NUMBER PATH '$.shares.count' DEFAULT 0 ON
ERROR,
          NESTED PATH '$.likes.data[*]' COLUMNS (
            "Author" PATH
'$.name'
))) "JT";

Message                           Type ShareCount Author
--------------------------------------------------------------------------------
How are Baxters Food Group and... link      0    Starup Haji
How are Baxters Food Group and... link      0    Elaine Dala
It's important to have an IT a... link      5    Asal Alibiga
It's important to have an IT a... link      5    Hasan Reni

You can see that if a nested array produces rows then values are repeated for the outer columns. In the example this means that 'Message', 'Link' and 'ShareCount' are repeated for all likes of the same posting. Makes sense because the likes refer to the posting.

(The
JOIN between the inner and the outer COLUMNS clause is a so called ‘OUTER
JOIN’. This means that data which has no nested array (where the NESTED PATH
does not select anything) will not be suppressed - all columns of the inner COLUMNS clause have
NULL values instead.)

As you
can see the nested (inner) item is now responsible for the number of row; we
see a new row for each item in the inner array. Values for the outer COLUMNS
clause are getting repeated because they are the same for all values of the
inner array (i.e. all comments refer to the same posting).

Sibling arrays 

Now what
happens if multiple arrays are not nested but on the same level? In the
Facebook example this would apply for the array of ‘likes’ and the array of
‘comments’. Both arrays are nested under the posting but they’re both on the
same level. We call them ‘sibling arrays’.

Semantically,
sibling arrays represent different ‘things’: a like at position X has nothing
to do with a comment at the same position X. This is why we return values for
sibling arrays in different rows with only one sibling array at the time
returning column value. Thus the total
number of returned rows is the sum of the items in the sibling array and not
the (Cartesian) product. (The Join between the sibling arrays is a UNION join.)

SELECT
"Message", "Author_l",
"Author_c"
FROM fb_tab,
     JSON_TABLE(col, '$.data[*]' COLUMNS (
       "Message" PATH '$.message',
NESTED PATH '$.likes.data[*]' COLUMNS (
         "Author_l"
VARCHAR2(20) PATH '$.name'
),
NESTED PATH '$.comments.data[*]' COLUMNS (
         "Author_c"
VARCHAR2(20) PATH '$.from.name'
)
)) "JT";

Message           Author_l   Author_c
--------------------------------- --------------- --------------------
How are Baxters Food Group and... Starup Haji
How are Baxters Food Group and... Elaine Dala
How are Baxters Food Group and... Asal Alibiga
It's important to have an IT a... Hasan Reni
It's important to have an IT a...                 Cesar Sanchez


Ordinality Column

Let’s assume you have several nested array. How do we keep
track of the hierarchy when unnesting the data? Or asked differently: For two
or more column value originating from an inner array how do we now if they
belong to the same (or different) outer values? In the example above the repeating message may hint the common parent but what if there were duplicates? How could we distinguish them? 

Here we need an
artificial example to illustrate the point: The following two JSON instances are slightly different, they contain the same scalar values but the hierarchy is different. Look at the 'd' values in both instances: 

Data instance 1:
{"a":
"a1",

 "b": [
        {"c": "c1",
         "d": [1,2]}
      ]
}

Data instance 2

{"a":
"a1",
 "b": [
        {"c": "c1",
         "d": [1]},
        {"c": "c1",
         "d": [2]}
       ]
}

When projecting out all scalar values one get’s this table
with 2 rows and 3 columns for both instances

SELECT jt.*
FROM ord_tab,  
     JSON_TABLE(col, '$' COLUMNS (
       "a" VARCHAR2(5) PATH '$.a',
       NESTED PATH '$.b[*]' COLUMNS (
          "c" VARCHAR2(5) PATH '$.c',
          NESTED PATH '$.d[*]' COLUMNS (
             "d" NUMBER PATH '$'
          )
       )
     )) jt;

a     c     d
----- ----- ----------
a1    c1     1
a1    c1     2

a1    c1      1
a1    c1     2

 Clearly, after this relational projection we would not be
able to differentiate between the two instances. What we lost is the hierarchy information:
we do not know if the values for D are under the same or different ‘parents’ ,
i.e enclosing object.

To preserve this information we can add another column to each columns clause with
a sequence number that is auto incremented for every new array item within the same parent. When the parent changes then sequence starts over again.

SELECT jt.*
FROM ord_tab,  
     JSON_TABLE(col, '$' COLUMNS (
       "a" VARCHAR2(5) PATH '$.a',
       NESTED PATH '$.b[*]' COLUMNS (
          "c" VARCHAR2(5) PATH '$.c',
"seq" FOR ORDINALITY,
          NESTED PATH '$.d[*]' COLUMNS (
             "d" NUMBER PATH '$'
          )
       )
     )) jt;

a     c seq d
----- ----- ---------- ----------
a1    c1 1 1
a1    c1 1 2

a1    c1 1 1
a1    c1 2 2

Now, but looking at the column 'seq' one can see that the first two values for 'd' are under the same parent. The second two values for 'd' are under separate parents.
This way we can preserve the hierarchy information. One can add an ordinality column into every columns clause.

We invite you to give us feedback; for instance as a comment under each posting or in our LinkedIn group (https://www.linkedin.com/grp/home?gid=6966766). You can also follow the Twitter hashtag #OracleJSON and #OracleDocDB.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha