
A blog about JSON in the Oracle Database

  • May 27, 2015

Querying Facebook JSON data with JSON_TABLE

Beda Hammerschmidt
Consulting (Coding) Member of Technical Staff

With the Java utility from the last posting we can pull tons of JSON data from the Facebook JSON API.
Let's use JSON_TABLE and SQL to extract useful information! 

In my example I start with just 83 rows!

select count(1) from fb_tab;

  COUNT(1)
----------
83

Facebook did not allow me to search further back in time, and I did not want to mix postings from several sources. It seems that if you want to acquire more data from one source, you need to pull from the API periodically.

Anyways, every row has multiple postings, so let's see how many we have:

SELECT count(1)
FROM fb_tab,
     JSON_TABLE(col, '$.data[*]' COLUMNS (
       "Message" PATH '$.message'
     )) "JT";

    COUNT(1)
----------
      1325
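
For reference, the path expressions used throughout this posting assume that each row in FB_TAB stores a Graph API response of roughly the following shape. The field values below are hypothetical; the structure is inferred from the paths used in the queries ('$.data[*]', '$.likes.data[*]', '$.comments.data[*].from.id'):

```json
{
  "data" : [
    {
      "id"      : "123_456",
      "message" : "First posting ...",
      "likes"   : {
        "data" : [ { "id" : "1001" }, { "id" : "1002" } ]
      },
      "comments" : {
        "data" : [
          { "from" : { "id" : "1003" }, "message" : "Nice!" }
        ]
      }
    }
  ]
}
```

Each element of the top-level "data" array is one posting, and the likes and comments are nested arrays inside it.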

Since users can like or comment on each posting, let's see how many 'reactions' these postings caused:

SELECT count(1)
FROM fb_tab,
     JSON_TABLE(col, '$.data[*]' COLUMNS (
       "Message" PATH '$.message',
       NESTED PATH '$.likes.data[*]' COLUMNS (
         "Author_l" VARCHAR2(20) PATH '$.id'),
       NESTED PATH '$.comments.data[*]' COLUMNS (
         "Author_c" VARCHAR2(20) PATH '$.from.id')
     )) "JT";

  COUNT(1)
----------
     39910

So, our 83 rows actually contained almost 40k user interactions. I think this is a great example to illustrate the high information density of JSON. If one had normalized the data using a conventional ER model, leading to multiple tables and relationships, one would have ended up with almost 40k rows - an expansion by a factor of almost 500!
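
For comparison, a conventional normalized design might look something like the sketch below (the table and column names are hypothetical, chosen only to illustrate the point):

```sql
CREATE TABLE postings (
  posting_id  VARCHAR2(40) PRIMARY KEY,
  message     VARCHAR2(4000)
);

CREATE TABLE likes (
  posting_id  VARCHAR2(40) REFERENCES postings,
  author_id   VARCHAR2(20)
);

CREATE TABLE comments (
  posting_id  VARCHAR2(40) REFERENCES postings,
  author_id   VARCHAR2(20),
  message     VARCHAR2(4000)
);
```

Every like and every comment becomes its own row in this model, which is exactly where the ~40k rows would come from.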

Let's do some basic analysis using SQL: What percentage of user reactions are 'likes' versus 'comments'?

SELECT ROUND(COMMENTS/TOTAL*100, 2) || '%' "COMMENTS",
       ROUND(LIKES/TOTAL*100, 2) || '%' "LIKES"
FROM (
  SELECT count(1) "TOTAL",
         count("Author_l") "LIKES",
         count("Author_c") "COMMENTS"
  FROM fb_tab,
       JSON_TABLE(col, '$.data[*]' COLUMNS (
          "Message" PATH '$.message',
          NESTED PATH '$.likes.data[*]' COLUMNS (
            "Author_l" VARCHAR2(20) PATH '$.id'),
          NESTED PATH '$.comments.data[*]' COLUMNS (
             "Author_c" VARCHAR2(20) PATH '$.from.id')
       )) "JT"
);

COMMENTS   LIKES
---------- ----------
19.76%     80.21%

You can see that this example uses NESTED PATH to drill into the nested arrays, as explained in a previous posting.
The example uses a subquery to do the counting first. The outer query then performs the calculations to come up with the percentages.

The query results show us that it is obviously much easier to get someone to click a 'like' button than to write a comment.

Next step, let's invent a little score: a like counts as 1 and a comment as 4; the score is the sum of both.
Now let's find the top 10 postings based on that score:

SELECT max("Message"),
       (count("Author_l") + 4 * count("Author_c")) "SCORE"
FROM fb_tab,
     JSON_TABLE(col, '$.data[*]' COLUMNS (
       "Message" PATH '$.message',
       o1 FOR ORDINALITY,
       NESTED PATH '$.likes.data[*]' COLUMNS (
         "Author_l" VARCHAR2(20) PATH '$.id'),
       NESTED PATH '$.comments.data[*]' COLUMNS (
         "Author_c" VARCHAR2(20) PATH '$.from.id')
     )) "JT"
GROUP BY fb_tab.ROWID, jt.o1
ORDER BY SCORE DESC
FETCH FIRST 10 ROWS ONLY;

This SQL statement does a bunch of things: Instead of counting all likes and comments we count them per posting, expressed with GROUP BY. A group represents one posting and is identified by the ROWID (identifying the JSON instance; we could have chosen to use a separate ID column in the table instead) and the ordinal position of the posting within the JSON instance (one instance/row contains up to 25 postings). We then calculate the score in the select list, order by the score, and limit the result to 10 rows.

Lesson learned today: JSON_TABLE takes your JSON data and flattens it into the relational model so that you can use all of SQL to run (analytic) queries.
