
A blog about JSON in the Oracle database

  • February 10, 2020

SODA with partitioning

We often get asked about the possibility of using partitioning with SODA collections. Today, this can be done with just a bit of initial setup, by using the "mapped collection" feature of SODA. When a regular collection is created in SODA, a regular Oracle table gets created under the covers to store all of the collection's data. But it's also possible to map a SODA collection to an existing table, as long as that table has (at least) the id and content columns needed by the collection. Such a table can have one or more partitioning columns. As of today, SODA itself can't yet populate partitioning columns, although that is on the roadmap. Fortunately, there are a couple of good ways to work around this:

(1) Populate the partition column(s) using a trigger.

(2) Set up the partition column(s) as virtual column(s).

Below we show how to set up both options.

Partitioning with a trigger

First, create a custom table that will back a SODA collection:

 

create table MYCOL ("ID" VARCHAR2(255) not null,
                    "CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) not null,
                    "LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) not null,
                    "VERSION" varchar2(255) not null, "JSON_DOCUMENT" BLOB,
                    "ORDER_TIMESTAMP" timestamp not null,
                    primary key ("ID"),
                    check ("JSON_DOCUMENT" is json format json strict))
                    LOB("JSON_DOCUMENT") STORE AS (CACHE)
                    partition by range (ORDER_TIMESTAMP)
                      (partition p0 values less than (timestamp '2019-01-01 00:00:00'),
                       partition p1 values less than (timestamp '2020-01-01 00:00:00'),
                       partition p2 values less than (timestamp '2021-01-01 00:00:00'))

This custom table has five columns that are automatically maintained by SODA (ID, CREATED_ON, LAST_MODIFIED, VERSION, and JSON_DOCUMENT). The above SQL corresponds to the exact 'create table' statement SODA would generate for a default collection, except that it also has an additional ORDER_TIMESTAMP column. This timestamp column will be used for partitioning, as specified by the partition clause. In this example we use range partitioning, though any other partitioning scheme can be used as well. Next, let's set up a trigger to populate the ORDER_TIMESTAMP partition column:

 

CREATE OR REPLACE TRIGGER MY_TEST_TRG
BEFORE INSERT or UPDATE on "MYCOL"
FOR EACH ROW
begin
 :new.ORDER_TIMESTAMP := Json_object_T.parse(:NEW.JSON_DOCUMENT).get('timestamp').to_timestamp;
end;

For this example, we assume that documents written into the SODA collection (using any of the SODA drivers, such as Java, Node.js, Python, etc.) contain a "timestamp" field. The trigger we created uses the PL/SQL JSON object types to extract the timestamp field into our ORDER_TIMESTAMP partition column, on every document insert or update that occurs through SODA. Of course, we could pick a different name for the "timestamp" field in the JSON. We would then adjust the get('timestamp') call in the trigger accordingly, i.e. get('myFieldNameHere').
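To sanity-check the trigger without going through a SODA driver, you can insert a row directly from SQL. The key and version values below are made up for illustration (SODA normally maintains those columns itself), and the trigger should populate ORDER_TIMESTAMP from the document's "timestamp" field:

```sql
-- Hypothetical direct-SQL check of the trigger; 'sanity-key-1' is an illustrative key
insert into MYCOL ("ID", "VERSION", "JSON_DOCUMENT")
  values ('sanity-key-1', '1',
          utl_raw.cast_to_raw('{"order" : "tea", "timestamp" : "2019-06-15T10:30:00"}'));

-- The trigger should have derived the partition column from the "timestamp" field
select "ORDER_TIMESTAMP" from MYCOL where "ID" = 'sanity-key-1';
```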

Optionally, we might want to allow row movement (e.g. if we want to allow a row to move to a different partition, as a consequence of a SODA update):

alter table mycol enable row movement;

Next, we cover how to use a virtual column instead of this trigger approach. After that, we show how to create a mapped collection on top of the table we've created, and verify that the whole thing works correctly.

Partitioning with a virtual column

Alternatively, a virtual column can be used as a partitioning column. The table definition would look as follows:

 

create table MYCOL ("ID" VARCHAR2(255) not null,
                    "CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) not null,
                    "LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) not null,
                    "VERSION" varchar2(255) not null,
                    "JSON_DOCUMENT" BLOB,
                    "ORDER_TIMESTAMP" timestamp generated always as
                      (json_value("JSON_DOCUMENT", '$.timestamp' returning timestamp)) not null,
                    primary key ("ID"),
                    check ("JSON_DOCUMENT" is json format json strict))
                    LOB("JSON_DOCUMENT") STORE AS (CACHE)
                    partition by range(ORDER_TIMESTAMP)
                      (partition p0 values less than (timestamp '2019-01-01 00:00:00'),
                       partition p1 values less than (timestamp '2020-01-01 00:00:00'),
                       partition p2 values less than (timestamp '2021-01-01 00:00:00'))

json_value("JSON_DOCUMENT", '$.timestamp' returning timestamp)

This expression will extract the value of the timestamp field in the JSON documents written through SODA. The range partitioning scheme used in this table is the same as what we've used in the trigger-based approach above. Of course, here too, any other partitioning scheme can be used.
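You can also try the extraction expression on its own, independent of the table, to see what json_value returns for a sample document:

```sql
-- Standalone check of the json_value expression used by the virtual column
select json_value('{"order" : "coffee", "timestamp" : "2020-01-01T01:01:01"}',
                  '$.timestamp' returning timestamp) as order_ts
from dual;
```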

Creating a mapped collection and verifying that it works

Now that we've created our custom collection table, with a partition column populated by a trigger or defined as a virtual column, we create a mapped SODA collection on top of this table. This can be done from any SODA implementation. Here's how to do it from PL/SQL:

 

declare
   METADATA varchar2(8000);
   COL SODA_COLLECTION_T;
begin
   METADATA := '{"tableName" : "MYCOL",
                 "keyColumn":{"name":"ID",
                              "sqlType":"VARCHAR2",
                              "maxLength":255,
                              "assignmentMethod":"UUID"},
                 "contentColumn":{"name":"JSON_DOCUMENT",
                                  "sqlType":"BLOB",
                                  "compress":"NONE",
                                  "cache":true,
                                  "encrypt":"NONE",
                                  "validation":"STANDARD"},
                 "versionColumn":{"name":"VERSION",
                                  "method":"SHA256"},
                 "lastModifiedColumn":{"name":"LAST_MODIFIED"},
                 "creationTimeColumn":{"name":"CREATED_ON"},
                 "readOnly":false}';
   -- Create a collection using "map" mode, based on the table we've created above and specified in
   -- the custom metadata under "tableName" field.
   COL := dbms_soda.create_collection('MYCOL', METADATA, DBMS_SODA.CREATE_MODE_MAP);
end;

Here, we supply custom metadata, which describes the underlying table with five columns, to the create_collection method. Note that the table name is specified in the metadata, under the "tableName" field.
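As a quick sanity check (an optional step), you can confirm that the mapped collection is now visible to SODA by listing the collections in the current schema:

```sql
-- List all SODA collections in the current schema; MYCOL should appear
select column_value as collection_name
from   table(dbms_soda.list_collection_names());
```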

Now we can supply documents with a timestamp field, e.g. {"order" : "coffee", "timestamp" : "2020-01-01T01:01:01"}, to SODA insert or replace operations, and the partitioning column will be populated by the trigger (or, in the virtual-column approach, computed automatically). For example, here's a code snippet from a SODA Java program that inserts such a document:

 
       OracleRDBMSClient cl = new OracleRDBMSClient();
       // JDBC connection
       Connection conn = ...;
       OracleDatabase db = cl.getDatabase(conn);
       // Open the collection we've created above
       OracleCollection col = db.openCollection("MYCOL");
       // Insert a document with a timestamp field (note: the value must be in ISO 8601 format!)
       OracleDocument d = db.createDocumentFromString("{\"order\" : \"coffee\", \"timestamp\" : \"2020-01-01T00:00:00\"}");
       col.insertAndGet(d);


Note that the timestamp value in the JSON must be in ISO 8601 format. This is required both by the PL/SQL object types used to extract the value into the timestamp partition column in the trigger approach, and by the json_value expression used in the virtual column approach. After this Java code runs, you can verify that the underlying MYCOL table has a value in the ORDER_TIMESTAMP column:

SQL> select order_timestamp from mycol;

ORDER_TIMESTAMP
---------------------------------------------------------------------------
01-JAN-20 12.00.00.000000 AM

Note that partition columns are not yet usable from SODA QBEs (queries-by-example). That's on our roadmap. However, you can do all the usual SQL partition operations, such as dropping partitions, on tables backing SODA collections, such as the ones shown above. And you can of course run SQL queries (with or without SQL/JSON operators) on such tables, and, as usual, such queries can make use of partition columns.
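For instance, all of the following are ordinary SQL operations on the underlying table (the partition names and date boundaries are from the example table above):

```sql
-- Regular SQL partition maintenance on the table backing the collection;
-- "update global indexes" keeps the primary key index usable after the drop
alter table MYCOL add partition p3 values less than (timestamp '2022-01-01 00:00:00');
alter table MYCOL drop partition p0 update global indexes;

-- A SQL/JSON query whose predicate can prune on the partition column
select json_value("JSON_DOCUMENT", '$.order') as order_item
from   MYCOL
where  "ORDER_TIMESTAMP" >= timestamp '2020-01-01 00:00:00';
```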

One other thing to be aware of: when you drop a regular, non-mapped collection, you should never drop the underlying table directly from SQL. Instead, drop the collection using a SODA API. For example, from PL/SQL:

select dbms_soda.drop_collection('myCollectionName') from dual;

That will also drop the underlying table. When you drop a mapped collection, however, the underlying table remains after the collection is dropped through a SODA API. So, for a mapped collection, if you want to drop the underlying table, it has to be dropped explicitly using SQL, *after* dropping the mapped collection from a SODA API. We don't drop the underlying table along with the mapped collection because it could be a preexisting table that the user might not want to drop (as opposed to a table created by SODA itself, in the case of a regular, non-mapped collection).
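Putting that together, fully removing a mapped collection and its table takes two explicit steps, for example:

```sql
-- Step 1: drop the mapped collection through SODA (the table survives this)
select dbms_soda.drop_collection('MYCOL') from dual;

-- Step 2: explicitly drop the underlying table, since SODA won't
drop table MYCOL purge;
```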
