KGC 2020 Tutorial: Modeling Evolving Data in Graphs While Preserving Backward Compatibility

We are giving a tutorial on modeling evolving graph data on May 5 2020 at the Knowledge Graph Conference (KGC 2020)
https://www.knowledgegraph.tech/the-knowledge-graph-conference-kgc/workshops-and-tutorials/#1580503925986-452b8bbf-6757

Tutorial slides available here: here

The tutorial will include several live demonstrations. This blog post describes how to set up the evnironment used during the tutorial for those who want to follow along or try things out themselves. All Oracle software used is free for evaluation purposes.

Setting up Virtual Box and Importing the VM

Download and Install Virtual Box:
https://www.virtualbox.org/

Download Pre-built DB Developer Day VM (7.3 GB):
https://www.oracle.com/database/technologies/databaseappdev-vm.html

Follow directions to import the VM at the bottom of the download page (~ 30 GB total with data).

Downloading Software and Data

After importing the Developer Day VM, startup the VM, open a Firefox browser window inside the VM (Applications->Favorites->Firefox) and download the following Oracle software.

Download the latest Oracle Jena Adapter from OTN
(Oracle Database 19c, 18c, and 12c Support for Apache Jena 3.1.0, Apache Jena Fuseki 2.4, and Protégé Desktop 5.2):
https://www.oracle.com/database/technologies/semantic-technologies-downloads.html

Download the latest Oracle Graph Server and Client from OTN
(Oracle Graph Server)
https://www.oracle.com/database/technologies/spatialandgraph/property-graph-features/graph-server-and-client/graph-server-and-client-downloads.html

Download Java 11, which is needed to run JShell (jdk-11.0.6_linux-x64_bin.tar.gz).
https://www.oracle.com/java/technologies/javase-jdk11-downloads.html

This tutorial will use some RDF data about movies. Specifically, it will use the Linked Movie Data Base dataset from a University of Toronto project described in this paper http://events.linkeddata.org/ldow2009/papers/ldow2009_paper12.pdf.

An RDF dump of this dataset is available here: http://www.cs.toronto.edu/~oktie/linkedmdb/

We will use the 3 million triple dataset (linkedmdb-18-05-2009-dump.tar.gz) for this demo. Follow the link and download the dataset using Firefox in your VM.

Configuring the Environment

1. Configuring Support for Apache Jena

Open a terminal (Applications->Favorites->Terminal) and execute the following steps.

[oracle@localhost oracle]$ cd /home/oracle
[oracle@localhost oracle]$ mkdir RDF
[oracle@localhost oracle]$ cd RDF
[oracle@localhost RDF]$ mkdir Jena
[oracle@localhost RDF]$ cd Jena/
[oracle@localhost Jena]$ mv ~/Downloads/Oracle19c_Jena-3.1.0_Build_20191127.zip .
[oracle@localhost Jena]$ ls
Oracle19c_Jena-3.1.0_Build_20191127.zip
[oracle@localhost Jena]$ unzip Oracle19c_Jena-3.1.0_Build_20191127.zip 
Archive:  Oracle19c_Jena-3.1.0_Build_20191127.zip
   creating: joseki/
  inflating: joseki/application.xml  
  …
  creating: fuseki/run/logs/
   creating: fuseki/run/system_files/
[oracle@localhost Jena]$ ls
bin             jar                                      protege_plugin
bug_notes.txt   javadoc                                  README
examples        joseki                                   release_notes.txt
fuseki          joseki-web-app                           sparqlgateway-web-app
fuseki-web-app  Oracle19c_Jena-3.1.0_Build_20191127.zip  version.txt

2. Configuring Oracle Graph Server and Client

2.1 Installing Oracle Graph Server and Client

Execute the following steps in a terminal.

[oracle@localhost Jena]$ cd /home/oracle/RDF/
[oracle@localhost RDF]$ ls
Jena
[oracle@localhost RDF]$ mv ~/Downloads/oracle-graph-20.1.0.x86_64.rpm .
[oracle@localhost RDF]$ ls
Jena  oracle-graph-20.1.0.x86_64.rpm
[oracle@localhost RDF]$ sudo rpm -i oracle-graph-20.1.0.x86_64.rpm 
[sudo] password for oracle: oracle
[oracle@localhost RDF]$ ls /etc/oracle/graph
log4j2-server.xml  log4j2.xml  pgx.conf  server.auth.conf  server.conf
[oracle@localhost RDF]$ ls /opt/oracle/graph
bin  COPYRIGHT  data  doc  examples  graphviz  hadoop  lib  pgx  scripts
[oracle@localhost RDF]$ sudo chown oracle /opt/oracle/graph/pgx/tmp_data/
[sudo] password for oracle:

2.2 Setting up Java 11

Execute the following commands from a terminal.

[oracle@localhost RDF]$ cd /home/oracle/RDF/
[oracle@localhost RDF]$ mv ~/Downloads/jdk-11.0.6_linux-x64_bin.tar.gz .
[oracle@localhost RDF]$ ls
jdk-11.0.6_linux-x64_bin.tar.gz  Jena  oracle-graph-20.1.0.x86_64.rpm
[oracle@localhost RDF]$ tar -xzf jdk-11.0.6_linux-x64_bin.tar.gz 
[oracle@localhost RDF]$ ls
jdk-11.0.6                       Jena
jdk-11.0.6_linux-x64_bin.tar.gz  oracle-graph-20.1.0.x86_64.rpm
[oracle@localhost RDF]$

3 Setting up RDF Knowledge Graph in Oracle Database

Oracle Database 19.3 should already be installed and running when you start the VM.

First, we will create a directory in /usr to store a tablespace for our RDF data.

Open a terminal and execute the following commands.

[oracle@localhost ~]$ cd /usr/
[oracle@localhost usr]$ sudo mkdir dbs
[sudo] password for oracle: 
[oracle@localhost usr]$ ls -l
total 4
dr-xr-xr-x. 1 root root 31422 May 31  2019 bin
drwxr-xr-x. 1 root root     0 May  1 15:01 dbs
drwxr-xr-x. 1 root root     0 Apr 10  2018 etc
drwxr-xr-x. 1 root root     0 Apr 10  2018 games
drwxr-xr-x. 1 root root  2626 May 31  2019 include
dr-xr-xr-x. 1 root root  4488 May 31  2019 lib
dr-xr-xr-x. 1 root root 76110 May 31  2019 lib64
drwxr-xr-x. 1 root root  8672 May 31  2019 libexec
drwxr-xr-x. 1 root root    90 Apr 10  2018 local
dr-xr-xr-x. 1 root root 12296 May  1 09:48 sbin
drwxr-xr-x. 1 root root  4080 May 31  2019 share
drwxr-xr-x. 1 root root    24 May 31  2019 src
lrwxrwxrwx. 1 root root    10 May 31  2019 tmp -> ../var/tmp
[oracle@localhost usr]$ sudo chown oracle dbs
[oracle@localhost usr]$ ls -l
total 4
dr-xr-xr-x. 1 root   root 31422 May 31  2019 bin
drwxr-xr-x. 1 oracle root     0 May  1 15:01 dbs
drwxr-xr-x. 1 root   root     0 Apr 10  2018 etc
drwxr-xr-x. 1 root   root     0 Apr 10  2018 games
drwxr-xr-x. 1 root   root  2626 May 31  2019 include
dr-xr-xr-x. 1 root   root  4488 May 31  2019 lib
dr-xr-xr-x. 1 root   root 76110 May 31  2019 lib64
drwxr-xr-x. 1 root   root  8672 May 31  2019 libexec
drwxr-xr-x. 1 root   root    90 Apr 10  2018 local
dr-xr-xr-x. 1 root   root 12296 May  1 09:48 sbin
drwxr-xr-x. 1 root   root  4080 May 31  2019 share
drwxr-xr-x. 1 root   root    24 May 31  2019 src
lrwxrwxrwx. 1 root   root    10 May 31  2019 tmp -> ../var/tmp
[oracle@localhost usr]$

To use the RDF Knowledge Graph feature, we need to create a database user and an RDF network.

Open a text editor (Applications->Accessories->Text Editor) and copy/paste the following SQL script.

set echo on;
set serverout on;
set linesize 160;
set pagesize 10000;
set timing on;
 
conn sys/oracle as sysdba;
alter session set container=orcl;
 
-- create rdfuser
grant connect,resource,create view to rdfuser identified by rdfuser;

-- create network and temporary tablespaces
-- using /usr/dbs/ instead of /u01/app/... because /usr is mounted on a different
-- virtual disk that has more free space
create tablespace RDFTBS datafile '/usr/dbs/rdftbs.dbf' size 512M reuse AUTOEXTEND ON NEXT 128M MAXSIZE 10G segment space management auto;
 
create temporary tablespace TEMPTBS tempfile '/usr/dbs/temptbs.dbf' size 128M reuse AUTOEXTEND ON NEXT 128M MAXSIZE 5G;
 
alter database default temporary tablespace TEMPTBS;

-- grant tablespace quotas
alter user rdfuser quota unlimited on RDFTBS; 
alter user rdfuser default tablespace RDFTBS;
alter user MDSYS quota unlimited on RDFTBS;

-- create semantic network
conn system/oracle@orcl;
exec sem_apis.create_sem_network('RDFTBS');
 
-- setup indexes
-- drop PSCGM index
exec sem_apis.drop_sem_index('PSCGM');
 
-- add SPCGM index
exec sem_apis.add_sem_index('SPCGM');
 
-- add CPSGM index
exec sem_apis.add_sem_index('CSPGM');

-- create a bulk load event trace table in rdfuser’s schema
conn rdfuser/rdfuser@orcl;
CREATE TABLE RDF$ET_TAB (
  proc_sid VARCHAR2(30), 
  proc_sig VARCHAR2(200),
  event_name varchar2(200),
  start_time timestamp,
  end_time timestamp,
  start_comment varchar2(1000) DEFAULT NULL,
  end_comment varchar2(1000) DEFAULT NULL
);

-- Grant privileges on RDF$ET_TAB to MDSYS
GRANT INSERT, UPDATE on RDF$ET_TAB to MDSYS;

Save the file as rdf_network_setup.sql in /home/oracle/RDF.

Double-click the sqlcl icon on the Desktop to start sqlcl. Use system/oracle for the username and password.

SQLcl: Release 19.1 Production on Fri May 01 15:23:10 2020

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Username? (''?) system
Password? (**********?) ******
Last Successful login time: Fri May 01 2020 15:23:17 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0


SQL>

Now execute the rdf_network_setup.sql script that we created.

SQL> @ /home/oracle/RDF/rdf_network_setup.sql 
SQL> set serverout on;
SQL> set linesize 160;
SQL> set pagesize 10000;
SQL> set timing on;
SQL> 
SQL> conn sys/oracle as sysdba;
Connected.
SQL> alter session set container=orcl;

Session altered.

Elapsed: 00:00:00.016
SQL> 
SQL> -- create rdfuser
SQL> grant connect,resource,create view to rdfuser identified by rdfuser;

Grant succeeded.

Elapsed: 00:00:00.077
SQL> 
SQL> -- create network and temporary tablespaces
SQL> -- using /usr/dbs/ instead of /u01/app/... because /usr is mounted on a different
SQL> -- virtual disk that has more free space
SQL> create tablespace RDFTBS datafile '/usr/dbs/rdftbs.dbf' size 512M reuse AUTOEXTEND ON NEXT 128M MAXSIZE 10G segment space management auto;

Tablespace created.

Elapsed: 00:00:01.372
SQL> 
SQL> create temporary tablespace TEMPTBS tempfile '/usr/dbs/temptbs.dbf' size 128M reuse AUTOEXTEND ON NEXT 128M MAXSIZE 5G;

Tablespace created.

Elapsed: 00:00:00.069
SQL> 
SQL> alter database default temporary tablespace TEMPTBS;

Database altered.

Elapsed: 00:00:00.030
SQL> 
SQL> -- grant tablespace quotas
SQL> alter user rdfuser quota unlimited on RDFTBS;

User altered.

Elapsed: 00:00:00.017
SQL> alter user rdfuser default tablespace RDFTBS;

User altered.

Elapsed: 00:00:00.010
SQL> alter user MDSYS quota unlimited on RDFTBS;

User altered.

Elapsed: 00:00:00.018
SQL> 
SQL> -- create semantic network
SQL> conn system/oracle@orcl;
Connected.
SQL> exec sem_apis.create_sem_network('RDFTBS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:19.331
SQL> 
SQL> -- setup indexes
SQL> -- drop PSCGM index
SQL> exec sem_apis.drop_sem_index('PSCGM');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.770
SQL> 
SQL> -- add SPCGM index
SQL> exec sem_apis.add_sem_index('SPCGM');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.472
SQL> 
SQL> -- add CPSGM index
SQL> exec sem_apis.add_sem_index('CSPGM');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.056
SQL> 
Elapsed: 00:00:00.058
SQL>
SQL> 
SQL> -- create a bulk load event trace table in rdfuser’s schema
SQL> conn rdfuser/rdfuser@orcl;
Connected.
SQL> CREATE TABLE RDF$ET_TAB (
  2    proc_sid VARCHAR2(30), 
  3    proc_sig VARCHAR2(200),
  4    event_name varchar2(200),
  5    start_time timestamp,
  6    end_time timestamp,
  7    start_comment varchar2(1000) DEFAULT NULL,
  8    end_comment varchar2(1000) DEFAULT NULL
  9  );

Table created.

Elapsed: 00:00:00.032
SQL> 
SQL> -- Grant privileges on RDF$ET_TAB to MDSYS
SQL> GRANT INSERT, UPDATE on RDF$ET_TAB to MDSYS;

Grant succeeded.

Elapsed: 00:00:00.006
SQL>

Loading the Linked Movie Data Base RDF data.

We will use the orardfldr command line utility included with Support for Apache Jena.

First, prepare the LMDB data for loading. Open a terminal and execute the following commands.

[oracle@localhost ~]$ cd /home/oracle/RDF
[oracle@localhost RDF]$ mkdir LMDB
[oracle@localhost RDF]$ cd LMDB
[oracle@localhost LMDB]$ mkdir DATA
[oracle@localhost LMDB]$ cd DATA/
[oracle@localhost DATA]$ mv ~/Downloads/linkedmdb-18-05-2009-dump.tar.gz .
[oracle@localhost DATA]$ ls
linkedmdb-18-05-2009-dump.tar.gz
[oracle@localhost DATA]$ tar -xzf linkedmdb-18-05-2009-dump.tar.gz 
[oracle@localhost DATA]$ ls
dump.nt  linkedmdb-18-05-2009-dump.tar.gz
[oracle@localhost DATA]$ rm -f linkedmdb-18-05-2009-dump.tar.gz 
[oracle@localhost DATA]$ ls
dump.nt

Some URIs in the LMDB dataset have invalid characters that should be URL encoded (“, }, and `). We will use perl to do this URL encoding before loading the data. Execute the following commands from a terminal.

[oracle@localhost DATA]$ cd /home/oracle/RDF/LMDB/DATA/
[oracle@localhost DATA]$ ls
dump.nt
[oracle@localhost DATA]$ perl -pi -e 's/\<([^\>]*)"([^\>]*)\>/\<$1%22$2\>/g' dump.nt
[oracle@localhost DATA]$ perl -pi -e 's/\<([^\>]*)"([^\>]*)\>/\<$1%22$2\>/g' dump.nt
[oracle@localhost DATA]$ perl -pi -e 's/\<([^\>]*)\}([^\>]*)\>/\<$1%7D$2\>/g' dump.nt
[oracle@localhost DATA]$ perl -pi -e 's/\<([^\>]*) ([^\>]*)\>/\<$1%20$2\>/g' dump.nt
[oracle@localhost DATA]$ perl -pi -e 's/\<([^\>]*)`([^\>]*)\>/\<$1%60$2\>/g' dump.nt
[oracle@localhost DATA]$ perl -pi -e 's/\<([^\>]*)`([^\>]*)\>/\<$1%60$2\>/g' dump.nt
[oracle@localhost DATA]$ perl -pi -e 's/\<([^\>]*)`([^\>]*)\>/\<$1%60$2\>/g' dump.nt
[oracle@localhost DATA]$

Now we can setup the orardfldr utility. First, we need to define an $ORACLE_JENA_HOME environment variable and add $ORACLE_JENA_HOME/bin to our PATH. Run the following commands in a terminal.

[oracle@localhost DATA]$ cd /home/oracle/RDF/LMDB
[oracle@localhost LMDB]$ export ORACLE_JENA_HOME=/home/oracle/RDF/Jena
[oracle@localhost LMDB]$ export PATH=$PATH:$ORACLE_JENA_HOME/bin
[oracle@localhost LMDB]$ orardfldr --help
Usage: orardfldr --modelName=NAME --fileDir=DIR --lang=FMT --jdbcUrl=URL 
       --user=USER --password=PASSWORD [--bulkLoadFlags="FLAGS"]
       [--networkName=NAME] [--networkOwner="OWNER"] [--numThreads=INT]
       [--rebuildAppTabIdx=BOOLEAN] [--truncateStgTab=BOOLEAN]
Performs an RDF bulk load of all files in a directory into Oracle Database.

  Mandatory Arguments:
    --modelName=NAME             Target semantic model for the load
    --fileDir=DIR                Data directory for the load
    --lang=FMT                   Data format (N-TRIPLE, RDF/XML, TURTLE,
                                 N-QUADS, TRIG)
    --jdbcUrl=URL                JDBC url for the target database
    --user="USER"                Database user name (case sensitive)
    --password=PASSWORD          Database user password

  Optional Arguments:
    --bulkLoadFlags="FLAGS"      Flags string to use with
                                 SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE
    --networkName=NAME           Name of the semantic network to load into
                                 (Default MDSYS network if omitted)
    --networkOwner="OWNER"       Owner of the semantic network to load into
                                 (case sensitive, MDSYS is used if omitted)
    --numThreads=INT             Number of threads to use when populating
                                 the staging table (default 1)
    --rebuildAppTabIdx=BOOLEAN   TRUE (default) to do bottom-up rebuilding
                                 of application table indexes
    --truncateStgTab=BOOLEAN     TRUE (default) to truncate pre-existing
                                 data in the staging table

[oracle@localhost LMDB]$

Next, invoke orardfldr on the /home/oracle/RDF/LMDB/DATA directory to load the LMDB N-triple file. This will create an RDF model named LMDB and perform a bulk load into the model. Expect this step to take up to 10 min depending on your VM resources.

[oracle@localhost LMDB]$ cd /home/oracle/RDF/LMDB
[oracle@localhost LMDB]$ orardfldr --modelName=LMDB --fileDir=./DATA --lang=N-TRIPLE --jdbcUrl=jdbc:oracle:thin:@localhost:1521/orcl --user=RDFUSER --password=rdfuser --bulkLoadFlags="MBV_METHOD=SHADOW PARALLEL=2"
loadRDF: enabling parallel DDL/DML/query for session [0]
loadRDF: truncating staging tables
loadRDF in parallel: PrepareWorker [0] running
PrepareWorker: thread [0] process to 10000
PrepareWorker: thread [0] process to 20000
…
PrepareWorker: thread [0] process to 3560000
PrepareWorker: thread [0] process to 3570000
PrepareWorker: thread [0] done to 0, file = ./DATA/dump.nt in (ms) 126155
loadRDF: preparing for bulk load
loadRDF: setting application table index RDFUSER.LMDB_IDX to UNUSABLE
loadRDF: starting bulk load
loadRDF: bulk load flags="MBV_METHOD=SHADOW PARALLEL=2"
loadRDF: bulk load completed in (ms) 541741
loadRDF: rebuilding application table index RDFUSER.LMDB_IDX
loadRDF: finished rebuilding application table index RDFUSER.LMDB_IDX in (ms) 33639
[oracle@localhost LMDB]$

Next we need to gather statistics for the RDF data we loaded. Open sqlcl by double-clicking on the sqlcl icon on the desktop. Use system/oracle for the user and password. Then execute the following commands to gather statistics.

SQLcl: Release 19.1 Production on Fri May 01 17:23:44 2020

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Username? (''?) system
Password? (**********?) ******
Last Successful login time: Fri May 01 2020 17:23:51 -04:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0


SQL> exec sem_perf.gather_stats(estimate_percent=>20, degree=>2);

PL/SQL procedure successfully completed.

SQL>

Using SQL Developer

Now we can open SQL Developer and configure the RDF Semantic Graph component.

Double-click the SQL Developer icon on the Desktop.

First, we need to create connections for SYSTEM and RDFUSER. Click on the green plus sign on the top left to create a new connection.

Fill in the form using system/oracle, localhost, 1521 and orcl. Click test to verify your information and then click save. Note that orcl is the Service name not SID, which is selected by default.

Next, fill in the form with rdfuser/rdfuser, localhost, 1521 and orcl. Click test to verify your information and then click save to save the connection. Note that orcl is the Service name not SID, which is selected by default.

Before using the RDF Semantic Graph component of SQL Developer, we need to enable some features in the database. Expand the system connection in the connection tree and right-click on RDF Semantic Graph.

Select Setup RDF Semantic Network. Then click Apply in the dialog that opens.

Now the RDF Semantic Graph plugin for SQL Developer is ready to use on this database. This operation needs to be performed only once for each database.

Running SPARQL queries against the LMDB RDF Dataset

Expand the rdfuser connection in the connection tree of SQL Developer. Then expand RDF Semantic Graph, Networks, MDSYS, and Regular Models. Now click on LMDB under Regular Models. This will open a SPARQL editor for LMDB.

Now we can execute several example SPARQL queries such as the ones below. Click the green triangle to run the query. Also, the Templates drop down lists several templates for common query patterns that can be loaded into the editor.

A complete list of queries used in the tutorial is available here

# select modifiers
# Distinct genres of movies with Kenau Reeves
PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX movie: <http://data.linkedmdb.org/movie/>

SELECT DISTINCT ?gname
WHERE { ?movie movie:actor ?actor .
        ?actor movie:actor_name "Keanu Reeves" .
        ?movie movie:genre ?genre .
        ?genre movie:film_genre_name ?gname .
}

# Aggregates
# Find the 10 movie series with the most movies
PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX movie: <http://data.linkedmdb.org/movie/>

SELECT ?sname (COUNT(?movie) AS ?mcnt)
WHERE {
  ?movie movie:film_series ?series .
  ?series movie:film_series_name ?sname .
}
GROUP BY ?sname
ORDER BY DESC(?mcnt)
LIMIT 10

Performing SPARQL Update Operations on LMDB

We will use the SQL interface to perform updates on the LMDB model to show how this data can evolve with movie rating information.

Open a SQL worksheet for rdfuser. File->New. Then select Database Files and click OK.

Pick a file name and location and click OK.

Now we can perform some SPARQL Update operations using the SEM_APIS PL/SQL package. Some example updates are below. Be sure to select the rdfuser connection for this worksheet in the connection drop down menu on the top right. Some example updates are shown below. The full list of movie review updates is available here, and corresponding queries are available here.

begin
  sem_apis.update_model('LMDB',
   'PREFIX   movie: <http://data.linkedmdb.org/film/>
    PREFIX    foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX    sioc: <http://rdfs.org/sioc/ns#>
    PREFIX  schema: <http://schema.org/>
    PREFIX        : <http://example.com/data/>

    INSERT DATA {
      # John likes Office Space
      :person123 foaf:name "John" ;
                 sioc:likes movie:31916 . 
    }'
);
end;
/

begin
  sem_apis.update_model('LMDB',
   'PREFIX   movie: <http://data.linkedmdb.org/film/>
    PREFIX    foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX    sioc: <http://rdfs.org/sioc/ns#>
    PREFIX  schema: <http://schema.org/>
    PREFIX        : <http://example.com/data/>

    # remove triple
    DELETE DATA { :person123 sioc:likes movie:31916 . };
    INSERT DATA {
      # replace triple with quad assigning :edge1 as id
      GRAPH :edge1 { :person123 sioc:likes movie:31916 . }
      # add edge property for rating
      :edge1 schema:ratingValue 5 .
    }'
);
end;
/

Graph Analytics with RDF Data

The next portion of the tutorial will illustrate how to extract a subgraph from our LMDB RDF data and use an in-memory graph analytics engine on the extracted subgraph.

Extracting the Subgraph

We will use the SEM_MATCH SQL table function to create some database views that represent Vertex and Edge tables for our extracted subgraph. SEM_MATCH allows the execution of SPARQL queries within SQL statements.

Use a SQL worksheet to create the following relational views in the rdfuser schema. The full script is available here.

-- Create node and edge views for PGX

-- Actor View
CREATE VIEW ACTORS AS
SELECT ACTOR$RDFVID as ID, 'Actor' AS "label", NAME as "name"
  FROM TABLE(SEM_MATCH(
    'PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?actor ?name
WHERE { ?actor rdf:type <http://data.linkedmdb.org/movie/actor>;
               <http://data.linkedmdb.org/movie/actor_name> ?name }',
    SEM_MODELS('LMDB'),
    null, 
    null,
    null,
    null,
    ' PLUS_RDFT=VC DO_UNESCAPE=T ALLOW_DUP=T '
));
                   
-- Movie table 
CREATE VIEW MOVIES AS
SELECT MOVIE$RDFVID as ID, 'Movie' AS "label", MTITLE AS "title", MGENRE AS "genre"
  FROM TABLE(SEM_MATCH(
    'PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?movie (MAX(STR(?genre)) AS ?mGenre) (MAX(STR(?title)) AS ?mTitle)
WHERE { ?movie rdf:type <http://data.linkedmdb.org/movie/film>;
               <http://purl.org/dc/terms/title> ?title ;
               <http://data.linkedmdb.org/movie/genre>/<http://data.linkedmdb.org/movie/film_genre_name> ?genre }
GROUP BY ?movie
               ',
    SEM_MODELS('LMDB'),
    null, 
    null,
    null,
    null,
    ' PLUS_RDFT=VC DO_UNESCAPE=T ALLOW_DUP=T '
    ));
    
-- edge table
-- forward edge
CREATE VIEW ACTED_IN AS
SELECT ACTOR$RDFVID AS SOURCE_ID, MOVIE$RDFVID AS DEST_ID, 'acted_in' AS "label"
  FROM TABLE(SEM_MATCH(
    'PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?actor ?movie
WHERE { ?movie <http://data.linkedmdb.org/movie/actor> ?actor }',
    SEM_MODELS('LMDB'),
    null, 
    null,
    null,
    null,
    ' PLUS_RDFT=VC DO_UNESCAPE=T ALLOW_DUP=T '
    ));

-- backward edge
CREATE VIEW HAD_ACTOR AS 
SELECT MOVIE$RDFVID AS SOURCE_ID, ACTOR$RDFVID AS DEST_ID, 'had_actor' AS LABEL
  FROM TABLE(SEM_MATCH(
    'PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?actor ?movie
WHERE { ?movie <http://data.linkedmdb.org/movie/actor> ?actor }',
    SEM_MODELS('LMDB'),
    null, 
    null,
    null,
    null,
    ' PLUS_RDFT=VC DO_UNESCAPE=T ALLOW_DUP=T '
    ));

Setting up Oracle Graph Server and Client

We will use JShell for this portion of the tutorial. First, we need to open a terminal and setup the environment. Execute the following commands in a terminal.

[oracle@localhost oracle]$ export JAVA_HOME=/home/oracle/RDF/jdk-11.0.6
[oracle@localhost oracle]$ export PATH=$JAVA_HOME/bin:$PATH
[oracle@localhost oracle]$ which java
~/RDF/jdk-11.0.6/bin/java
[oracle@localhost oracle]$

In order to load a graph from Oracle Database, we need to setup a Java Keystore file for the database connection password, and we need to create a configuration file for the graph we are loading.

Create a file called config.json in the RDF directory with the following contents. The full config.json file is available here.

{
    "name":"lmdb",
    "jdbc_url":"jdbc:oracle:thin:@localhost:1521/orcl",
    "username":"rdfuser",
    "keystore_alias":"database1",
    "vertex_id_strategy": "keys_as_ids",
    "vertex_providers":[
        {
            "name":"Actor",
            "format":"rdbms",
            "database_table_name":"ACTORS",
            "key_column":"ID",
            "key_type": "long",
            "props":[
                {
                    "name":"name",
                    "type":"string"
                }
            ]
        },
        {
            "name":"Movie",
            "format":"rdbms",
            "database_table_name":"MOVIES",
            "key_column":"ID",
            "key_type": "long",
            "props":[
                {
                    "name":"title",
                    "type":"string"
                },
                {
                    "name":"genre",
                    "type":"string"
                }
            ]
        }
    ],
    "edge_providers":[
        {
            "name":"acted_in",
            "format":"rdbms",
            "database_table_name":"ACTED_IN",
            "source_column":"SOURCE_ID",
            "destination_column":"DEST_ID",
            "source_vertex_provider":"Actor",
            "destination_vertex_provider":"Movie"
        },
        {
            "name":"had_actor",
            "format":"rdbms",
            "database_table_name":"HAD_ACTOR",
            "source_column":"SOURCE_ID",
            "destination_column":"DEST_ID",
            "source_vertex_provider":"Movie",
            "destination_vertex_provider":"Actor"
        }
   ]
}

Next, we need to setup the Java Keystore file. Execute the following in your terminal.

[oracle@localhost oracle]$ cd /home/oracle/RDF
[oracle@localhost RDF]$ keytool -importpass -alias database1 -keystore keystore.p12 -storetype pkcs12
Enter keystore password:  
Re-enter new password: 
Enter the password to be stored:  
Re-enter password: 
[oracle@localhost RDF]$

Enter a password for your keystore file and then enter the password for your rdfuser database account.

Using Oracle Graph Server and Client with JShell

This portion of the tutorial will use JShell to demonstrate parts of the Java API for Graph Server and Client. Run the following commands from your terminal to start the opg-jshell-rdbms client.

[oracle@localhost RDF]$ unset CLASSPATH
[oracle@localhost RDF]$ /opt/oracle/graph/bin/opg-rdbms-jshell --secret_store keystore.p12
12:26:25,578 INFO Ctrl$1 - >>> start engine

enter password for keystore keystore.p12: 
For an introduction type: /help intro
Oracle Graph Server Shell 20.1.0
PGX server version: 19.4.0 type: SM running in embedded mode.
PGX server API version: 3.6.0
PGQL version: 1.2
Variables instance, session, and analyst ready to use.
opg-jshell-rdbms>

Run the following commands to load the extracted graph from LMDB into memory.

opg-jshell-rdbms> var graph = session.readGraphWithProperties("config.json");
graph ==> PgxGraph[name=lmdb,N=51456,E=124888,created=1588523322551]
opg-jshell-rdbms>

Now, we can use graph.queryPgql to run some PGQL queries. A full list of example queries is available here.

opg-jshell-rdbms> graph.queryPgql(
...> "SELECT a.name as name "+
...> "MATCH (m:Movie)-[:had_actor]->(a:Actor) "+
...> "WHERE m.title = 'Home Alone'").print();
+------------------+
| name             |
+------------------+
| Catherine O'Hara |
| Daniel Stern     |
| John Candy       |
| Macaulay Culkin  |
| John Heard       |
| Daniel Stern     |
| Joe Pesci        |
| Roberts Blossom  |
+------------------+
$2 ==> PgqlResultSetImpl[graph=lmdb,numResults=8]
opg-jshell-rdbms>
opg-jshell-rdbms> graph.queryPgql(
...> "SELECT COUNT(e) AS pathLen, ARRAY_AGG(t.title) AS movie, ARRAY_AGG(t.name) AS coStar "+
...> "MATCH SHORTEST ( (a) ((s)-[e:acted_in]-(t))* (b) ) "+
...> "WHERE a.name = 'Charlie Chaplin' AND b.name = 'Mr. T'").print();
+------------------------------------------------------------------------------------------------------+
| pathLen | movie                                        | coStar                                      |
+------------------------------------------------------------------------------------------------------+
| 6       | [Limelight, Home for the Holidays, Spy Hard] | [Geraldine Chaplin, Charles Durning, Mr. T] |
+------------------------------------------------------------------------------------------------------+
$3 ==> PgqlResultSetImpl[graph=lmdb,numResults=1]

We can also use the Analyst API to run graph algorithms over our extracted graph. Use the commands below to run Page Rank against the LMDB graph. Running the algorithm adds a pagerank property to each vertex, which we can access with PGQL queries.

opg-jshell-rdbms> var analyst = session.createAnalyst();
analyst ==> NamedArgumentAnalyst[session=6b26e194-3dfe-40af-9e14-e9e34031b546]
opg-jshell-rdbms> VertexProperty
   pagerank = analyst.pagerank(graph);
pagerank ==> VertexProperty[name=pagerank,type=double,graph=lmdb]
opg-jshell-rdbms> pagerank.getName();
$6 ==> "pagerank"
opg-jshell-rdbms> graph.queryPgql("select id(a), a.pagerank, a.title match (a:Movie) order by a.pagerank desc limit 10").print();
+-------------------------------------------------------------------------------------------+
| id(a)               | a.pagerank            | a.title                                     |
+-------------------------------------------------------------------------------------------+
| 9203356100102031272 | 4.5957911542206383E-4 | Stranger Than Fiction                       |
| 3043577596047303050 | 3.7362096519635195E-4 | Walk Hard: The Dewey Cox Story              |
| 1551281233598901313 | 3.0906541971418243E-4 | 30 Days of Night                            |
| 4716692856789145745 | 2.9467453955043946E-4 | Talladega Nights: The Ballad of Ricky Bobby |
| 5136965450075342208 | 2.7029926744780766E-4 | Untraceable                                 |
| 8906191549691061753 | 2.366584825237721E-4  | Baby Boy                                    |
| 1924525656707524741 | 2.2347530719121642E-4 | Night at the Museum                         |
| 2044045883429832681 | 2.0619976574073263E-4 | Adventures Into Digital Comics              |
| 3830516857956502081 | 2.057737297912671E-4  | The Other Boleyn Girl                       |
| 1250359656689937498 | 1.920351747698865E-4  | First Sunday                                |
+-------------------------------------------------------------------------------------------+
$7 ==> PgqlResultSetImpl[graph=lmdb,numResults=10]
opg-jshell-rdbms> graph.queryPgql("select id(a), a.pagerank, a.name match (a:Actor) order by a.pagerank desc limit 10").print();
+-----------------------------------------------------------------+
| id(a)               | a.pagerank            | a.name            |
+-----------------------------------------------------------------+
| 6918926442303567142 | 3.652943214921537E-4  | Oliver Hardy      |
| 5108249384479603329 | 3.534905433635076E-4  | Stan Laurel       |
| 2217693998232186748 | 3.5033153856543363E-4 | John Wayne        |
| 1521869729662604452 | 3.266064860979765E-4  | Claudette Colbert |
| 7570309037436615508 | 3.187072825866198E-4  | William Garwood   |
| 4463808826027376555 | 3.147493787789601E-4  | Charlie Chaplin   |
| 1130732531415155834 | 2.7138550552431096E-4 | Harry von Meter   |
| 6706176589560490413 | 2.5675796106955844E-4 | Jackie Chan       |
| 3000787937460459606 | 2.336318740626907E-4  | Vincent Price     |
| 7730444284418262623 | 2.2997707687822115E-4 | Joan Crawford     |
+-----------------------------------------------------------------+
$8 ==> PgqlResultSetImpl[graph=lmdb,numResults=10]
opg-jshell-rdbms>

KGC 2020 Tutorial: Modeling Evolving Data in Graphs While Preserving Backward Compatibility

Setting up Virtual Box and Importing the VM

Downloading Software and Data

Configuring the Environment

1. Configuring Support for Apache Jena

2. Configuring Oracle Graph Server and Client

2.1 Installing Oracle Graph Server and Client

2.2 Setting up Java 11

3 Setting up RDF Knowledge Graph in Oracle Database

Loading the Linked Movie Data Base RDF data.

Using SQL Developer

Running SPARQL queries against the LMDB RDF Dataset

Performing SPARQL Update Operations on LMDB

Graph Analytics with RDF Data

Extracting the Subgraph

Setting up Oracle Graph Server and Client

Using Oracle Graph Server and Client with JShell

Matthew Perry

Consultant Member of Technical Staff

Update Your Database Homes and Grid Infrastructure Via the OCI Web Console on the Exadata Cloud Service

Many Single-Purpose Databases versus a Converged Database

KGC 2020 Tutorial: Modeling Evolving Data in Graphs While Preserving Backward Compatibility

Setting up Virtual Box and Importing the VM

Downloading Software and Data

Configuring the Environment

1. Configuring Support for Apache Jena

2. Configuring Oracle Graph Server and Client

2.1 Installing Oracle Graph Server and Client

2.2 Setting up Java 11

3 Setting up RDF Knowledge Graph in Oracle Database

Loading the Linked Movie Data Base RDF data.

Using SQL Developer

Running SPARQL queries against the LMDB RDF Dataset

Performing SPARQL Update Operations on LMDB

Graph Analytics with RDF Data

Extracting the Subgraph

Setting up Oracle Graph Server and Client

Using Oracle Graph Server and Client with JShell

Authors

Matthew Perry

Consultant Member of Technical Staff

Update Your Database Homes and Grid Infrastructure Via the OCI Web Console on the Exadata Cloud Service

Many Single-Purpose Databases versus a Converged Database