Adding Data to Your Graph? Use of RDF Quads Guarantees Your Old Queries Remain Valid and Backward-Compatible with Your New Schema!

Oracle Database supports both a Resource Description Framework (RDF) quad store and Property Graphs (PG). From the standpoint of schema evolution and backward compatibility, we compare how evolving real-world data, modeled as a graph, is handled under the following graph models:

  1. RDF Graphs (W3C RDF 1.1 Recommendation, 25-FEB-2014)
    1. RDF-T: Triples-Only
    2. RDF-TQ: Triples+Quads
    3. RDF-Q: Quads-Only
  2. Property Graphs (PG)

We compare how the different graph data models handle the types of addition listed in the table below as a graph evolves, and what each addition implies for the pre-existing queries and schema of the graph.

| # | Type of Addition to a Graph | (Cumulative) Content to model as graph |
|---|---|---|
| 1 | Vertex, Edge, Vertex-Property | John, whose net worth is $1 billion, donated to Top University. Mary, a child of John, got admitted to Top University. |
| 2 | Duplicate Edge | John … donated twice to Top University. … |
| 3 | Edge-Property | John … donated twice to Top University, in the years 2010 and 2012, respectively. Mary … got admitted to Top University in 2011. |
| 4 | Edge as Endpoint of an Edge | Bob suspects that John’s 2010 donation helped Mary’s admission. |
| 5 | Edge-Property as Endpoint of an Edge | John … donated twice to Top University, in the years 2010 and 2012 (the year 2012 was confirmed by Alex). |
| 6 | Vertex-Property as Endpoint of an Edge | John, whose net worth is $1 billion (according to Dan), … |

 

The main finding that we will try to motivate and illustrate here is that the following ability – relevant to RDF triples, and for PG, to edges, edge-properties, and vertex-properties – is of critical importance for handling graph data addition in a query-preserving, backward-compatible manner:

  • Assign a unique id to a relationship (or to a property-value association).
  • Use that id as an endpoint of another relationship.
  • Do both of the above:
    • without affecting pre-existing queries, and
    • while maintaining backward compatibility of the evolving schema (used for query design purposes).

Motivation: Consider the rather mundane everyday content shown in the first row and its evolution to accommodate addition of the somewhat sensational assertion in the second row of the table below:

 

| Type of Addition to a Graph | (Cumulative) Content to model as graph |
|---|---|
| Vertex, Edge, Vertex-Property | John donated to Top University. Mary, a child of John, got admitted to Top University. |
| Edge as Endpoint of an Edge | … John’s donation helped Mary’s admission. |

 

Types of Transformation: We will make use of the following two types of transformation in the illustration below:

  • Vertexification of an edge, edge-property, vertex-property, or RDF triple.
    • Edge: A new vertex – “edge-vertex” – is created to represent the edge. A vertex-property reflecting the edge-type of the original edge is added to the edge-vertex. Two new edges are created connecting the edge-vertex to the two endpoints of the original edge. The original edge is deleted.
    • Vertex-Property: A new vertex – “vp-vertex” – is created to represent the vertex-property. The target property-value pair is removed from the original vertex and added to the vp-vertex. A new edge is created connecting the original vertex to the vp-vertex.
    • Edge-Property: A new vertex – “ep-vertex” – is created to represent the edge-property. The original edge is replaced with two edges connecting the ep-vertex to its two endpoints. The target property-value pair is removed from the original edge and added to the ep-vertex. Finally, the steps used for vertexification of a vertex-property (as noted above) are followed for this newly added vertex-property of the ep-vertex.
    • RDF triple: (In RDF terminology, this step is essentially the same as reification.) A new resource (or vertex) – “triple-vertex” – is created to represent the original triple. The original triple is replaced with three new triples created with the triple-vertex as the subject, recording respectively the subject, predicate, and object components of the original triple.
  • Quadification of an RDF triple: This is relevant for RDF-TQ only. A new resource (or vertex) – “triple-vertex” – is created to represent the original triple. A quad is created with the same subject, predicate, and object components as the original triple and the triple-vertex as the “named graph” component. The original triple is deleted. (Quadification is essentially the conversion of a triple or unnamed edge to a quad or named edge.)
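The two transformations above can be sketched on tiny in-memory models. This is only an illustrative sketch, not Oracle's implementation: the dictionary and tuple layouts, the helper names `vertexify_edge` and `quadify`, the generated edge ids, and the direction of the two new edges are all assumptions made for the example. A graph name of `None` stands for a plain (unnamed) triple.

```python
def vertexify_edge(vertices, edges, edge_id, edge_vertex, src_type, dst_type):
    """Vertexify a PG edge: replace it with an edge-vertex and two new edges."""
    src, edge_type, dst = edges.pop(edge_id)          # the original edge is deleted
    vertices[edge_vertex] = {"event": edge_type}      # vertex-property keeps the old edge-type
    # Hypothetical ids and direction: edge-vertex -> original endpoints.
    edges[edge_id + "a"] = (edge_vertex, src_type, src)
    edges[edge_id + "b"] = (edge_vertex, dst_type, dst)


def quadify(quads, triple, triple_vertex):
    """Quadify an RDF triple: same s, p, o, with triple_vertex as the graph name."""
    quads.remove(triple + (None,))                    # delete-triple (None = unnamed)
    quads.add(triple + (triple_vertex,))              # insert-quad


# Property Graph: vertices carry properties; edges map id -> (source, type, target).
vertices = {"v1": {"name": "John"}, "v2": {"name": "TopUniv"}}
edges = {"e12": ("v1", "donatedTo", "v2")}
vertexify_edge(vertices, edges, "e12", "v4", "donor", "receiver")

# RDF-TQ: a set of (subject, predicate, object, graph) tuples.
quads = {("v1", "donatedTo", "v2", None)}
quadify(quads, ("v1", "donatedTo", "v2"), "v4")

pg_edge_types = sorted({etype for _, etype, _ in edges.values()})
rdf_predicates = sorted({p for _, p, _, _ in quads})
print(pg_edge_types)    # ['donor', 'receiver']
print(rdf_predicates)   # ['donatedTo']
```

After vertexification the original edge-type survives only as a property value, whereas after quadification the original predicate is still present as a predicate.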

 

Vertexification in PG and RDF-T: Representation of the original content in PG is shown on the left side of the diagram below and that of the extended content on the right side. (The RDF Triples-Only model looks quite similar.)

 

It is easy to see that a lot has changed in the schema to accommodate the small additional content. Two edge-types – “donatedTo” and “admittedTo” – have disappeared, replaced by two pairs of new edge-types: (“donor”, “receiver”) and (“student”, “university”), respectively. A new vertex-property, “event”, whose value reflects the original edge-type, has been added. (In addition, a new edge-type, “helped”, has been added, but this is just an expected expansion of the original schema.)

 

What are the implications of this significant change in schema? Many original queries (e.g., “Who donated to Top University?”), that were designed based on the original schema, would not work anymore. Query designers have to familiarize themselves with the new schema and then rewrite the queries accordingly.
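A small sketch of this breakage, using an assumed in-memory edge list rather than any real PG query language: the original query looks for “donatedTo” edges, finds one before the change, and finds none afterwards. The edge ids such as `e41` and `e42`, the edge directions, and both helper functions are hypothetical and chosen only for illustration.

```python
def who_donated_to(edges, university):
    """Original query: sources of 'donatedTo' edges that point at `university`."""
    return sorted(s for s, etype, t in edges.values()
                  if etype == "donatedTo" and t == university)


def who_donated_to_rewritten(vertices, edges, university):
    """Rewritten query: find 'donatedTo' edge-vertices via 'receiver', then follow 'donor'."""
    events = {s for s, etype, t in edges.values()
              if etype == "receiver" and t == university
              and vertices[s].get("event") == "donatedTo"}
    return sorted(t for s, etype, t in edges.values()
                  if etype == "donor" and s in events)


vertices = {"v1": {"name": "John"}, "v2": {"name": "TopUniv"}, "v3": {"name": "Mary"}}
edges = {"e12": ("v1", "donatedTo", "v2"),
         "e32": ("v3", "admittedTo", "v2"),
         "e31": ("v3", "childOf", "v1")}
before = who_donated_to(edges, "v2")

# Extended content: both edges are vertexified so that "helped" can connect them.
vertices["v4"] = {"event": "donatedTo"}
vertices["v5"] = {"event": "admittedTo"}
del edges["e12"], edges["e32"]
edges.update({"e41": ("v4", "donor", "v1"), "e42": ("v4", "receiver", "v2"),
              "e53": ("v5", "student", "v3"), "e52": ("v5", "university", "v2"),
              "e45": ("v4", "helped", "v5")})

after = who_donated_to(edges, "v2")
rewritten = who_donated_to_rewritten(vertices, edges, "v2")
print(before, after, rewritten)   # ['v1'] [] ['v1']
```

The rewritten query needs knowledge of three new schema elements (“event”, “donor”, “receiver”) to recover the answer the original query used to give.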

 

Ideally, in case of PG, adding a new edge of edge-type “helped” connecting the edge “e12: donatedTo” to the edge “e32: admittedTo”, should have been enough. Similarly, for RDF-T, a new triple connecting the “donatedTo” triple (as subject) to the “admittedTo” triple (as object) with “helped” as the predicate should have been enough.

 

However, that is allowed in neither PG nor RDF-T. In PG, an edge cannot be an endpoint of an edge (and the same restriction applies to vertex-properties and edge-properties). In RDF-T, a triple cannot be the subject or object of another triple. In PG, even though every edge has a unique edge id, that id cannot be used for anything other than hanging edge-properties from the edge; in RDF-T, a triple has no id at all.

 

These restrictions mean that only vertices are allowed to be endpoints of an edge. The workaround used to arrive at the right-side diagram above was therefore to “vertexify” the relevant edges in PG, creating new vertices (edge-vertices) to represent those edges, and then to use those vertices as the endpoints. The same workaround applies to RDF-T: to use an RDF triple as an endpoint of an edge, i.e., as the subject and/or object of a new triple, we must vertexify the triple to create a resource (triple-vertex) that represents it, and then use that resource in the new triple. (Note that vertexification is very similar to “reification”, but the word “vertexification” is probably more intuitive in the graph context.)

 

Quadification in RDF-TQ: Representation of the original content in RDF-TQ is shown on the left-side of the diagram below and that of the extended content on the right-side. (For simplicity, the value triples – equivalent of vertex-properties – are shown as boxes, attached to the subject node, containing the predicate-value pair.)

 

 

The change in this case looks much simpler. For each of the two relevant RDF triples that need to be connected (that is, used as the endpoints of an edge, or in RDF terminology, used as the subject or object component of a new triple), we can simply “quadify” it, that is, replace the triple with a quad that has the same subject, predicate, and object components as the original triple and has a newly created resource or vertex (triple-vertex) as its “named graph” component. The two triple-vertices created for the quadification of the two RDF triples are shown above as v4 and v5, respectively. We show only the relevant triples and quads (generated upon quadification) in the table below.

 

| (relevant) triples before change | | (relevant) triples/quads after change |
|---|---|---|
| :v1    :donatedTo    :v2 . | => | graph :v4 {  :v1    :donatedTo    :v2  } |
| :v3    :admittedTo   :v2 . | => | graph :v5 {  :v3    :admittedTo   :v2  } |
| | | :v4    :helped    :v5 . |

 

The most important thing to note here is that the original schema was not modified, only expanded to include the new relationship type “helped”. The two relationship types “donatedTo” and “admittedTo” continue to exist; they did not disappear. The triples that were converted to quads are still present, now as part of the quads, and pre-existing queries (such as “Who donated to Top University?”) continue to work without any change. A query designer does not need to learn a new version of the original schema in order to rewrite the original queries. (A query designer who wants to write new queries may want to learn about the newly added “helped” relationship type.)
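The same check can be sketched for quadification. The sketch assumes that a triple pattern is matched against unnamed triples and the triples inside all named graphs alike (a union-of-all-graphs evaluation); that evaluation mode, the tuple layout, and the helper names `match` and `who_donated_to` are assumptions of this example rather than something the article specifies.

```python
def match(quads, s=None, p=None, o=None):
    """Match a triple pattern against triples and quads alike, ignoring the graph name."""
    return sorted((qs, qp, qo) for qs, qp, qo, _ in quads
                  if s in (None, qs) and p in (None, qp) and o in (None, qo))


def who_donated_to(quads, university):
    """Original query, written before any quads existed."""
    return [s for s, _, _ in match(quads, p="donatedTo", o=university)]


quads = {("v1", "donatedTo", "v2", None),     # None marks a plain (unnamed) triple
         ("v3", "admittedTo", "v2", None),
         ("v3", "childOf", "v1", None)}
before = who_donated_to(quads, "v2")

# Quadify the two triples (delete-triple, insert-quad), then connect their names.
for triple, name in ((("v1", "donatedTo", "v2"), "v4"),
                     (("v3", "admittedTo", "v2"), "v5")):
    quads.remove(triple + (None,))
    quads.add(triple + (name,))
quads.add(("v4", "helped", "v5", None))

after = who_donated_to(quads, "v2")
print(before == after, after)   # True ['v1']
```

The original query function is reused unchanged after the data is extended; only a new query about “helped” would need to know about the new predicate.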

 

RDF-Q does NOT need vertexification or quadification: In an RDF Quads-Only model (which contains only quads, no triples), if the “named graph” component of each quad is used as a name for the triple portion of that quad, neither vertexification nor quadification is needed. Since RDF places no restrictions on how such names are used, the situations identified above (using triples as the subject and/or object of other triples) pose no problem in the RDF-Q model.

 

Note: Using the “named graph” component of a quad as an edge id (in RDF-TQ and RDF-Q) conflicts with using it to partition RDF data into a single unnamed graph and multiple named graphs. We do not address that conflict in this article, because PG does not allow such partitioning of graphs either. Nonetheless, we are currently exploring mixed use of the RDF “named graph” component, both as edge id and for graph partitioning.

Related Work

Our March 2014 EDBT paper [1] discussed three approaches for representing property graphs in RDF: reification, named graphs, and subproperty-based (i.e., using rdfs:subPropertyOf). The two types of transformation discussed here – vertexification and quadification – map directly to the reification and named-graph methods, respectively. The subproperty-based approach, which is very similar to the singleton-property approach [2] introduced around the same time, is omitted here for brevity. With respect to schema backward-compatibility in an evolving graph, it has drawbacks similar to those of the reification-based (and hence vertexification) approach, unless one (redundantly) stores the same triple in two different forms so that old queries continue to work.

 

Unlike [1], this article focuses mainly on the schema evolution and backward-compatibility problem for evolving graphs and explains with examples (see Appendix section below) how the quadification (named graph) based approach is able to guarantee backward-compatibility even as the schema evolves due to new data getting added to an existing graph.

Summary and Conclusion

First, based on the discussion above about the two types of transformation – vertexification and quadification – we can summarize their implications in the following table:

 

| Transformation | All pre-existing queries remain valid? | New schema backward-compatible with original? |
|---|---|---|
| vertexification | No | No |
| quadification | Yes | Yes |

 

Also, the following table shows when to use which of these two types of transformations. We used the case of adding an “edge as endpoint of an edge” (listed in row #4 in the table below) earlier to show why and how vertexification is used for the RDF Triples-Only and Property Graph models whereas use of the (much simpler) quadification transformation was sufficient for the RDF Triples+Quads model. The Appendix section below will go through a complete example to illustrate each of the types of addition listed in the table below for the Property Graph and RDF Triples+Quads models.

 

| # | Type of Addition to a Graph | RDF Triples-Only | Property Graph | RDF Triples+Quads | RDF Quads-Only |
|---|---|---|---|---|---|
| 1 | Vertex, Edge, Vertex-Property | | | | |
| 2 | Duplicate Edge | Vertexify | | Add quad | |
| 3 | Edge-Property | Vertexify | | Quadify | |
| 4 | Edge as Endpoint of an Edge | Vertexify | Vertexify edge | Quadify | |
| 5 | Edge-Property as Endpoint of an Edge | Vertexify | Vertexify (edge and) edge-property | Quadify | |
| 6 | Vertex-Property as Endpoint of an Edge | Vertexify | Vertexify vertex-property | Quadify | |

 

Based on the discussions above (summarized in the two tables here), we posit that, with respect to maintaining backward compatibility of the evolving schema in the face of data additions to an existing graph, the following is a reasonable ranking of the different graph models:

RDF Triples-Only < Property Graph < RDF Triples+Quads < RDF Quads-Only

or, simply:

RDF Triples < Property Graph < RDF Quads

References

  1. Das, S., Srinivasan, J., Perry, M., Chong, E., Banerjee, J. A Tale of Two Graphs: Property Graphs as RDF in Oracle. In Proc. of 17th International Conference on Extending Database Technology (EDBT’14), Athens, Greece, March 24–28, 2014. OpenProceedings.org.
  2. Nguyen, V., Bodenreider, O., Sheth, A. Don’t Like RDF Reification? Making Statements About Statements Using Singleton Property. In Proc. of 23rd International Conference on World Wide Web (WWW’14), April 2014.

Appendix: A Complete Example

To illustrate how changes in graph data are handled in both PG and RDF-TQ, we show below the steps involved in handling the cumulative content additions from the table at the beginning of this article. The changes for each step are shown in red in the diagrams.

 

(Note: To avoid clutter in the diagrams below, RDF “value” triples, that connect subject resources to values, are shown like vertex-properties in PG. If the subject resource (vertex), due to quadification, represents a triple, then the value triples for that subject are shown like edge-property in PG.)

 

  1. Add Vertices, Edges, and Vertex-Properties

John, whose net worth is $1 billion, donated to Top University. Mary, a child of John, got admitted to Top University.

Property Graph:

  • Three vertices – v1, v2, and v3 – represent the three entities John, TopUniv, and Mary, respectively.
  • Vertex-properties represent the names and, in the case of John, his net worth too.
  • Three edges – e12, e32, and e31 – connect the ordered pairs (v1, v2), (v3, v2), and (v3, v1), reflecting the “donatedTo”, “admittedTo”, and “childOf” edge-types, respectively.

RDF Triples+Quads:

Similar to the PG case, but with no edge ids: RDF triples only, no quads. (The diagram shows the “value” triples in the same way as PG vertex-properties.)

  • added triples:
    :v1  :name  "John" .
    :v1  :worth  "1 Bil" .
    :v2  :name  "TopUniv" .
    :v3  :name  "Mary" .
    :v1  :donatedTo  :v2 .
    :v3  :admittedTo  :v2 .
    :v3  :childOf  :v1 .

 

  2. Add Duplicate Edge

John … donated twice to Top University. …

Property Graph:

  • Add a new edge e12-2 with edge-type “donatedTo” and the same endpoints as the first donation edge.

RDF Triples+Quads:

  • Add a quad with e12-2 as the triple-vertex (named graph) to reflect the second donation. (Note: A quad is needed to distinguish the second donation from the triple representing the first donation and thus avoid automatic de-duplication in RDF.)
  • added quad:
    graph  :e12-2  {  :v1  :donatedTo  :v2  }
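The de-duplication noted above follows from an RDF dataset being a set of statements. A minimal sketch, using a Python set of (subject, predicate, object, graph) tuples as a stand-in for a quad store (the tuple layout, with `None` for the unnamed graph, is an assumption of this example):

```python
store = set()                                   # an RDF dataset holds a set of statements

store.add(("v1", "donatedTo", "v2", None))      # first donation, as a plain triple
store.add(("v1", "donatedTo", "v2", None))      # second donation as the same triple: no effect
as_triples = len(store)

store.add(("v1", "donatedTo", "v2", "e12-2"))   # second donation as a quad named e12-2
as_triple_plus_quad = len(store)

# Count the distinct occurrences of the statement (v1, donatedTo, v2).
donations = [g for s, p, o, g in store if (s, p, o) == ("v1", "donatedTo", "v2")]
print(as_triples, as_triple_plus_quad, len(donations))   # 1 2 2
```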

 

 

 

  3. Add Edge-Properties

John … donated twice to Top University, in the years 2010 and 2012, respectively. Mary … got admitted to Top University in 2011.

Property Graph:

  • Hang the edge-properties from the three target edges e12, e32, and e12-2, respectively.

RDF Triples+Quads:

  • There are three target edges (triples) to hang the edge-properties from. One of these is already in quad form, with e12-2 as the triple-vertex (named graph).
  • Quadify the remaining two edges (triples) with e12 and e32 as the triple-vertices (named graphs).
  • Next, for each of the three target edges, simply add a (value) triple with the triple-vertex of the edge as the subject to reflect the desired edge-property.
  • quadified triples (delete-triple followed by insert-quad):
    deleted triples:
    :v1  :donatedTo  :v2 .
    :v3  :admittedTo  :v2 .
    inserted quads:
    graph  :e12  {  :v1  :donatedTo  :v2  }
    graph  :e32  {  :v3  :admittedTo  :v2  }
  • added triples:
    :e12  :year  2010 .
    :e12-2  :year  2012 .
    :e32  :year  2011 .
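Reading an edge-property back then amounts to a join between the graph name of a quad and the subject of a value triple. The sketch below shows that join over the data of steps 1 to 3 (only the relevant statements are included); the tuple layout and the helper `years_of` are hypothetical and serve only to illustrate the lookup.

```python
quads = {
    ("v1", "donatedTo", "v2", "e12"),       # quadified first donation
    ("v1", "donatedTo", "v2", "e12-2"),     # second donation, added as a quad in step 2
    ("v3", "admittedTo", "v2", "e32"),      # quadified admission
    ("e12", "year", 2010, None),            # value triples hung from the triple-vertices
    ("e12-2", "year", 2012, None),
    ("e32", "year", 2011, None),
}


def years_of(quads, s, p, o):
    """Years attached to every named occurrence of the statement (s, p, o)."""
    names = {g for qs, qp, qo, g in quads
             if (qs, qp, qo) == (s, p, o) and g is not None}
    return sorted(qo for qs, qp, qo, _ in quads if qp == "year" and qs in names)


print(years_of(quads, "v1", "donatedTo", "v2"))    # [2010, 2012]
print(years_of(quads, "v3", "admittedTo", "v2"))   # [2011]
```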

 

 

  4. Add Edge with Edges as Endpoints

Bob suspects that John’s 2010 donation helped Mary’s admission.

Property Graph:

  • Two new edges need to be created: a “helped” edge and a “suspects” edge. The “helped” edge would have two edges as its source and destination endpoints: the first “donatedTo” edge e12 and the “admittedTo” edge e32. The “suspects” edge would have a new “Bob” vertex as its source and the “helped” edge as its destination.
  • Vertexify edge e12: New edge-vertex v4 gets created with appropriate connections and vertex-properties. Edge e12 is deleted.
  • Vertexify edge e32: New edge-vertex v5 gets created with appropriate connections and vertex-properties. Edge e32 is deleted.
  • Assume that we create a hypothetical new edge, “e45: helped”, connecting v4 to v5 at this point.
  • Vertexify (hypothetical) edge e45: Since the “suspects” edge needs to have this e45 edge as an endpoint, we need to vertexify the hypothetical edge e45. New edge-vertex v6 gets created with appropriate connections and vertex-properties.
  • Finally, create the “e76: suspects” edge connecting the new “Bob” vertex (v7) to the edge-vertex (v6) representing the “helped” edge.

RDF Triples+Quads:

  • Two new edges need to be created: a “helped” edge and a “suspects” edge.
  • The “helped” edge would have two edges as its source and destination endpoints: the first “donatedTo” edge and the “admittedTo” edge. Since both edges were already quadified (with triple-vertices :e12 and :e32) when adding edge-properties, simply connect the triple-vertices representing those two edges to create the “helped” edge.
  • The “suspects” edge would have the newly created “helped” edge as its destination and a new vertex for “Bob” as its source. Quadify the “helped” edge, with triple-vertex :e1232, and then create the “suspects” edge as a triple with the new “Bob” vertex (v7) as the subject and the triple-vertex :e1232 as the object.
  • added quad:
    graph  :e1232  {  :e12  :helped  :e32  }
  • added triples:
    :v7  :name  "Bob" .
    :v7  :suspects  :e1232 .

 

 

 

 

  5. Add Edge with Edge-Property as Endpoint

John … donated twice to Top University, in the years 2010 and 2012 (the year 2012 was confirmed by Alex)

Property Graph:

  • Vertexify the e12-2 edge to create the edge-vertex v8 with event=“donatedTo” and year=2012 as vertex-properties.
  • Vertexify the year=2012 vertex-property created above: this creates the vp-vertex v9, connected from v8 by a new edge “e89: year”, with year=2012 as its vertex-property (transferred from v8).
  • Next, create the desired “e910: confBy” edge by connecting the above vp-vertex (v9) to a new “Alex” vertex (v10).

RDF Triples+Quads:

  • Quadify the “value” triple that was created in STEP 3 above: “:e12-2  :year  2012 .”. The new triple-vertex, e12-2year, that gets created here represents this “year=2012” edge-property for the second donation edge.
  • Next, create the desired “confBy” edge as a triple with the triple-vertex e12-2year as the subject and a new “Alex” vertex (v10) as the object.
  • quadified triples (delete-triple followed by insert-quad):
    deleted triple:
    :e12-2  :year  2012 .
    inserted quad:
    graph  :e12-2year  {  :e12-2  :year  2012  }
  • added triples:
    :v10  :name  "Alex" .
    :e12-2year  :confBy  :v10 .

 

  6. Add Edge with Vertex-Property as Endpoint

John, whose net worth is $1 billion (according to Dan), …

Property Graph:

  • Vertexify the worth=“1 Bil” vertex-property that was created in STEP 1 above. The newly created vp-vertex is v11.
  • Add worth=“1 Bil” as a vertex-property to the vp-vertex (v11).
  • Next, create the desired “e1112: accTo” edge by connecting the vp-vertex (v11) to a new “Dan” vertex (v12).

RDF Triples+Quads:

  • Quadify the (value) triple, worth=“1 Bil”, that was created in STEP 1 above. The newly created triple-vertex is v1worth.
  • Next, create the desired “accTo” edge as a triple with the triple-vertex v1worth as the subject and a new “Dan” vertex (v12) as the object.
  • quadified triples (delete-triple followed by insert-quad):
    deleted triple:
    :v1  :worth  "1 Bil" .
    inserted quad:
    graph  :v1worth  {  :v1  :worth  "1 Bil"  }
  • added triples:
    :v12  :name  "Dan" .
    :v1worth  :accTo  :v12 .
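As a cross-check of the RDF Triples+Quads column, the six steps can be replayed on a small in-memory quad set. The sketch then verifies two things: every statement from step 1 is still visible when graph names are ignored, and the set of predicates has only grown. The tuple layout (with `None` for unnamed triples), the helper names, and the graph-name-agnostic view are assumptions of this sketch, not part of the article's own tooling.

```python
def quadify(quads, triple, name):
    quads.remove(triple + (None,))      # delete-triple
    quads.add(triple + (name,))         # insert-quad


def triples_of(quads):
    """Statements visible when graph names are ignored."""
    return {(s, p, o) for s, p, o, _ in quads}


def predicates(quads):
    return {p for _, p, _, _ in quads}


# Step 1: plain triples only.
step1 = [("v1", "name", "John"), ("v1", "worth", "1 Bil"), ("v2", "name", "TopUniv"),
         ("v3", "name", "Mary"), ("v1", "donatedTo", "v2"),
         ("v3", "admittedTo", "v2"), ("v3", "childOf", "v1")]
quads = {t + (None,) for t in step1}
schema1 = predicates(quads)

# Step 2: duplicate edge as a quad.
quads.add(("v1", "donatedTo", "v2", "e12-2"))
# Step 3: quadify two triples, then hang the years from the triple-vertices.
quadify(quads, ("v1", "donatedTo", "v2"), "e12")
quadify(quads, ("v3", "admittedTo", "v2"), "e32")
quads |= {("e12", "year", 2010, None), ("e12-2", "year", 2012, None),
          ("e32", "year", 2011, None)}
# Step 4: "helped" as a quad, "suspects" pointing at its name.
quads |= {("e12", "helped", "e32", "e1232"), ("v7", "name", "Bob", None),
          ("v7", "suspects", "e1232", None)}
# Step 5: quadify a value triple of a triple-vertex.
quadify(quads, ("e12-2", "year", 2012), "e12-2year")
quads |= {("v10", "name", "Alex", None), ("e12-2year", "confBy", "v10", None)}
# Step 6: quadify a value triple of an ordinary vertex.
quadify(quads, ("v1", "worth", "1 Bil"), "v1worth")
quads |= {("v12", "name", "Dan", None), ("v1worth", "accTo", "v12", None)}

old_statements_kept = set(step1) <= triples_of(quads)
added = sorted(predicates(quads) - schema1)
removed = sorted(schema1 - predicates(quads))
print(old_statements_kept, added, removed)
# True ['accTo', 'confBy', 'helped', 'suspects', 'year'] []
```

No predicate from step 1 is removed at any point, which is the sense in which the schema is only expanded.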