GenAI RAG Likes Explicit Relationships: Use Graphs!

May 16, 2024 | 11 minute read
Phil Cannata
Principal Enablement Specialist

If you google “rag knowledge graphs” you will find numerous articles and videos extolling the benefit of combining RAG and “Knowledge Graphs”. This benefit mainly comes from making relationships EXPLICIT rather than implicit.  We will see how a graph data model can help represent explicit relationships and examine how that can be done with two popular graph models, Property Graphs and RDF Knowledge Graphs.  Oracle Graph supports both models.

To make things confusing, in the fast-moving world of genAI people often call Property Graphs “Knowledge Graphs”, but they don’t mean RDF Knowledge Graphs. Many Property Graph vendors highlight how Property Graphs can represent Knowledge Graphs.

Almost all corporations store data in relational databases as tables like this,

MOVIE_GENRE Table

ID    Title           Genre

M1    The bird cage   Comedy

M2    Oliver          Musical

 

A better database design for this would be to have two tables, one for movies and one for genres like this,

MOVIE Table

ID    Title           Genre

M1    The bird cage   G1

M2    Oliver          G2

 

GENRE Table

ID    Type

G1    Comedy

G2    Musical

 

In these two tables, there is a relationship between a movie and its genre, and that relationship is captured IMPLICITLY by the Genre column in the movie table. Data and implicit relationships can be stored in relational databases in this way.
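
To see what "implicit" means in practice, here is a minimal Python sketch. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for a relational database (an assumption made for illustration only); the table and column names follow the MOVIE and GENRE tables above. Nothing in the stored data names the relationship; it only appears when a query joins on the Genre column.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for a relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (id TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE movie (id TEXT PRIMARY KEY, title TEXT,
                        genre TEXT REFERENCES genre (id));
    INSERT INTO genre VALUES ('G1', 'Comedy'), ('G2', 'Musical');
    INSERT INTO movie VALUES ('M1', 'The bird cage', 'G1'), ('M2', 'Oliver', 'G2');
""")

# The relationship is implicit: the join condition has to supply it.
rows = conn.execute("""
    SELECT m.title, g.type
    FROM   movie m JOIN genre g ON m.genre = g.id
    ORDER  BY m.id
""").fetchall()

for title, genre_type in rows:
    print(f"{title}: {genre_type}")
```

Whoever writes the query has to know that movie.genre points at genre.id; the database stores the values but not the meaning of the link.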

We will see how to define relationships explicitly using Property Graphs (with the SQL/PGQ standard) and RDF Knowledge Graphs, and the pros and cons of each. But more importantly, we will see that modeling concepts, in addition to explicit relationships, will be necessary for a proper integration of RAG and Knowledge Graphs. The future still holds a lot of changes for this area, so it is best not to lock yourself into a single solution too early.

1. GenAI RAG and Graphs

What if instead of using tables, we stored data using nodes and edges like this,

This is an image of a node representing the title of a movie that is connected by a 'Has genre' relationship to a node representing the genre of the movie.  There are two such movies represented, 'The bird cage' has genre 'Comedy' and 'Oliver' has genre 'Musical.'

 

Now the genre relationship is EXPLICIT as a “Has genre” edge. This is what genAI RAG likes: explicit relationships, especially long chains of explicit relationships, which are very difficult to deal with in relational databases. Data and EXPLICIT relationships can be stored in what is called a Property Graph database. They can also be stored in an RDF Knowledge Graph database.
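
A minimal Python sketch of the same idea, assuming edges are held as (source, label, target) tuples. The "Directed by" and "Born in" edges and the identifiers D1 and C1 are hypothetical, added only to show a chain of more than one hop; the `follow` helper is likewise illustrative.

```python
# Each edge is an explicit (source, label, target) triple.
edges = [
    ("The bird cage", "Has genre", "Comedy"),
    ("Oliver", "Has genre", "Musical"),
    # Hypothetical edges, added only to illustrate a multi-hop chain.
    ("The bird cage", "Directed by", "D1"),
    ("D1", "Born in", "C1"),
]

def follow(start, labels):
    """Follow a chain of edge labels from start; return all nodes reached."""
    frontier = {start}
    for label in labels:
        frontier = {t for (s, l, t) in edges if s in frontier and l == label}
    return frontier

one_hop = follow("The bird cage", ["Has genre"])
two_hops = follow("The bird cage", ["Directed by", "Born in"])
print(one_hop, two_hops)
```

Each hop is a lookup by edge label, so a longer chain is just a longer list of labels rather than another join whose condition has to be known in advance.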

2. GenAI RAG and SQL/PGQ

In 2023, an ISO standard for Property Graphs called SQL/PGQ (SQL/Property Graph Query) was published as part of SQL:2023. The great thing about SQL/PGQ is that it makes a relational database look like a Property Graph, which means existing data doesn’t need to be moved into a separate database in order to expose explicit relationships to genAI RAG. In the Oracle Database implementation, the Property Graph is like a view on existing tables, so the data doesn’t even have to be moved to a separate schema. Here’s how it works.

Assume we have the two tables (Movie and Genre) shown at the beginning of this blog. Using SQL/PGQ define the explicit relationships as follows:

  1. create property graph explicit_relationship_pg
  2. vertex tables (
  3.     movie
  4.         key (id)
  5.         label movie
  6.         properties all columns,
  7.     genre
  8.         key (id)
  9.         label genre
  10.        properties all columns
  11. )
  12. edge tables (
  13.     movie as gives_genre
  14.         key (id)
  15.         source key (id) references movie (id)
  16.         destination key (genre) references genre (id)
  17.         label has_genre
  18.         properties all columns
  19. );

Lines 2 – 11 define the nodes (vertices), and lines 12 – 19 define the edges. Now a query like the following can be used in a genAI prompt,

  1. select 'Movie' as label, t.*
  2. from   graph_table (explicit_relationship_pg
  3.     match
  4.     (m is movie) -[c is has_genre]-> (g is genre)
  5.     columns (m.title as title, g.type as type)
  6. ) t
  7. order by 1;

 

Lines 3 – 5 are similar to the pattern specification syntax found in other graph query languages; here they are wrapped inside a SQL query.
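
As a rough illustration of the "view on existing tables" idea, the Python sketch below derives edges from table rows on demand and then evaluates the same movie-to-genre pattern. It is not how Oracle Database implements SQL/PGQ; the dictionaries stand in for the MOVIE and GENRE tables shown earlier, and the function names are hypothetical.

```python
# Rows standing in for the MOVIE and GENRE tables; no data is copied elsewhere.
movie_rows = [
    {"id": "M1", "title": "The bird cage", "genre": "G1"},
    {"id": "M2", "title": "Oliver", "genre": "G2"},
]
genre_rows = [
    {"id": "G1", "type": "Comedy"},
    {"id": "G2", "type": "Musical"},
]

def has_genre_edges():
    """Derive has_genre edges from the foreign-key column, like an edge table."""
    genres_by_id = {g["id"]: g for g in genre_rows}
    for m in movie_rows:
        yield m, genres_by_id[m["genre"]]

def match_movie_has_genre():
    """Emulate (m is movie) -[has_genre]-> (g is genre), returning title and type."""
    return sorted((m["title"], g["type"]) for m, g in has_genre_edges())

result = match_movie_has_genre()
print(result)
```

Because the edges are computed from the rows each time, a change to the underlying tables is visible to the next pattern match without any reload step.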

Oracle Database has the first implementation of SQL/PGQ.

The problems with this approach are:

  1. GenAI doesn’t understand SQL/PGQ yet, as it is a new standard.
  2. Explicit relationships can’t be inserted using SQL, as the standard does not yet include INSERT/UPDATE/DELETE for graphs. GenAI would need to be trained for this also.

3. GenAI RAG and RDF Knowledge Graphs

There is another standard, the W3C Resource Description Framework (RDF), that can be used for implementing explicit relationships. Here’s how the movie example would be defined in an RDF Knowledge Graph. (For something more complete, see the family tree example in the Oracle RDF Graph documentation.)

 

  1. :M1    rdf:type :Movie .
  2. :M1    :title   "The bird cage" .

  3. :G1    rdf:type :Genre .
  4. :G1    :type    "Comedy" .

  5. :M1    :has_genre :G1 .

Line 1 creates an M1 entity of type (instance of) Movie.

Line 2 adds a title attribute with a value of “The bird cage” to the M1 entity.

Line 3 creates a G1 entity of type (instance of) Genre.

Line 4 adds a type attribute with a value of “Comedy” to the G1 entity.

Line 5 creates an explicit relationship named “has_genre” between M1 and G1.

 

Now a query like the following can be used in a genAI prompt (this query uses the W3C SPARQL language),

SELECT ?movie ?title ?type

WHERE

  {?movie :has_genre ?genre .

   ?movie :title     ?title .

   ?genre :type      ?type .

  }

 

This query returns,

Movie    Title           Type

M1       The bird cage   Comedy
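
To make the matching behind such a query concrete, here is a minimal Python sketch of a triple store with basic graph pattern matching. It is an illustration only, not a SPARQL engine: prefixes are dropped, the `match` function is hypothetical, and the data uses the title predicate that the query expects.

```python
# Triples as (subject, predicate, object); prefixes are dropped for brevity.
triples = {
    ("M1", "rdf:type", "Movie"),
    ("M1", "title", "The bird cage"),
    ("G1", "rdf:type", "Genre"),
    ("G1", "type", "Comedy"),
    ("M1", "has_genre", "G1"),
}

def match(patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, candidate)

query = [
    ("?movie", "has_genre", "?genre"),
    ("?movie", "title", "?title"),
    ("?genre", "type", "?type"),
]
rows = [(b["?movie"], b["?title"], b["?type"]) for b in match(query)]
print(rows)
```

The shared variables ?movie and ?genre are what tie the three patterns together, which plays the role that join conditions play in SQL.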

 

In contrast to Property Graphs and SQL/PGQ, RDF benefits from having been a standard for 20 years and from the existence of plenty of linked data on the internet for training. GenAI can generate decent SPARQL queries if prompted with an ontology in a standard RDF serialization, and it seems to already be trained on public datasets like DBpedia.
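
Prompting with an ontology can be as simple as placing its Turtle serialization in front of the question. The Python sketch below only assembles such a prompt and does not call any model; the prompt wording, the `build_prompt` helper, and the domain and range statements in the ontology are assumptions made for illustration.

```python
# Ontology in Turtle; the domain and range statements are assumptions.
ONTOLOGY_TTL = """\
@prefix :     <http://movie.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:ComedyMovie rdfs:subClassOf :Movie .
:has_genre   rdfs:domain :Movie ;
             rdfs:range  :Genre .
"""

def build_prompt(question, ontology=ONTOLOGY_TTL):
    """Combine the ontology and a user question into one prompt string."""
    return (
        "You write SPARQL queries for the ontology below.\n\n"
        f"Ontology (Turtle):\n{ontology}\n"
        f"Question: {question}\n"
        "Answer with a single SPARQL SELECT query."
    )

prompt = build_prompt("Which movies are comedies?")
print(prompt)
```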

Oracle Database has a complete implementation of RDF, which means you get RDF plus all of the enterprise features of an Oracle Database like transactional support, performance, security and reliability in a converged database. Oracle’s implementation is unique in that it is tightly integrated with SQL. This means an RDF database can be made to look like an SQL database as demonstrated in Building Rule-Based OLTP Systems Using Oracle RDF.

Although the approach outlined above requires the RDF data to be stored in native RDF format, Oracle Database also provides a way of viewing and querying relational data as RDF, using a mapping specified with the W3C RDB2RDF mapping standard. However, because relational data must be converted to RDF when SPARQL query results are produced, this approach may introduce some performance overhead for more complex queries.
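
The idea of viewing relational rows as RDF can be sketched in a few lines of Python. This only illustrates the mapping concept; it is neither the W3C mapping language nor Oracle's implementation. The `rows_to_triples` function, and the rule that a foreign-key column becomes a relationship while other columns become attributes, are assumptions made for the example.

```python
# Rows standing in for the MOVIE table.
movie_rows = [
    {"id": "M1", "title": "The bird cage", "genre": "G1"},
    {"id": "M2", "title": "Oliver", "genre": "G2"},
]

def rows_to_triples(rows, class_name, foreign_keys):
    """Generate triples from rows on demand instead of storing them as RDF."""
    for row in rows:
        subject = ":" + row["id"]
        yield (subject, "rdf:type", ":" + class_name)
        for column, value in row.items():
            if column == "id":
                continue
            if column in foreign_keys:
                # A foreign-key value becomes an edge to another resource.
                yield (subject, ":" + foreign_keys[column], ":" + value)
            else:
                # Any other column becomes a literal-valued attribute.
                yield (subject, ":" + column, value)

triples = list(rows_to_triples(movie_rows, "Movie", {"genre": "has_genre"}))
for t in triples:
    print(t)
```

The conversion runs every time the triples are requested, which gives a feel for where the extra cost of querying mapped relational data comes from.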

 

4. GenAI RAG and using RDF Knowledge Graphs to Model Concepts

Explicit relationships are sometimes not enough to model data. The use of Classes and Subclasses, as in the following figure, adds more context.

This figure represents that the movie 'The bird cage' is a type of 'Comedy Movie' which is a type of 'Movie.'

 

Here ComedyMovie is a subclass of Movie and “The bird cage” is an instance of ComedyMovie. This can be done in an RDF Knowledge Graph as follows.

:ComedyMovie rdfs:subClassOf :Movie .

:M1          rdf:type        :ComedyMovie .

:M1          :title          "The bird cage" .

 

Here’s one type of query for this database,

SELECT ?movie ?title

WHERE

  {?movie rdf:type :ComedyMovie .

   ?movie :title   ?title .

  }

This query returns the title of all comedy movies like this,

This figure illustrates the results of running the above query in a tool like SQL Developer.   The values returned are movie:M1 with title 'The bird cage.'

 

What if we want the title of all movies? This query will do it,

SELECT ?movie ?class ?title

WHERE

  {?movie rdf:type/rdfs:subClassOf* :Movie .

   ?movie rdf:type                  ?class .

   ?movie :title                    ?title .

  }

Which returns,

This figure represents the results returned from the above query, which is a row with three columns. The first column is movie:M1, the second column is movie:ComedyMovie, and the third column is "The bird cage."

This is an extremely important query to be able to do when you have classes and subclasses, i.e., to get instances of the subclasses at the class level.

This is made possible by the syntax “rdf:type/rdfs:subClassOf*”, which is an example of a SPARQL property path. Property paths are defined in the W3C SPARQL 1.1 Query Language specification. They enable a very flexible and powerful way of navigating possible routes through a graph between two graph nodes.
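
A minimal Python sketch of what the path “rdf:type/rdfs:subClassOf*” asks for: one rdf:type step followed by zero or more rdfs:subClassOf steps. The helper names are hypothetical and prefixes are simplified; a real SPARQL engine evaluates paths far more generally.

```python
triples = {
    ("ComedyMovie", "rdfs:subClassOf", "Movie"),
    ("M1", "rdf:type", "ComedyMovie"),
    ("M1", "title", "The bird cage"),
}

def superclasses(cls):
    """Return cls plus everything reachable via zero or more rdfs:subClassOf steps."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def instances_of(target):
    """Emulate ?x rdf:type/rdfs:subClassOf* target; return (instance, class) pairs."""
    return sorted(
        (s, o) for s, p, o in triples
        if p == "rdf:type" and target in superclasses(o)
    )

print(instances_of("Movie"))        # M1 is found through ComedyMovie
print(instances_of("ComedyMovie"))  # zero subClassOf steps also match
```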

There are many additional types of concepts that are supported by RDF Knowledge Graphs like attribute value inheritance, automatically providing the inverse of a relationship and inferring that one entity is the same as another entity.  All of these help represent concepts and are not easily possible with Property Graphs.

These concepts and many more will eventually be needed for genAI RAG.

So, what is the downside of using RDF Knowledge Graphs for genAI RAG? RDF Knowledge Graphs, while based on formal semantics and a mature standard, can have a steep learning curve. While RDF Knowledge Graphs work best when data is represented in a native RDF format, that format is considered verbose, and converting existing data into it is perceived as a hurdle. The model can also be seen as rigid, while the Property Graph model is perceived as simple and flexible. But what’s discussed in Building Rule-Based OLTP Systems Using Oracle RDF eliminates this problem.

Another potential problem with this approach is that many of the RDF concepts require an extra step called “entailment” as part of the implementation. Entailment adds extra computational requirements and usually requires materialization of the data.
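
To give a feel for what entailment with materialization involves, the Python sketch below forward-chains two standard RDFS rules (rdfs9 and rdfs11) until no new triples appear and keeps the inferred triples alongside the asserted ones. It is a simplified illustration under those assumptions, not a description of how any particular product computes entailments.

```python
def materialize(triples):
    """Forward-chain two RDFS rules until no new triples appear.

    rdfs11: rdfs:subClassOf is transitive.
    rdfs9:  an instance of a subclass is also an instance of the superclass.
    """
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == "rdfs:subClassOf"}
        typed = {(s, o) for s, p, o in closure if p == "rdf:type"}
        inferred = {(a, "rdfs:subClassOf", d)
                    for a, b in sub for c, d in sub if b == c}
        inferred |= {(x, "rdf:type", d)
                     for x, c in typed for c2, d in sub if c == c2}
        if inferred <= closure:
            return closure
        closure |= inferred

asserted = {
    ("ComedyMovie", "rdfs:subClassOf", "Movie"),
    ("M1", "rdf:type", "ComedyMovie"),
}
entailed = materialize(asserted)
print(sorted(entailed - asserted))
```

Once the inferred triple stating that M1 is a Movie is stored, a plain rdf:type query finds it without a property path; the price is the extra computation and storage, and the need to recompute when the data changes.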

5. Conclusion

At this point in time, are RDF Knowledge Graphs the best choice for RAG Knowledge Graphs because they can be used to provide a much more conceptual meaning to the data?  Or are Property Graphs better because of their flexibility, and concepts will be handled by genAI in a different way?  These questions are actively debated. It remains to be seen which graph model will best help genAI.

 

Appendix -  Oracle RDF Knowledge Graph Code used in this blog post

BEGIN

 SEM_APIS.UPDATE_MODEL('movies',

 'PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

 PREFIX movie: <http://movie.org/>

 INSERT DATA {

 movie:M1    rdf:type movie:Movie .

 movie:M1    movie:title "The bird cage" .

 

 movie:G1    rdf:type movie:Genre .

 movie:G1    movie:type "Comedy" .

 

 movie:M1    movie:has_genre movie:G1

  }',

 network_owner=>'RDFUSER',

 network_name=>'NET1');

 END;

/

 

PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

PREFIX  owl: <http://www.w3.org/2002/07/owl#>

PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>

PREFIX   dc: <http://purl.org/dc/elements/1.1/>

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

PREFIX movie: <http://movie.org/>

 

 

SELECT ?movie ?title ?type

WHERE

  {?movie movie:has_genre ?genre .

   ?movie movie:title     ?title .

   ?genre movie:type      ?type .

  }

 

 

BEGIN

 SEM_APIS.UPDATE_MODEL('MovieClasses',

 'PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

 PREFIX movie: <http://movie.org/>

 INSERT DATA {

movie:ComedyMovie rdfs:subClassOf movie:Movie .

movie:M1          rdf:type        movie:ComedyMovie .

movie:M1          movie:title          "The bird cage" .

 

  }',

 network_owner=>'RDFUSER',

 network_name=>'NET1');

 END;

/

 

Query 1

 

PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

PREFIX  owl: <http://www.w3.org/2002/07/owl#>

PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>

PREFIX   dc: <http://purl.org/dc/elements/1.1/>

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

PREFIX movie: <http://movie.org/>

 

SELECT ?movie ?title

WHERE

  {?movie rdf:type movie:ComedyMovie .

   ?movie movie:title     ?title .

  }

This figure illustrates the results of running the above query in a tool like SQL Developer.   The values returned are movie:M1 with title 'The bird cage.'

Query 2

 

PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

PREFIX  owl: <http://www.w3.org/2002/07/owl#>

PREFIX  xsd: <http://www.w3.org/2001/XMLSchema#>

PREFIX   dc: <http://purl.org/dc/elements/1.1/>

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

PREFIX movie: <http://movie.org/>

 

SELECT ?movie ?class ?title

WHERE

  {?movie rdf:type/rdfs:subClassOf* movie:Movie .

   ?movie rdf:type ?class .

   ?movie movie:title     ?title .

  }

This figure represents the results returned from the above query, which is a row with three columns. The first column is movie:M1, the second column is movie:ComedyMovie, and the third column is "The bird cage."

Phil Cannata

Principal Enablement Specialist

As a Principal Enablement Specialist in the Revenue Enablement organization, I'm responsible for needs analysis, design, development, deployment and delivery of global sales, presales technical and implementation level training for our Oracle field sales and partner communities.

Areas of focus include:
- Data Management
- Artificial Intelligence
- Observability and Enterprise Management

I received a Ph.D. in 1980 from the University of Notre Dame in High Energy Particle Physics and have worked in the Computer Science industry for over 43 years, starting with Unix development at Bell Laboratories in Murray Hill, NJ in the early 80s. My most significant contribution to Unix was the design and implementation of Shared Memory, Semaphores, and Memory Mapped Files in Unix 4.2 and Unix 5.0. I was a Research Director at MCC in Austin, Texas and then worked at IBM and Sun Microsystems. I have also been an Adjunct Professor at the University of Texas, Austin for 18 years teaching "Data Management", "Data Visualization", "Data Analytics", "Programming Languages", "Data Structures and Algorithms in Java and Python", "Software Design", and "Networking". I authored two on-line textbooks for my courses at UT, one on "Semantic Data Management" and the other on "Data Visualization". I'm also an Oracle Certified Professional and taught "Oracle Database 12c: Data Mining Techniques" for Oracle University and "Data Analytics" and "SQL" classes at General Assembly in Austin, TX. I have published four books on Amazon with novel approaches to teaching quantum mechanics and quantum field theory. I also enjoy bioinformatics and I'm a professional keyboard musician.

