Today, a quick technical dive into graphs (in the network sense) and their growing role in generative AI, in particular in the now-famous Retrieval-Augmented Generation (RAG). We have long used the graph paradigm on other topics, and we are increasingly bringing it into our generative AI client projects.

The problem: given the question “Who generated the most revenue among the GAFAM in the year Eulidia was founded?”, how do we make the AI model understand that GAFAM refers to five different companies, and that it should therefore run five searches, one for each company's revenue?

A solution: the Knowledge Graph (KG).

Note that for a concept as well documented as “GAFAM”, GPT-4 obviously does very well without the help of a KG. In more specific contexts, however (business data, concepts specific to a company, department, or country, technical jargon, etc.), no off-the-shelf model can fully grasp the depth and complexity of certain concepts or entities.

A second problem: how do we implement an additional layer of access rights management at the chunk level (a chunk being a portion of a document)?

A solution: the graph (surprising, isn't it?)

Characteristics of a Knowledge Graph

So, a graph is mainly:

- Nodes, also called vertices (plural of vertex) or points

- Links, also called edges

- Relationships carried by those links, sometimes directed (we then speak of “directed graphs”).

Indeed, a link between two nodes can run in one direction only (parenthood is unidirectional: I am the mother of someone who cannot be my mother) or in both directions (I am someone's colleague, and that person is in turn my colleague).
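To make the distinction concrete, here is a minimal Python sketch of a graph with labelled edges. The class name, the method names and the relationship labels (`mother_of`, `colleague_of`) are our own illustrative choices, not the API of any graph library; an undirected link is simply stored as two directed ones.

```python
class LabelledGraph:
    """Tiny in-memory graph; every edge is stored as a directed triple."""

    def __init__(self):
        # Each edge is a (source, label, target) tuple.
        self.edges = set()

    def add_directed(self, source, label, target):
        # One-way link: source -> target only.
        self.edges.add((source, label, target))

    def add_undirected(self, a, label, b):
        # Two-way link, stored as two directed edges.
        self.add_directed(a, label, b)
        self.add_directed(b, label, a)

    def has_edge(self, source, label, target):
        return (source, label, target) in self.edges


g = LabelledGraph()
g.add_directed("Alice", "mother_of", "Bob")         # parenthood: one direction
g.add_undirected("Alice", "colleague_of", "Carol")  # colleagues: both directions
```

With this representation, `g.has_edge("Bob", "mother_of", "Alice")` is false, while the colleague relationship holds in both directions.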

Graphs can be used to improve:

  1. Managing the conversational structure of Large Language Models (LLMs): graphs let us shape the desired interactions with our LLM, for example by guiding the exchanges between a chatbot and the user. For more information, see: https://towardsdatascience.com/conversations-as-directed-graphs-with-lang-chain-46d70e1a846c
  2. Document retrieval as part of a RAG pipeline.

For RAG, we generally use a particular form of graph, the Knowledge Graph: a graph that stores information efficiently and maps the interactions between different entities.

Knowledge Graph use cases

These graphs are extremely useful for:

1. Improving a query

For example, for the question “Who generated the most revenue among the GAFAM in the year Eulidia was founded?”, an LLM agent with access to a knowledge graph can retrieve structured data on the year Eulidia was created, namely 2008 (see https://www.eulidia.com/philosophie.html, a little further down the page), and the names of the GAFAM entities. It can then generate one query per company to find the 2008 financial results of Google (Alphabet), Amazon, Facebook (Meta), Apple and Microsoft.

2. Creating document hierarchies for document retrieval

Using a hierarchy of documents, the system identifies which documents and extracts are most relevant among the financial documents of each GAFAM company and returns the relevant response.

3. Allowing recursive queries

Recursive queries make it possible to query the document base again when the response obtained refers to a specific entity. For example, suppose an initial query returns that Amazon, excluding subsidiaries, generated $x$ in revenue. If the information about Amazon's subsidiaries is stored in a separate knowledge graph, you can then query the 2008 financial data specific to each subsidiary (such as AWS).
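The first and third use cases can be sketched in a few lines of Python. This is only a toy illustration under our own assumptions: the knowledge graph is a plain list of (subject, predicate, object) triples, the predicate names (`has_member`, `founded_in`, `has_subsidiary`) and the function names are hypothetical, and AWS is used purely as an illustrative Amazon subsidiary. A real project would query a graph database instead.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# Predicate names are illustrative assumptions, not a standard vocabulary.
TRIPLES = [
    ("GAFAM", "has_member", "Google (Alphabet)"),
    ("GAFAM", "has_member", "Amazon"),
    ("GAFAM", "has_member", "Facebook (Meta)"),
    ("GAFAM", "has_member", "Apple"),
    ("GAFAM", "has_member", "Microsoft"),
    ("Eulidia", "founded_in", "2008"),
    ("Amazon", "has_subsidiary", "AWS"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`, in insertion order."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def with_subsidiaries(entity, seen=None):
    """Recursively collect an entity followed by all of its (sub-)subsidiaries."""
    seen = set() if seen is None else seen
    if entity in seen:  # guard against cycles in the graph
        return []
    seen.add(entity)
    result = [entity]
    for child in objects(entity, "has_subsidiary"):
        result.extend(with_subsidiaries(child, seen))
    return result


def expand_revenue_question(group, reference_company):
    """Turn one vague question into one precise sub-query per entity."""
    year = objects(reference_company, "founded_in")[0]
    sub_queries = []
    for member in objects(group, "has_member"):
        for entity in with_subsidiaries(member):
            sub_queries.append(f"{entity} revenue in {year}")
    return sub_queries


queries = expand_revenue_question("GAFAM", "Eulidia")
```

Here `queries` contains one search string per GAFAM company, plus one for the subsidiary found recursively. Each string would then be sent to the document retrieval step of the RAG.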

Access, security and quality of the document database

Controlling access to the various documents in the document database

Let's take the example of confidential databases such as financial documents (in our example the data is actually public, because these companies are listed on the stock exchange). One could add a metadata tag to the financial documents node, indicating that access to this node is restricted to members of the finance team. An access control rule then prevents the financial data node from being included in the response to anyone who is not part of this team.
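As a rough sketch of this idea, assuming each chunk is stored as a node carrying an `allowed_teams` metadata tag (a field name and a convention we made up for the illustration), the filter can be applied after retrieval and before the chunks reach the LLM prompt:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkNode:
    """A document chunk stored as a graph node, with access metadata."""
    chunk_id: str
    text: str
    # Assumed convention: an empty set means the chunk is unrestricted.
    allowed_teams: frozenset = frozenset()


def visible_chunks(chunks, user_teams):
    """Keep only the chunks that the user is allowed to see."""
    user_teams = set(user_teams)
    return [
        chunk for chunk in chunks
        if not chunk.allowed_teams or chunk.allowed_teams & user_teams
    ]


retrieved = [
    ChunkNode("c1", "Public press release"),
    ChunkNode("c2", "Detailed financial statement", frozenset({"finance"})),
]

for_marketing = visible_chunks(retrieved, ["marketing"])
for_finance = visible_chunks(retrieved, ["finance"])
```

A member of the marketing team only gets chunk `c1`, while a member of the finance team gets both. Filtering before generation means a restricted chunk never enters the model's context at all.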

Improving the quality of responses as a post-processing step

You could also choose to enrich the generated answer with a knowledge graph specific to the tech industry, for example by including an explanation of the services these businesses offer.
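A very simplified sketch of such a post-processing step follows. The `SERVICES` dictionary stands in for a domain knowledge graph and `enrich_answer` is a hypothetical helper; matching company names by plain substring search is a shortcut that a real system would replace with proper entity linking.

```python
# Stand-in for a tech-industry knowledge graph: company -> services offered.
SERVICES = {
    "Amazon": ["online retail", "cloud computing"],
    "Microsoft": ["software", "cloud computing"],
    "Apple": ["consumer electronics", "digital services"],
}


def enrich_answer(answer, services=SERVICES):
    """Append a short note for each known company mentioned in the answer."""
    notes = [
        f"{company} offers {', '.join(offered)}."
        for company, offered in services.items()
        if company in answer  # naive substring match, for illustration only
    ]
    if not notes:
        return answer
    return answer + "\n\nContext: " + " ".join(notes)


draft = "Amazon and Microsoft both publish annual revenue figures."
final = enrich_answer(draft)
```

The draft answer is left untouched and the context is appended after it, so the enrichment can be switched off without affecting the generation itself.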

Knowledge Graph example

So it's a very powerful approach!

But a question then arises: how do you generate these graphs? Here, for example, is a method for generating them automatically: https://bratanic-tomaz.medium.com/constructing-knowledge-graphs-from-text-using-openai-functions-096a6d010c17. There is certainly still room for improvement on this point, but this is only the beginning, and it is very promising.

Querying with knowledge graphs is already available in LlamaIndex (https://www.llamaindex.ai/) and LangChain (https://www.langchain.com/). You will simply need to add a graph database management system such as Neo4j.

To go further:

On the fascinating subject of graph theory, we highly recommend the reference in the field: Network Science, by Albert-László Barabási (2016), available as a free ebook here: https://barabasi.com/book/network-science#network-science.
