As we know, Large Language Models (LLMs) have emerged as one of the most prominent AI applications for generating text, images or video. Innovation is making strides everywhere, with applications ranging from chatbots to LangChain-based applications. The market is flooded with LLMs from key providers such as OpenAI, Meta, Google and Anthropic, varying by domain and size.

However, these models are somewhat dated with respect to the files and data they were trained on. We would like to enhance them with our own organisation’s private documents and provide that information to the LLM as context.

The aim is that when an LLM is asked a question, it does not rely only on what it already knows. Instead, it first retrieves relevant information from local knowledge sources (local files such as PDF, DOC, ...), so that the generated output is grounded in contextually relevant data. This approach is known as Retrieval-Augmented Generation (RAG).
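
As a rough illustration of this retrieve-then-generate flow, here is a minimal sketch in Python. The document names and texts are made up, and the `embed` function is a toy bag-of-words stand-in for a real embedding model, so treat it as an outline of the idea rather than a working retrieval system.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for text extracted from local files (PDF, DOC, ...).
DOCUMENTS = {
    "leave_policy.pdf": "Employees are entitled to 24 days of paid leave per year.",
    "travel_policy.doc": "Travel expenses must be submitted within 30 days of the trip.",
    "security.pdf": "Laptops must be encrypted and locked when unattended.",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, k: int = 1) -> list:
    """Return the names of the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(
        DOCUMENTS,
        key=lambda name: cosine(q, embed(DOCUMENTS[name])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """Place the retrieved text in front of the question as context."""
    context = "\n".join(f"[{name}] {DOCUMENTS[name]}" for name in retrieve(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


print(build_prompt("How many days of paid leave do employees get?"))
```

The resulting prompt string is what would be sent to the LLM; the model call itself is left out on purpose, since it depends on the provider.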

The challenges, however, are the accuracy of the retrieved information, the heterogeneity of data sources, and the difficulty of handling ambiguous queries and understanding context clearly.

Engineers at Microsoft have come up with sophisticated retrieval algorithms that can better understand the semantics of a query and could improve the relevance of the fetched documents. This is complemented by efficient indexing of the knowledge base to speed up the process.

The approach is called GraphRAG. The difference is that conventional RAG keeps information in flat stores, such as the rows and columns of table databases, whereas GraphRAG stores it in the nodes (data records) and edges (relationships, which can have properties of their own) of a graph. A node can carry additional information: if a node represents a Person, it can store the Name, Designation, Address, etc. Queries can connect multiple graphs.
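
To make the node-and-edge model concrete, here is a small in-memory sketch in Python. The class names (`Node`, `Edge`, `PropertyGraph`) and the sample data are hypothetical and are not taken from GraphRAG or any graph database; a real deployment would use a graph database instead.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                       # e.g. "Person"
    properties: dict = field(default_factory=dict)   # e.g. name, designation


@dataclass
class Edge:
    source: str                                      # node_id of the start node
    target: str                                      # node_id of the end node
    rel_type: str                                    # e.g. "WORKS_AT"
    properties: dict = field(default_factory=dict)   # edges can carry data too


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(edge)

    def neighbours(self, node_id, rel_type=None):
        """Follow outgoing edges from node_id, optionally filtered by type."""
        return [
            (edge, self.nodes[edge.target])
            for edge in self.edges
            if edge.source == node_id and (rel_type is None or edge.rel_type == rel_type)
        ]


graph = PropertyGraph()
graph.add_node(Node("p1", "Person", {"name": "Asha", "designation": "Engineer", "address": "Pune"}))
graph.add_node(Node("c1", "Company", {"name": "Acme"}))
graph.add_edge(Edge("p1", "c1", "WORKS_AT", {"since": 2021}))

for edge, company in graph.neighbours("p1", "WORKS_AT"):
    person = graph.nodes["p1"].properties["name"]
    print(f"{person} -[{edge.rel_type} since {edge.properties['since']}]-> {company.properties['name']}")
```

Answering "where does this person work?" here means following an edge rather than joining tables, which is the practical difference a graph model brings.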

The magic lies in knowledge graphs: integrating graph databases with LLMs enriches the model’s context before it generates a response.

Figure: an LLM-generated knowledge graph built using GPT-4 Turbo. Source: https://microsoft.github.io/graphrag/
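
The idea of enriching the model's context from a knowledge graph before it answers can be sketched as follows. This is a generic illustration and not the pipeline implemented by the GraphRAG library; the triples, function names and the two-hop expansion are all assumptions made for the example.

```python
import re

# Hypothetical knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Asha", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Pune"),
    ("Ravi", "WORKS_AT", "Acme"),
    ("Ravi", "MANAGES", "Asha"),
]


def find_entities(question):
    """Naive entity linking: graph entities whose name appears as a word in the question."""
    entities = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}
    words = set(re.findall(r"\w+", question.lower()))
    return sorted(e for e in entities if e.lower() in words)


def collect_facts(seeds, hops=2):
    """Collect triples within `hops` steps of the seed entities."""
    frontier, seen, facts = set(seeds), set(seeds), []
    for _ in range(hops):
        next_frontier = set()
        for s, r, o in TRIPLES:
            if s in frontier or o in frontier:
                fact = f"{s} {r} {o}"
                if fact not in facts:
                    facts.append(fact)
                for entity in (s, o):
                    if entity not in seen:
                        seen.add(entity)
                        next_frontier.add(entity)
        frontier = next_frontier
    return facts


def build_graph_prompt(question, hops=2):
    """Prepend the collected graph facts to the question as context."""
    facts = collect_facts(find_entities(question), hops)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Known facts:\n{context}\n\nQuestion: {question}"


print(build_graph_prompt("Where does Asha work?"))
```

Because the facts are gathered by following relationships, the prompt can include connected information (for instance, where the employer is located) that a plain keyword match on the question would miss.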

I used OpenAI embeddings and Neo4j’s Movies database for exploration, using its Cypher language for queries. I modelled entities as nodes and the relationships between them as edges.
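
For readers who have not seen Cypher, the sketch below shows what such a query might look like when issued from Python. It assumes the schema of Neo4j's example Movies dataset (`Person` and `Movie` nodes linked by `ACTED_IN` relationships) and the official `neo4j` Python driver; the connection details are placeholders, and this is not the exact query used in my exploration.

```python
# Assumed schema (Neo4j example Movies dataset):
#   (:Person {name})-[:ACTED_IN {roles}]->(:Movie {title})
ACTORS_IN_MOVIE = (
    "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie {title: $title}) "
    "RETURN p.name AS actor, r.roles AS roles "
    "ORDER BY actor"
)


def actors_query(title):
    """Return the Cypher text and its parameters; $title avoids string concatenation."""
    return ACTORS_IN_MOVIE, {"title": title}


def run_query(uri, user, password, title):
    """Run the query against a live database (requires `pip install neo4j`)."""
    from neo4j import GraphDatabase  # imported lazily so the sketch loads without the driver

    query, params = actors_query(title)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]


query, params = actors_query("The Matrix")
print(query)
print(params)

# Example call, assuming a local instance with the Movies dataset loaded:
# rows = run_query("bolt://localhost:7687", "neo4j", "<password>", "The Matrix")
```

The `MATCH` pattern reads almost like a drawing of the graph: a person node, an `ACTED_IN` edge, and a movie node filtered by title.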

Some popular graph database offerings come from Ontotext, NebulaGraph and Neo4j.

I suggest referring to https://github.com/microsoft/graphrag for more details.

Please feel free to contact me (asheesh.mathur@gmail.com) for any clarifications.