Indexing Architecture
Key Concepts
Knowledge Model
In order to support the GraphRAG system, the outputs of the indexing engine (in the Default Configuration Mode) are aligned to a knowledge model we call the GraphRAG Knowledge Model. This model is designed to be an abstraction over the underlying data storage technology, and to provide a common interface for the GraphRAG system to interact with. In normal use-cases the outputs of the GraphRAG Indexer would be loaded into a database system, and the GraphRAG's Query Engine would interact with the database using the knowledge model data-store types.
Workflows
Because of the complexity of our data indexing tasks, we needed to be able to express our data pipeline as series of multiple, interdependent workflows.
---
title: Sample Workflow DAG
---
stateDiagram-v2
[*] --> Prepare
Prepare --> Chunk
Chunk --> ExtractGraph
Chunk --> EmbedDocuments
ExtractGraph --> GenerateReports
ExtractGraph --> EmbedEntities
ExtractGraph --> EmbedGraph
LLM Caching
The GraphRAG library was designed with LLM interactions in mind, and a common setback when working with LLM APIs is various errors due to network latency, throttling, etc.. Because of these potential error cases, we've added a cache layer around LLM interactions. When completion requests are made using the same input set (prompt and tuning parameters), we return a cached result if one exists. This allows our indexer to be more resilient to network issues, to act idempotently, and to provide a more efficient end-user experience.