GraphRAG

Get Started

Requirements

Python 3.10-3.12
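
As a quick sanity check before installing, the version requirement above can be verified from Python itself. This is only a minimal sketch: the helper name is_supported and the two constants are illustrative and are not part of GraphRAG.

```python
import sys

# Supported range taken from the requirement stated above (3.10 through 3.12).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported(version: tuple) -> bool:
    """Return True if the (major, minor) version falls inside the supported range."""
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


# Check the interpreter that is running this snippet.
current = sys.version_info[:2]
status = "supported" if is_supported(current) else "outside the supported range"
print(f"Python {current[0]}.{current[1]} is {status}")
```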

To get started with the GraphRAG system, you have a few options:

👉 Use the GraphRAG Accelerator solution
👉 Install from PyPI
👉 Use it from source

Quickstart

To get started with the GraphRAG system, we recommend trying the Solution Accelerator package. This provides a user-friendly end-to-end experience with Azure resources.

Top-Level Modules

Indexing Pipeline Overview
Query Engine Overview

Overview

The following is a simple end-to-end example for using the GraphRAG system. It shows how to use the system to index some text, and then use the indexed data to answer questions about the documents.

Install GraphRAG

pip install graphrag

Running the Indexer

Now we need to set up a data project and some initial configuration. We're using the default configuration mode, which you can customize as needed using a config file (recommended) or environment variables.

First let's get a sample dataset ready:

mkdir -p ./ragtest/input

Now let's get a copy of A Christmas Carol by Charles Dickens from a trusted source:

curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt

Next we'll inject some required config variables:

Set Up Your Workspace Variables

First, let's make sure to set up the required environment variables. For details on these variables, and on which other environment variables are available, see the variables documentation.

To initialize your workspace, let's first run the graphrag.index --init command. Since we have already created a directory named ./ragtest in the previous step, we can run the following command:

python -m graphrag.index --init --root ./ragtest

This will create two files: .env and settings.yaml in the ./ragtest directory.

OpenAI and Azure OpenAI

To run in OpenAI mode, just make sure to update the value of GRAPHRAG_API_KEY in the .env file with your OpenAI API key.
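
If you prefer to script this step rather than edit the file by hand, something like the following could work. It is a minimal sketch that assumes the .env file uses plain KEY=value lines; the helper name set_env_value is hypothetical and is not provided by GraphRAG.

```python
from pathlib import Path


def set_env_value(env_path: Path, key: str, value: str) -> None:
    """Set KEY=value in a .env file, replacing an existing entry or appending one.

    Assumes simple KEY=value lines (an assumption about the file format).
    """
    lines = []
    if env_path.exists():
        lines = env_path.read_text(encoding="utf-8").splitlines()

    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(f"{key}="):
            lines[i] = f"{key}={value}"
            replaced = True

    if not replaced:
        lines.append(f"{key}={value}")

    env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, set_env_value(Path("./ragtest/.env"), "GRAPHRAG_API_KEY", "<your key>") would update the file created by the --init step. Avoid hard-coding a real key in a script that is checked into version control.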

Azure OpenAI

In addition, Azure OpenAI users should set the following variables in the settings.yaml file. To find the appropriate sections, search for the llm: configuration; you should see two sections, one for the chat endpoint and one for the embeddings endpoint. Here is an example of how to configure the chat endpoint:

type: azure_openai_chat # Or azure_openai_embedding for embeddings
api_base: https://<instance>.openai.azure.com
api_version: 2024-02-15-preview # You can customize this for other versions
deployment_name: <azure_model_deployment_name>
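
To catch a missing Azure setting before starting a long indexing run, a small check along these lines may help. It is a sketch under assumptions: it operates on an llm section that has already been loaded into a plain dict (loading the YAML is left out), it treats the three fields from the example above as required, and the helper name missing_azure_fields is hypothetical.

```python
# Fields shown in the Azure OpenAI example above; treated as required here
# for illustration only.
REQUIRED_AZURE_FIELDS = ("api_base", "api_version", "deployment_name")


def missing_azure_fields(llm: dict) -> list:
    """Return the Azure fields that are absent or empty in an llm config section.

    Sections whose type does not start with "azure_openai" are skipped.
    """
    if not str(llm.get("type", "")).startswith("azure_openai"):
        return []
    return [name for name in REQUIRED_AZURE_FIELDS if not llm.get(name)]


# Example: a chat section that is missing its deployment name.
chat_section = {
    "type": "azure_openai_chat",
    "api_base": "https://<instance>.openai.azure.com",
    "api_version": "2024-02-15-preview",
}
print(missing_azure_fields(chat_section))
```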

Running the Indexing Pipeline

Finally, we'll run the pipeline!

python -m graphrag.index --root ./ragtest

[Image: pipeline executing from the CLI]

This process will take some time to run, depending on the size of your input data, the model you're using, and the text chunk size (these can be configured in your settings.yaml file). Once the pipeline is complete, you should see a new folder called ./ragtest/output/<timestamp>/artifacts with a series of parquet files.
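
To confirm that a run produced output, you could list the parquet files from the most recent run. The sketch below assumes the ./ragtest/output/<timestamp>/artifacts layout described above, and additionally assumes that the timestamp folder names sort chronologically when compared as strings. The helper name latest_artifacts is hypothetical.

```python
from pathlib import Path


def latest_artifacts(root: Path) -> list:
    """Return the parquet files from the most recent run under <root>/output.

    Assumes run folders are named so that string order matches time order.
    """
    output_dir = root / "output"
    if not output_dir.is_dir():
        return []

    # Keep only run folders that actually contain an artifacts directory.
    runs = sorted(p for p in output_dir.iterdir() if (p / "artifacts").is_dir())
    if not runs:
        return []

    return sorted((runs[-1] / "artifacts").glob("*.parquet"))


for path in latest_artifacts(Path("./ragtest")):
    print(path.name)
```

An empty result means either that no run has completed yet or that the layout differs from the one assumed here.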

Using the Query Engine

Running the Query Engine

Now let's ask some questions using this dataset.

Here is an example using Global search to ask a high-level question:

python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the top themes in this story?"

Here is an example using Local search to ask a more specific question about a particular character:

python -m graphrag.query \
--root ./ragtest \
--method local \
"Who is Scrooge, and what are his main relationships?"
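
If you want to run several questions from a script, the two commands above can be assembled programmatically. This is a minimal sketch that uses only the flags shown in this guide (--root and --method); the helper name build_query_command is hypothetical.

```python
import sys


def build_query_command(root: str, method: str, question: str) -> list:
    """Build the argument list for the graphrag.query CLI invocation shown above."""
    if method not in ("global", "local"):
        raise ValueError(f"method must be 'global' or 'local', got {method!r}")
    return [
        sys.executable, "-m", "graphrag.query",
        "--root", root,
        "--method", method,
        question,
    ]


cmd = build_query_command("./ragtest", "global", "What are the top themes in this story?")
print(" ".join(cmd[1:]))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Passing the arguments as a list avoids shell quoting problems with questions that contain quotes or other special characters.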

Please refer to the Query Engine docs for detailed information about how to use the Local and Global search mechanisms to extract meaningful insights from your data once the Indexer has finished running.