Datasets
BenchmarkQED offers two datasets to facilitate the development and evaluation of Retrieval-Augmented Generation (RAG) systems:
- Podcast Transcripts: Contains transcripts from 70 episodes of the Behind the Tech podcast series. This is an updated version of the dataset featured in the GraphRAG paper.
- AP News: Includes 1,397 health-related news articles from the Associated Press.
To download these datasets programmatically, use the following commands:
- Podcast Transcripts:
- AP News:
Replace OUTPUT_DIR with the path to the directory where you want the dataset to be saved.
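After a download finishes, it can be useful to confirm that files actually landed in OUTPUT_DIR. The following is a minimal sketch, not part of BenchmarkQED: the helper name summarize_download is hypothetical, and it makes no assumption about the file formats the datasets use. It simply counts the files found under a directory, grouped by extension.

```python
from collections import Counter
from pathlib import Path


def summarize_download(output_dir: str) -> Counter:
    """Count files under output_dir by file extension.

    Hypothetical helper for illustration only; it is not a BenchmarkQED API
    and assumes nothing about how the downloaded datasets are laid out.
    """
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{output_dir} is not a directory")

    # Walk the directory tree recursively and tally each file's extension.
    # Extensions are lowercased; files without one are grouped as "<none>".
    return Counter(
        path.suffix.lower() or "<none>"
        for path in root.rglob("*")
        if path.is_file()
    )
```

Calling summarize_download with the same path you used for OUTPUT_DIR returns a Counter keyed by extension. An empty Counter suggests that nothing was written to that directory.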
You can also find these datasets in the datasets directory.