
Development Guide

Requirements

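Name          | Installation | Purpose
------------- | ------------ | ---------------------------------------------------------------------------
Python 3.11+  | Download     | The library is Python-based.
uv            | Instructions | uv handles package and virtual environment management for Python codebases.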
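Before installing dependencies, you can confirm both prerequisites with a short standard-library script. This is a minimal sketch, not part of benchmark-qed: `check_requirements` is a hypothetical helper, and it only checks that some `uv` executable is on PATH, not which version it is.

```python
import shutil
import sys

# Minimum interpreter version from the Requirements table (Python 3.11+).
MIN_PYTHON = (3, 11)


def check_requirements() -> dict[str, bool]:
    """Report whether the documented prerequisites appear to be available.

    Hypothetical helper for illustration; not part of benchmark-qed.
    """
    return {
        # True when the running interpreter is Python 3.11 or newer.
        "python_3_11_plus": sys.version_info[:2] >= MIN_PYTHON,
        # True when an executable named `uv` is found on PATH.
        "uv_on_path": shutil.which("uv") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```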

Installing dependencies

# Install Python dependencies
uv sync

Generating synthetic queries

Follow these steps to generate synthetic queries using AutoQ:

  1. Set up your project directory:

    mkdir -p ./local/autoq_test
    cd ./local/autoq_test
    

  2. Create an input folder and add your input data:

    mkdir ./input
    
    Place your input files inside the ./input directory. To get started, you can use the AP News dataset provided in the datasets folder. To download this example dataset directly into your input folder, run:
    uv run benchmark-qed data download AP_news input
    
    You can also download directly to Azure Blob Storage. See the Datasets documentation for storage options.

  3. Initialize the configuration:

    uv run benchmark-qed config init autoq .
    
    This is the local-filesystem variant.

    Alternative blob variant (choose this instead of the local command above; do not run both):

    uv run benchmark-qed config init autoq . \
        --storage-type blob \
        --container-name my-container \
        --account-url https://<account>.blob.core.windows.net \
        --base-dir autoq_test
    
    This command creates two files in the current project directory (./local/autoq_test), or, for the blob variant, under the autoq_test prefix in my-container:
    - .env: Stores environment variables for the AutoQ pipeline. Open this file and replace <API_KEY> with your OpenAI or Azure API key.
    - settings.yaml: Contains pipeline settings. Edit this file as needed for your use case.

    The generated settings.yaml includes commented-out sections for configuring Azure Blob Storage as input and output backends. Uncomment and fill in the storage section under input to read data from blob storage, or the output_storage section to write results to blob storage instead of the local filesystem.

  4. Generate synthetic queries:

    uv run benchmark-qed autoq settings.yaml output
    
    This is the local-filesystem variant.

    Alternative blob-stored config variant (choose this instead of the local command above; do not run both):

    uv run benchmark-qed autoq blob://my-container/autoq_test/settings.yaml output \
        --account-url https://<account>.blob.core.windows.net
    
    This will process your input data and save the generated queries in the output directory.

    By default, AutoQ also generates assertions for data-driven queries. Assertions are testable factual statements that can be used to evaluate answer accuracy. You can configure assertion generation in settings.yaml:

    assertions:
      max_assertions: 20  # Set to 0 to disable, or null for unlimited
      enable_validation: true  # Enable to filter low-quality assertions (can be slow)
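The `max_assertions` comment above describes three cases: `0` disables assertions, `null` means unlimited, and a positive number caps the count. The sketch below restates those rules as a function. `limit_assertions` is a hypothetical helper, and keeping the *first* N assertions is an assumption, since this guide does not say how AutoQ chooses which assertions survive the cap. YAML `null` arrives in Python as `None`.

```python
from typing import Optional, Sequence


def limit_assertions(assertions: Sequence[str], max_assertions: Optional[int]) -> list[str]:
    """Apply max_assertions as described in the settings.yaml comment.

    None (YAML null) keeps everything, 0 keeps nothing, and a positive N
    keeps at most N. Which N are kept is an assumption: the first N.
    """
    if max_assertions is None:
        return list(assertions)  # unlimited
    if max_assertions < 0:
        # Rejecting negatives is this sketch's choice; the guide does not cover them.
        raise ValueError("max_assertions must be null, 0, or a positive integer")
    return list(assertions[:max_assertions])  # 0 yields an empty list


candidates = ["A", "B", "C"]
print(limit_assertions(candidates, 2))
print(limit_assertions(candidates, 0))
print(limit_assertions(candidates, None))
```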
    

Comparing RAG answer pairs

Follow these steps to compare RAG answer pairs using the pairwise scoring pipeline:

  1. Set up your project directory:

    mkdir -p ./local/pairwise_test
    cd ./local/pairwise_test
    

  2. Create an input folder and add your question-answer data:

    mkdir ./input
    
    Copy your RAG answer files into the ./input directory. To get started, you can use the example RAG answers available in the example data folder. To download this example dataset directly into your input folder, run:
    uv run benchmark-qed data download example_answers input
    
    You can also download directly to Azure Blob Storage. See the Datasets documentation for storage options.

  3. Create a configuration file for pairwise comparison:

    uv run benchmark-qed config init autoe_pairwise .
    
    This is the local-filesystem variant.

    Alternative blob variant (choose this instead of the local command above; do not run both):

    uv run benchmark-qed config init autoe_pairwise . \
        --storage-type blob \
        --container-name my-container \
        --account-url https://<account>.blob.core.windows.net \
        --base-dir pairwise_test
    
    This command creates two files in the current project directory (./local/pairwise_test), or, for the blob variant, under the pairwise_test prefix in my-container:
    - .env: Contains environment variables for the pairwise comparison pipeline. Open this file and replace <API_KEY> with your OpenAI or Azure API key.
    - settings.yaml: Contains pipeline settings, which you can modify as needed.

    The generated settings.yaml includes commented-out input_storage and output_storage sections for configuring Azure Blob Storage backends.

  4. Run the pairwise comparison:

    uv run benchmark-qed autoe pairwise-scores settings.yaml output
    
    This is the local-filesystem variant.

    Alternative blob-stored config variant (choose this instead of the local command above; do not run both):

    uv run benchmark-qed autoe pairwise-scores blob://my-container/pairwise_test/settings.yaml output \
        --account-url https://<account>.blob.core.windows.net
    
    The results will be saved in the output directory.
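Step 3 of each workflow asks you to replace `<API_KEY>` in the generated .env file. If you prefer to script that edit instead of opening an editor, a minimal sketch follows. `fill_api_key` is a hypothetical helper; it replaces the placeholder text wherever it appears and makes no assumption about the variable name inside .env.

```python
from pathlib import Path

# Placeholder text written into .env by `config init`, per this guide.
PLACEHOLDER = "<API_KEY>"


def fill_api_key(env_path: Path, api_key: str) -> int:
    """Replace every <API_KEY> placeholder in env_path with api_key.

    Returns how many placeholders were replaced (0 leaves the file untouched).
    Hypothetical helper; not part of benchmark-qed.
    """
    text = env_path.read_text(encoding="utf-8")
    count = text.count(PLACEHOLDER)
    if count:
        env_path.write_text(text.replace(PLACEHOLDER, api_key), encoding="utf-8")
    return count
```

For example, running `fill_api_key(Path(".env"), os.environ["OPENAI_API_KEY"])` from ./local/pairwise_test reads the key from your shell environment rather than pasting it by hand; `OPENAI_API_KEY` here is only an assumed place where you keep your key. Because .env then holds a secret, keep it out of version control.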

Scoring RAG answers against reference answers

Follow these steps to score RAG answers against reference answers using example data from the AP News dataset:

  1. Set up your project directory:

    mkdir -p ./local/reference_test
    cd ./local/reference_test
    

  2. Create an input folder and add your data:

    mkdir ./input
    
    Copy your RAG answers and reference answers into the input directory. To get started, you can use the example RAG answers available in the example data folder. To download this example dataset directly into your input folder, run:
    uv run benchmark-qed data download example_answers input
    
    You can also download directly to Azure Blob Storage. See the Datasets documentation for storage options.

  3. Create a configuration file for reference scoring:

    uv run benchmark-qed config init autoe_reference .
    
    This is the local-filesystem variant.

    Alternative blob variant (choose this instead of the local command above; do not run both):

    uv run benchmark-qed config init autoe_reference . \
        --storage-type blob \
        --container-name my-container \
        --account-url https://<account>.blob.core.windows.net \
        --base-dir reference_test
    
    This command creates two files in the current project directory (./local/reference_test), or, for the blob variant, under the reference_test prefix in my-container:
    - .env: Contains environment variables for the reference scoring pipeline. Open this file and replace <API_KEY> with your OpenAI or Azure API key.
    - settings.yaml: Contains pipeline settings, which you can modify as needed.

    The generated settings.yaml includes commented-out input_storage and output_storage sections for configuring Azure Blob Storage backends.

  4. Run the reference scoring:

    uv run benchmark-qed autoe reference-scores settings.yaml output
    
    This is the local-filesystem variant.

    Alternative blob-stored config variant (choose this instead of the local command above; do not run both):

    uv run benchmark-qed autoe reference-scores blob://my-container/reference_test/settings.yaml output \
        --account-url https://<account>.blob.core.windows.net
    
    The results will be saved in the output directory.
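Each local workflow expects the same layout before step 4: an input folder with data, a settings.yaml, and a .env whose placeholder has been replaced. A quick preflight check can catch a missed step before a long run. This is a minimal sketch; `preflight` is a hypothetical helper, and it checks only the files named in this guide, not the contents of settings.yaml.

```python
from pathlib import Path


def preflight(project_dir: Path) -> list[str]:
    """Return problems that would likely stop a local-filesystem run.

    Hypothetical helper; not part of benchmark-qed. An empty list means
    the layout from steps 1-3 looks complete.
    """
    problems: list[str] = []
    input_dir = project_dir / "input"
    if not input_dir.is_dir():
        problems.append("missing input/ directory")
    elif not any(p.is_file() for p in input_dir.rglob("*")):
        # rglob also finds files that a download placed in subfolders.
        problems.append("input/ contains no files")
    if not (project_dir / "settings.yaml").is_file():
        problems.append("missing settings.yaml")
    env_file = project_dir / ".env"
    if not env_file.is_file():
        problems.append("missing .env")
    elif "<API_KEY>" in env_file.read_text(encoding="utf-8"):
        problems.append(".env still contains the <API_KEY> placeholder")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path(".")):
        print(problem)
```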

For detailed instructions on configuring and running AutoE subcommands, please refer to the AutoE CLI Documentation.

To learn how to use AutoE programmatically, please see the AutoE Notebook Example.

Running with Blob-Stored Configuration

If your settings.yaml is stored in Azure Blob Storage, pass a blob:// URI as the config path instead of a local file path.

Blob URI format

blob://<container-name>/<optional-base-dir>/settings.yaml

If you initialized config with --base-dir, include that same prefix in the URI.
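To see how the URI's parts line up with the `--container-name` and `--base-dir` values used at init time, the sketch below splits a `blob://` URI into container, base directory, and blob name. `parse_blob_uri` is hypothetical and only mirrors the format shown above; the CLI's own parsing may differ.

```python
from urllib.parse import urlparse


def parse_blob_uri(uri: str) -> tuple[str, str, str]:
    """Split blob://<container>/<optional-base-dir>/settings.yaml.

    Returns (container, base_dir, blob_name); base_dir is "" when the
    config sits at the container root. Hypothetical helper.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "blob" or not parsed.netloc:
        raise ValueError(f"not a blob:// URI with a container: {uri!r}")
    blob_name = parsed.path.lstrip("/")
    if not blob_name:
        raise ValueError(f"no blob path after the container: {uri!r}")
    # Everything before the last "/" is the base directory (the --base-dir prefix).
    base_dir, _, _filename = blob_name.rpartition("/")
    return parsed.netloc, base_dir, blob_name


print(parse_blob_uri("blob://my-container/autoq_test/settings.yaml"))
```

Here the container is the host part of the URI and the base directory must match the `--base-dir` value you passed to `config init`.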

Authentication options

For commands that read config from blob://, pass one of:

  • --account-url https://<account>.blob.core.windows.net (managed identity)
  • --connection-string "$AZURE_STORAGE_CONNECTION_STRING"

If neither flag is provided, auth falls back to environment variables:

  • AZURE_STORAGE_ACCOUNT_URL
  • AZURE_STORAGE_CONNECTION_STRING
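The precedence described above (an explicit flag first, then environment variables) can be sketched as a small resolver. `resolve_blob_auth` is hypothetical. Raising an error when both flags or both environment variables are set is this sketch's conservative choice, because the guide does not say which one wins in that case.

```python
import os
from typing import Mapping, Optional


def resolve_blob_auth(
    account_url: Optional[str] = None,
    connection_string: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> tuple[str, str]:
    """Pick a blob credential source: flags first, then environment variables.

    Returns ("account_url", value) or ("connection_string", value).
    Hypothetical helper; raising on ambiguity is an assumption, not CLI behavior.
    """
    if account_url and connection_string:
        raise ValueError("pass only one of --account-url or --connection-string")
    if account_url:
        return "account_url", account_url
    if connection_string:
        return "connection_string", connection_string
    # Neither flag given: fall back to the documented environment variables.
    env_url = env.get("AZURE_STORAGE_ACCOUNT_URL")
    env_conn = env.get("AZURE_STORAGE_CONNECTION_STRING")
    if env_url and env_conn:
        raise ValueError("both AZURE_STORAGE_* variables are set; ambiguous")
    if env_url:
        return "account_url", env_url
    if env_conn:
        return "connection_string", env_conn
    raise ValueError("no blob credentials: pass a flag or set an AZURE_STORAGE_* variable")
```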

AutoQ example

uv run benchmark-qed autoq blob://my-container/autoq_test/settings.yaml output \
    --account-url https://<account>.blob.core.windows.net

AutoE examples

Pairwise:

uv run benchmark-qed autoe pairwise-scores blob://my-container/pairwise_test/settings.yaml output \
    --account-url https://<account>.blob.core.windows.net

Reference:

uv run benchmark-qed autoe reference-scores blob://my-container/reference_test/settings.yaml output \
    --account-url https://<account>.blob.core.windows.net

Assertion:

uv run benchmark-qed autoe assertion-scores blob://my-container/assertion_test/settings.yaml output \
    --account-url https://<account>.blob.core.windows.net
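To script several blob-config runs (for example, all three AutoE scorers in turn), you can assemble each command from the pieces shown in these examples. `build_command` and `SUBCOMMANDS` are hypothetical names; the sketch uses only the subcommands and the `--account-url` flag that appear above. Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it; replace `<account>` with your storage account name first.

```python
import shlex

# Subcommands exactly as they appear in the examples above.
SUBCOMMANDS = {
    "autoq": ["autoq"],
    "pairwise": ["autoe", "pairwise-scores"],
    "reference": ["autoe", "reference-scores"],
    "assertion": ["autoe", "assertion-scores"],
}


def build_command(kind: str, config_uri: str, output_dir: str, account_url: str) -> list[str]:
    """Return the argv for a blob-config run, suitable for subprocess.run."""
    return [
        "uv", "run", "benchmark-qed",
        *SUBCOMMANDS[kind],  # KeyError for an unknown kind
        config_uri,
        output_dir,
        "--account-url", account_url,
    ]


cmd = build_command(
    "assertion",
    "blob://my-container/assertion_test/settings.yaml",
    "output",
    "https://<account>.blob.core.windows.net",
)
# shlex.join quotes arguments (such as the <account> URL) for safe shell display.
print(shlex.join(cmd))
```

Building an argument list rather than one shell string avoids quoting problems when a URL or path contains characters the shell treats specially.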

When a blob config is loaded, the CLI downloads settings.yaml and sibling files (such as .env and prompts/) under the same prefix into a temporary local directory before execution.
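The prefix rule can be illustrated with a small staging function. This is a minimal sketch, not the CLI's implementation: `stage_config` is hypothetical, and an in-memory dict of blob names to bytes stands in for listing and downloading a real container. Every blob under the config's prefix, including nested ones such as prompts/, is written into a fresh temporary directory with its relative layout preserved.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Mapping


def stage_config(blobs: Mapping[str, bytes], config_blob: str) -> Path:
    """Copy config_blob and every blob under the same prefix into a temp dir.

    `blobs` maps blob names to contents and stands in for a container.
    Returns the local path of the staged config file. Hypothetical helper.
    """
    parent = str(PurePosixPath(config_blob).parent)
    # A config at the container root has no prefix; otherwise match "<dir>/".
    prefix = "" if parent == "." else parent + "/"
    staging = Path(tempfile.mkdtemp(prefix="config-"))
    for name, data in blobs.items():
        if not name.startswith(prefix):
            continue  # outside the config's prefix, so not a sibling
        relative = PurePosixPath(name[len(prefix):])
        if not relative.parts or relative.is_absolute() or ".." in relative.parts:
            continue  # skip empty names and names that would escape the staging dir
        target = staging.joinpath(*relative.parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return staging / PurePosixPath(config_blob).name
```

The staging directory is not removed automatically; a caller would clean it up with `shutil.rmtree` once the run finishes.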

Diving Deeper

To explore the query synthesis workflow in detail, please see the AutoQ CLI Documentation for command-line usage and the AutoQ Notebook Example for a step-by-step programmatic guide.

For a deeper understanding of AutoE evaluation pipelines, please refer to the AutoE CLI Documentation for available commands and the AutoE Notebook Example for hands-on examples.