Assertion Generation for Existing Questions¶
This notebook demonstrates how to generate assertions for existing data-local and data-global questions that were previously generated without assertions (e.g., when max_assertions=0 was used or assertions were disabled during question generation).
This is useful when you want to retroactively add assertion-based evaluation capabilities to existing question sets.
# Copyright (c) 2025 Microsoft Corporation.
import sys
sys.path.insert(1, "../../../")
%load_ext dotenv
%dotenv
import logging
import os
from pydantic import SecretStr
from benchmark_qed.autoq.io.question import (
load_questions,
save_questions,
)
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen import (
AssertionValidator,
)
from benchmark_qed.config.llm_config import LLMConfig, LLMProvider
from benchmark_qed.llm.factory import ModelFactory
logging.basicConfig(level=logging.INFO)
logging.getLogger("httpx").setLevel(logging.ERROR)
Shared Configuration¶
Common settings for data path and LLM model used by both local and global assertion generation.
# =============================================================================
# SHARED CONFIGURATION - Used by both local and global assertion generation
# =============================================================================
# DATA PATH - where to load/save questions
OUTPUT_QUESTIONS_PATH = "../../local/ap_news/output"
# MODEL CONFIGS
API_KEY = SecretStr(os.getenv("OPENAI_API_KEY", ""))
LLM_MODEL = "gpt-5.2"
LLM_PARAMS = {
"temperature": 0.0,
"seed": 42,
}
# CONCURRENCY - adjust based on your model rate limits
CONCURRENT_REQUESTS = 32
# Create LLM instance
llm = ModelFactory.create_chat_model(
model_config=LLMConfig(
model=LLM_MODEL,
api_key=API_KEY,
llm_provider=LLMProvider.OpenAIChat,
call_args=LLM_PARAMS,
)
)
print(f"LLM configured: {LLM_MODEL}")
print(f"Data path: {OUTPUT_QUESTIONS_PATH}")
Data-Local Assertions¶
Generate assertions for data-local questions. These are factual, single-source questions that require strict grounding validation.
# =============================================================================
# DATA-LOCAL SETTINGS
# =============================================================================
# Maximum assertions per question
LOCAL_MAX_ASSERTIONS = 20
# Validation settings
LOCAL_ENABLE_VALIDATION = True
LOCAL_MIN_VALIDATION_SCORE = (
3 # Minimum score (1-5) for grounding, relevance, verifiability
)
# Parallelism - questions to process in parallel
LOCAL_CONCURRENT_QUESTIONS = 8
# Create local validator (uses local_validation_prompt - stricter, fact-focused)
local_validator = (
AssertionValidator(
llm=llm,
llm_params=LLM_PARAMS,
min_criterion_score=LOCAL_MIN_VALIDATION_SCORE,
concurrent_validations=CONCURRENT_REQUESTS,
)
if LOCAL_ENABLE_VALIDATION
else None
)
if local_validator:
print(f"Local validation enabled (min score: {LOCAL_MIN_VALIDATION_SCORE}/5)")
print(" - Uses local_validation_prompt (fact-focused, strict grounding)")
else:
print("Local validation disabled")
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen.local_claim_assertion_gen import (
LocalClaimAssertionGenerator,
)
# Load existing data-local questions from disk
existing_local_questions = load_questions(
f"{OUTPUT_QUESTIONS_PATH}/data_local_questions/selected_questions.json"
)
print(f"Loaded {len(existing_local_questions)} existing data-local questions")
# Filter questions that have claims
questions_with_claims = [
q
for q in existing_local_questions
if hasattr(q, "attributes") and q.attributes and "claims" in q.attributes
]
questions_without_claims = [
q
for q in existing_local_questions
if not (hasattr(q, "attributes") and q.attributes and "claims" in q.attributes)
]
print(
f"Questions with claims: {len(questions_with_claims)}, without claims: {len(questions_without_claims)}"
)
# Initialize local assertion generator
local_assertion_generator = LocalClaimAssertionGenerator(
llm=llm,
max_assertions=LOCAL_MAX_ASSERTIONS,
validator=local_validator,
max_concurrent_questions=LOCAL_CONCURRENT_QUESTIONS,
)
# Generate assertions for all questions with claims
await local_assertion_generator.agenerate_assertions_for_questions(
questions_with_claims
)
# Combine back with questions that had no claims
updated_local_questions = questions_with_claims + questions_without_claims
# Save updated questions with assertions
save_questions(
updated_local_questions,
f"{OUTPUT_QUESTIONS_PATH}/data_local_questions/",
"selected_questions_with_assertions",
)
# Show summary
print("\n=== SUMMARY ===")
print(f"Processed {len(questions_with_claims)} questions with claims")
total_assertions = sum(
len(q.attributes.get("assertions", [])) for q in questions_with_claims
)
print(f"Total assertions generated: {total_assertions}")
print(
f"Average assertions per question: {total_assertions / max(len(questions_with_claims), 1):.1f}"
)
Data-Global Assertions¶
Generate assertions for data-global questions. These are thematic, cross-document questions that use a map-reduce approach:
- Map step: Generate factual assertions from claim batches (uses semantic grouping)
- Reduce step: Consolidate into high-level thematic assertions
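The map-reduce flow above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: `map_step` and `reduce_step` are hypothetical placeholders for what are really LLM calls, and batching here is fixed-size (the semantic-grouping path is covered later in the notes).

```python
from itertools import islice


def batched(items: list[str], size: int) -> list[list[str]]:
    """Split claims into fixed-size batches (semantic grouping disabled)."""
    it = iter(items)
    return [list(chunk) for chunk in iter(lambda: list(islice(it, size)), [])]


def map_step(claim_batch: list[str]) -> list[str]:
    """Stand-in for the LLM map call: one factual assertion per claim."""
    return [f"Assertion grounded in: {claim}" for claim in claim_batch]


def reduce_step(map_assertions: list[str], max_assertions: int) -> list[str]:
    """Stand-in for the LLM reduce call: deduplicate and cap the output."""
    return list(dict.fromkeys(map_assertions))[:max_assertions]


claims = [f"claim {i}" for i in range(7)] + ["claim 0"]  # one duplicate claim
map_assertions = [a for batch in batched(claims, 3) for a in map_step(batch)]
final = reduce_step(map_assertions, max_assertions=5)
print(len(map_assertions), len(final))  # 8 map assertions reduced to 5 final
```

The key property the sketch preserves is that map output can be large and redundant, while reduce consolidates it down to at most `max_assertions` items.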
# =============================================================================
# DATA-GLOBAL SETTINGS
# =============================================================================
# Maximum assertions per question
GLOBAL_MAX_ASSERTIONS = 20
# Validation settings
GLOBAL_ENABLE_VALIDATION = True
GLOBAL_MIN_VALIDATION_SCORE = (
3 # Minimum score (1-5) for grounding, relevance, verifiability
)
# Map-reduce batching settings
GLOBAL_BATCH_SIZE = 100 # Batch size when semantic grouping is disabled
GLOBAL_MAP_DATA_TOKENS = (
8000 # Max tokens per cluster in map step (when semantic grouping enabled)
)
GLOBAL_REDUCE_DATA_TOKENS = 32000 # Max input tokens for reduce step
# Semantic grouping - groups similar claims together before map step
GLOBAL_ENABLE_SEMANTIC_GROUPING = True
# Validation phases
GLOBAL_VALIDATE_MAP_ASSERTIONS = (
True # Validate map assertions before reduce (increases LLM calls)
)
GLOBAL_VALIDATE_REDUCE_ASSERTIONS = True # Validate final assertions after reduce
# Parallelism - questions to process in parallel (lower due to internal parallelism)
GLOBAL_CONCURRENT_QUESTIONS = 2
# Create text embedder for semantic grouping
text_embedder = None
if GLOBAL_ENABLE_SEMANTIC_GROUPING:
from benchmark_qed.autod.data_processor.embedding import TextEmbedder
from benchmark_qed.config.llm_config import (
LLMConfig as EmbeddingConfig,
)
from benchmark_qed.config.llm_config import (
LLMProvider as EmbeddingProvider,
)
from benchmark_qed.llm.factory import ModelFactory as EmbeddingModelFactory
embedding_model = EmbeddingModelFactory.create_embedding_model(
model_config=EmbeddingConfig(
model="text-embedding-3-large",
api_key=API_KEY,
llm_provider=EmbeddingProvider.OpenAIEmbedding,
)
)
text_embedder = TextEmbedder(embedding_model)
print("Semantic grouping enabled - similar claims will be grouped together")
# Create map validator (fact-focused, for map assertions)
map_validator = (
AssertionValidator(
llm=llm,
llm_params=LLM_PARAMS,
min_criterion_score=GLOBAL_MIN_VALIDATION_SCORE,
concurrent_validations=CONCURRENT_REQUESTS,
)
if GLOBAL_ENABLE_VALIDATION and GLOBAL_VALIDATE_MAP_ASSERTIONS
else None
)
# Create reduce validator (thematic, for final assertions)
reduce_validator = None
if GLOBAL_ENABLE_VALIDATION and GLOBAL_VALIDATE_REDUCE_ASSERTIONS:
from pathlib import Path
from benchmark_qed.autoq.prompts import data_questions
from benchmark_qed.config.utils import load_template_file
global_validation_prompt = load_template_file(
Path(data_questions.__file__).parent
/ "assertions"
/ "global_validation_prompt.txt"
)
reduce_validator = AssertionValidator(
llm=llm,
llm_params=LLM_PARAMS,
min_criterion_score=GLOBAL_MIN_VALIDATION_SCORE,
concurrent_validations=CONCURRENT_REQUESTS,
validation_prompt=global_validation_prompt,
)
print("\nGlobal assertion settings:")
print(f" - Max assertions: {GLOBAL_MAX_ASSERTIONS}")
print(f" - Map data tokens: {GLOBAL_MAP_DATA_TOKENS}")
print(f" - Reduce data tokens: {GLOBAL_REDUCE_DATA_TOKENS}")
print(f" - Semantic grouping: {GLOBAL_ENABLE_SEMANTIC_GROUPING}")
print(
f" - Map validation: {GLOBAL_VALIDATE_MAP_ASSERTIONS} (min score: {GLOBAL_MIN_VALIDATION_SCORE}/5)"
)
print(
f" - Reduce validation: {GLOBAL_VALIDATE_REDUCE_ASSERTIONS} (min score: {GLOBAL_MIN_VALIDATION_SCORE}/5)"
)
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen import (
GlobalClaimAssertionGenerator,
)
# Load existing data-global questions from disk
existing_global_questions = load_questions(
f"{OUTPUT_QUESTIONS_PATH}/data_global_questions/selected_questions.json"
)
print(f"Loaded {len(existing_global_questions)} existing data-global questions")
# Filter questions that have claims
questions_with_claims = [
q
for q in existing_global_questions
if hasattr(q, "attributes") and q.attributes and "claims" in q.attributes
]
questions_without_claims = [
q
for q in existing_global_questions
if not (hasattr(q, "attributes") and q.attributes and "claims" in q.attributes)
]
print(
f"Questions with claims: {len(questions_with_claims)}, without claims: {len(questions_without_claims)}"
)
# Initialize global assertion generator
global_assertion_generator = GlobalClaimAssertionGenerator(
llm=llm,
max_assertions=GLOBAL_MAX_ASSERTIONS,
batch_size=GLOBAL_BATCH_SIZE,
map_data_tokens=GLOBAL_MAP_DATA_TOKENS,
reduce_data_tokens=GLOBAL_REDUCE_DATA_TOKENS,
concurrent_coroutines=CONCURRENT_REQUESTS,
map_validator=map_validator,
reduce_validator=reduce_validator,
max_concurrent_questions=GLOBAL_CONCURRENT_QUESTIONS,
text_embedder=text_embedder,
enable_semantic_grouping=GLOBAL_ENABLE_SEMANTIC_GROUPING,
validate_map_assertions=GLOBAL_VALIDATE_MAP_ASSERTIONS,
validate_reduce_assertions=GLOBAL_VALIDATE_REDUCE_ASSERTIONS,
)
# Generate assertions for ALL questions with claims
await global_assertion_generator.agenerate_assertions_for_questions(
questions_with_claims
)
# Combine back with questions that had no claims
updated_global_questions = questions_with_claims + questions_without_claims
# Save updated questions with assertions
save_questions(
updated_global_questions,
f"{OUTPUT_QUESTIONS_PATH}/data_global_questions/",
"selected_questions_with_assertions",
)
# Show summary
print("\n=== SUMMARY ===")
print(f"Processed {len(questions_with_claims)} questions with claims")
total_assertions = sum(
len(q.attributes.get("assertions", [])) for q in questions_with_claims
)
total_map_assertions = sum(
len(q.attributes.get("map_assertions", [])) for q in questions_with_claims
)
print(f"Total map assertions generated: {total_map_assertions}")
print(f"Total final assertions generated: {total_assertions}")
print(
f"Average assertions per question: {total_assertions / max(len(questions_with_claims), 1):.1f}"
)
# Show per-question breakdown
print("\n=== PER-QUESTION BREAKDOWN ===")
for i, q in enumerate(questions_with_claims, 1):
assertions = q.attributes.get("assertions", [])
map_assertions = q.attributes.get("map_assertions", [])
print(
f"{i}. [{len(map_assertions)} map → {len(assertions)} final] {q.text[:60]}..."
)
Notes on Assertion Generation¶
When to use this approach:
- You have existing questions that were generated with max_assertions=0 or without assertion generation
- You want to add evaluation capabilities to previously generated question sets
- You need to regenerate assertions with different parameters or improved prompts
Input Requirements:
- Questions must have claims in their attributes field
- For data-local questions: claims should be a list of claim dictionaries
- For data-global questions: claims can be in various formats (simple or complex)
Output Format:
- Assertions are added to the question's attributes.assertions field
- Each assertion contains a statement that can be used for evaluation
- Questions without valid claims are left unchanged
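A hypothetical before/after record may make the shape concrete. Field names follow the notes above (attributes.claims in, attributes.assertions out); the example text and statements are invented for illustration.

```python
# Hypothetical question record before assertion generation.
question = {
    "text": "What did the mayor announce about the transit budget?",
    "attributes": {
        "claims": [
            {"statement": "The mayor announced a 5% transit budget increase."},
        ],
    },
}

# After assertion generation, each assertion carries a checkable statement.
question["attributes"]["assertions"] = [
    {"statement": "The answer states the transit budget increased by 5%."},
]

has_claims = bool(question["attributes"].get("claims"))
num_assertions = len(question["attributes"].get("assertions", []))
print(has_claims, num_assertions)  # True 1
```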
Configuration Options:
- MAX_ASSERTIONS: Maximum number of assertions to generate per question (default: 20)
- ENABLE_VALIDATION: Set to True to validate final assertions for quality (default: True)
- MIN_VALIDATION_SCORE: Minimum score (1-5) for validation criteria (default: 3)
- BATCH_SIZE: For global questions, controls how many claims are processed together (when semantic grouping is disabled)
- MAP_DATA_TOKENS: For global questions, max tokens per cluster in the map step (default: 12000, used when semantic grouping is enabled)
- REDUCE_DATA_TOKENS: For global questions, max input data tokens for the reduce step (default: 32000)
- CONCURRENT_REQUESTS: Controls parallel LLM calls for batch processing and validation
Semantic Claim Grouping (Global Questions Only):
- ENABLE_SEMANTIC_GROUPING: Set to True to group similar claims together before the map step
- When enabled, claims are embedded and clustered using ConstraintKMeans to ensure similar claims are processed together
- Clusters are constrained by MAP_DATA_TOKENS so that each batch fits within the token limit
- This reduces redundancy in map assertions by consolidating semantically similar claims
- Requires a text_embedder to be provided to the GlobalClaimAssertionGenerator
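The token-budget constraint on clusters can be sketched in isolation. This is a simplified stand-in: the real pipeline clusters by embedding similarity first, and a crude one-token-per-word estimate replaces a real tokenizer.

```python
def pack_by_token_budget(claims: list[str], max_tokens: int) -> list[list[str]]:
    """Greedily pack claims into groups whose estimated tokens fit the budget."""
    groups: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for claim in claims:
        tokens = len(claim.split())  # crude stand-in for a real tokenizer
        if current and current_tokens + tokens > max_tokens:
            groups.append(current)  # budget exceeded: start a new group
            current, current_tokens = [], 0
        current.append(claim)
        current_tokens += tokens
    if current:
        groups.append(current)
    return groups


claims = ["alpha beta gamma", "delta epsilon", "zeta", "eta theta iota kappa"]
groups = pack_by_token_budget(claims, max_tokens=5)
print([len(g) for g in groups])  # [2, 2]
```

In the real generator this budget corresponds to MAP_DATA_TOKENS, so every map call stays within the model's context window.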
Map Assertion Validation (Global Questions Only):
- VALIDATE_MAP_ASSERTIONS: Set to True to validate map assertions before the reduce step (default: False)
- When enabled, map assertions are validated using the same criteria as final assertions
- Low-quality map assertions are filtered out before being passed to the reduce step
- Trade-offs:
- Pro: Better quality input to reduce step, fewer low-quality assertions to consolidate
- Con: Increases LLM calls (validation for both map and final assertions)
- Recommended when you have many low-quality claims or want highest quality final assertions
Reduce Assertion Validation (Global Questions Only):
- VALIDATE_REDUCE_ASSERTIONS: Set to False to skip validation of final assertions (default: True)
- When enabled, final assertions are validated after the reduce step
- Useful to disable if you've already validated map assertions and want to save LLM calls
Parallelism Settings:
- CONCURRENT_LOCAL_QUESTIONS: Questions to process in parallel for local assertions (default: 8)
- CONCURRENT_GLOBAL_QUESTIONS: Questions to process in parallel for global assertions (default: 2, lower due to internal parallelism; set to 1 for sequential processing)
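A per-question concurrency limit like this is typically enforced with an asyncio.Semaphore; the sketch below shows the pattern with a stand-in `generate_one` in place of the real per-question LLM work (inside the notebook you would `await generate_all(...)` directly instead of calling asyncio.run).

```python
import asyncio

CONCURRENT_QUESTIONS = 2  # illustrative value, mirroring the global setting


async def generate_one(question_id: int, sem: asyncio.Semaphore) -> int:
    async with sem:  # at most CONCURRENT_QUESTIONS coroutines run this body
        await asyncio.sleep(0.01)  # stand-in for LLM calls
        return question_id


async def generate_all(n: int) -> list[int]:
    sem = asyncio.Semaphore(CONCURRENT_QUESTIONS)
    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(generate_one(i, sem) for i in range(n)))


results = asyncio.run(generate_all(5))
print(results)  # [0, 1, 2, 3, 4]
```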
Validation:
When ENABLE_VALIDATION=True, each assertion is checked for:
- Grounding: Is the assertion factually supported by source texts?
- Relevance: Is the assertion useful for evaluating answers to the question?
- Verifiability: Is the assertion clear and objectively checkable?
Assertions must score at least MIN_VALIDATION_SCORE on all three criteria to pass validation and be included in the final assertion set.
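The pass/fail rule described above reduces to an all-criteria threshold check. A minimal sketch, with illustrative score dictionaries standing in for the validator's output:

```python
MIN_VALIDATION_SCORE = 3  # each criterion is scored 1-5

scored_assertions = [
    {"statement": "A", "grounding": 5, "relevance": 4, "verifiability": 4},
    {"statement": "B", "grounding": 2, "relevance": 5, "verifiability": 5},
    {"statement": "C", "grounding": 3, "relevance": 3, "verifiability": 3},
]

CRITERIA = ("grounding", "relevance", "verifiability")

# An assertion passes only if EVERY criterion meets the minimum score:
# "B" fails despite high relevance and verifiability, because grounding is 2.
passed = [
    a for a in scored_assertions
    if all(a[c] >= MIN_VALIDATION_SCORE for c in CRITERIA)
]
print([a["statement"] for a in passed])  # ['A', 'C']
```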
Important: Validator Prompts for Global Assertions
Global assertions use a map-reduce approach with different validation requirements:
- Map assertions are factual and specific → use local_validation_prompt (stricter grounding)
- Reduce assertions are thematic and synthesize across sources → use global_validation_prompt (allows synthesis)
The notebook creates separate validators:
- local_validator and map_validator: For data-local questions and map assertions (fact-focused)
- reduce_validator: For reduce/final assertions in data-global questions (thematic, built with global_validation_prompt)
Using the wrong validation prompt (e.g., local prompt for thematic assertions) will result in very low validation pass rates.