Data Provenance Model for AI Agents¶
Purpose: Track the origin, lineage, and transformation of data that flows through AI agent systems. Required by EU AI Act Article 10 (data governance) for high-risk AI systems.
Deadline: EU AI Act high-risk provisions apply from August 2, 2026.
1. Overview¶
AI agents consume data from multiple sources (tools, APIs, databases, other agents) and produce outputs that may affect individuals. Data provenance tracking answers:
- Where did the data come from? (origin)
- How was it transformed? (lineage)
- What decisions was it used for? (purpose)
- What is its quality and classification? (metadata)
2. Provenance Record Schema¶
# Example provenance record for a tool call
provenance:
record_id: "prov-2026-07-15-abc123"
timestamp: "2026-07-15T10:30:00Z"
agent_did: "did:agentmesh:customer-service-agent"
# Source: where did the data come from?
source:
type: tool_output # tool_output | api_response | agent_message | user_input | database | file
tool_name: read_customer_record
endpoint: "https://api.internal/customers/12345"
source_agent_did: null # if from another agent
source_classification: confidential
# Data metadata
data:
classification: confidential # public | internal | confidential | restricted
contains_pii: true
pii_types: [name, email, phone]
jurisdiction: [EU, US]
size_bytes: 2048
hash: "sha256:a1b2c3..." # content hash for integrity verification
# Transformation: what happened to it?
transformation:
type: none # none | aggregation | anonymization | redaction | enrichment
description: "Raw customer record, no transformation"
# Purpose: what was it used for?
purpose:
decision_type: customer_inquiry_response
policy_decision: allow
policy_rule: allow-read-customer
audit_entry_id: "audit_abc123"
# Retention
retention:
policy: "3_years"
expiry: "2029-07-15T10:30:00Z"
legal_hold: false
3. Provenance Chain¶
When data flows through multiple agents or transformation steps, provenance records form a chain:
User Input (prov-001)
โ
โผ
Agent A reads customer DB (prov-002, parent: prov-001)
โ
โผ
Agent A sends summary to Agent B (prov-003, parent: prov-002)
โ transformation: aggregation
โ classification ratchet: confidential โ confidential
โ
โผ
Agent B generates response (prov-004, parent: prov-003)
โ transformation: enrichment
โ
โผ
Response to user (prov-005, parent: prov-004)
โ policy check: pre_output stage
โ PII redacted
4. Integration with AGT¶
4.1 With Session State (Attribute Ratchets)¶
Data provenance feeds the monotonic attribute ratchets:
from agentmesh.governance import SessionState, SessionAttribute
state = SessionState([
SessionAttribute(
name="data_sensitivity",
ordering=["public", "internal", "confidential", "restricted"],
monotonic=True,
),
])
# After tool returns, check provenance and ratchet sensitivity
provenance = get_provenance(tool_output)
state.set("data_sensitivity", provenance["data"]["classification"])
# If tool returned "confidential" data, sensitivity ratchets up
# and cannot be reset for the rest of the session
4.2 With Multi-Stage Pipeline¶
rules:
# post_tool stage: check data classification from tool output
- name: block-restricted-forwarding
stage: post_tool
condition: "tool.output.classification == 'restricted'"
action: deny
description: "Restricted data cannot be forwarded to other agents"
# pre_output stage: check provenance before responding
- name: redact-pii-in-response
stage: pre_output
condition: "response.provenance.contains_pii"
action: require_approval
description: "Responses containing PII require human review"
4.3 With Audit Trail¶
Every provenance record links to an audit entry:
from agentmesh.governance import AuditLog
audit = AuditLog()
audit.log(
event_type="data_provenance",
agent_did="did:agentmesh:agent-1",
action="read_customer_record",
data={
"provenance_id": "prov-002",
"source": "tool:read_customer_record",
"classification": "confidential",
"contains_pii": True,
"pii_types": ["name", "email"],
},
outcome="allow",
)
5. EU AI Act Article 10 Mapping¶
| Article 10 Requirement | AGT Provenance Feature |
|---|---|
| 10(2)(a) Relevant design choices | Provenance schema documents data source selection |
| 10(2)(b) Data collection processes | source field tracks origin and collection method |
| 10(2)(c) Data preparation operations | transformation field records all processing steps |
| 10(2)(d) Formulation of assumptions | Policy YAML documents decision criteria |
| 10(2)(e) Assessment of availability, quantity, and suitability | Data quality metadata (classification, size, hash) |
| 10(2)(f) Examination for bias | pii_types and jurisdiction enable bias analysis |
| 10(3) Data governance and management practices | Provenance chain + retention policy |
| 10(4) Specific considerations for personal data | contains_pii, pii_types, jurisdiction fields |
| 10(5) Specific data processing provisions | transformation field, attribute ratchets |
6. Implementation Status¶
| Component | Status | Notes |
|---|---|---|
| Provenance schema | โ Defined | This document |
| Audit trail integration | โ Shipped | AuditLog.log(event_type="data_provenance", ...) |
| Session state ratchets | โ Shipped | SessionState with monotonic attributes |
| Multi-stage policy | โ Shipped | post_tool stage checks tool output classification |
| Provenance chain tracking | ๐ Planned | Linked provenance records with parent IDs |
| Provenance Python API | ๐ Planned | ProvenanceTracker class |
| Automated PII detection in provenance | ๐ Planned | Integration with advisory classifiers |
Related: Record Retention Policy ยท FRIA Template ยท EU AI Act Checklist ยท Tutorial 39 โ DLP Attribute Ratchets