Tutorial 29 โ Agent Discovery: Finding Shadow AI in Your Organization¶
Package:
agent-discoveryยท Time: 20 minutes ยท Prerequisites: Python 3.11+
What You'll Learn¶
- How to scan your environment for AI agents running outside governance
- How deduplication prevents overcounting across scanners
- How to reconcile discovered agents against your registry
- How risk scoring identifies the most urgent shadow agents
- How to integrate discovery into your CI/CD pipeline
Why Agent Discovery?¶
The Agent Governance Toolkit excels at governing agents that register with it. But you can't govern what you can't see.
68% of enterprises have AI agents running outside IT visibility. These "shadow agents" operate without identity, without audit trails, and without policy enforcement โ creating compliance risk and security exposure.
Agent Discovery closes the loop:
Discover โ Inventory โ Reconcile โ Govern
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Continuous
Step 1: Install¶
For GitHub scanning:
Step 2: Scan Local Processes¶
The process scanner detects running AI agent processes by matching command-line patterns against 11 known frameworks.
import asyncio
from agent_discovery.scanners import ProcessScanner
async def main():
scanner = ProcessScanner()
result = await scanner.scan()
print(f"Scanned {result.scanned_targets} processes")
print(f"Found {result.agent_count} agents")
for agent in result.agents:
print(f" ๐ค {agent.name}")
print(f" Type: {agent.agent_type}")
print(f" Confidence: {agent.confidence:.0%}")
for ev in agent.evidence:
print(f" Evidence: {ev.detail}")
asyncio.run(main())
Security note: The process scanner automatically redacts API keys, tokens, and JWTs from command-line arguments. No secrets are stored.
Step 3: Scan Filesystem for Config Artifacts¶
The config scanner walks directories looking for agent configuration files โ agentmesh.yaml, crewai.yaml, mcp.json, Dockerfiles with agent images, etc.
import asyncio
from agent_discovery.scanners import ConfigScanner
async def main():
scanner = ConfigScanner()
result = await scanner.scan(
paths=["/opt/agents", "/home/deploy/projects"],
max_depth=5,
)
print(f"Found {result.agent_count} agent configurations")
for agent in result.agents:
print(f" ๐ {agent.name}")
print(f" Path: {agent.tags.get('config_file', 'N/A')}")
asyncio.run(main())
Step 4: Build an Inventory with Deduplication¶
When the same agent is found by multiple scanners (e.g., running as a process AND has a config file), the inventory merges them into one logical agent:
import asyncio
from agent_discovery import AgentInventory
from agent_discovery.scanners import ProcessScanner, ConfigScanner
async def main():
inventory = AgentInventory(storage_path="~/.agent-governance-python/agent-discovery/inventory.json")
# Run multiple scanners
process_result = await ProcessScanner().scan()
config_result = await ConfigScanner().scan(paths=["."])
# Ingest โ deduplication happens automatically via fingerprints
stats1 = inventory.ingest(process_result)
stats2 = inventory.ingest(config_result)
print(f"Process scan: {stats1['new']} new, {stats1['updated']} updated")
print(f"Config scan: {stats2['new']} new, {stats2['updated']} updated")
print(f"Total unique agents: {inventory.count}")
# Search and filter
mcp_servers = inventory.search(agent_type="mcp-server")
print(f"\nMCP Servers found: {len(mcp_servers)}")
asyncio.run(main())
Step 5: Reconcile Against Your Registry¶
The reconciler compares discovered agents against your governance registry to find shadow agents:
import asyncio
from agent_discovery import AgentInventory, Reconciler, RiskScorer
from agent_discovery.reconciler import StaticRegistryProvider
from agent_discovery.scanners import ProcessScanner, ConfigScanner
async def main():
# Build inventory
inventory = AgentInventory()
inventory.ingest(await ProcessScanner().scan())
inventory.ingest(await ConfigScanner().scan(paths=["."]))
# Define known/registered agents
registry = StaticRegistryProvider([
{"did": "did:agent:prod-assistant", "name": "Production Assistant"},
{"did": "did:agent:code-reviewer", "name": "Code Review Bot"},
{"fingerprint": "abc123", "name": "Deploy Agent"},
])
# Reconcile
reconciler = Reconciler(inventory, registry)
shadow_agents = await reconciler.reconcile()
# Score risk
scorer = RiskScorer()
for shadow in shadow_agents:
shadow.risk = scorer.score(shadow.agent)
print(f"\nโ ๏ธ SHADOW AGENT: {shadow.agent.name}")
print(f" Risk: {shadow.risk.level.value.upper()} ({shadow.risk.score:.0f}/100)")
print(f" Factors:")
for factor in shadow.risk.factors:
print(f" - {factor}")
print(f" Actions:")
for action in shadow.recommended_actions:
print(f" โ {action}")
asyncio.run(main())
Step 6: Use the CLI¶
For quick scans, use the CLI directly:
# Full scan with table output
agent-discovery scan
# Scan specific paths
agent-discovery scan -s config -p /opt/agents -p /home/deploy
# GitHub org scan
agent-discovery scan -s github --github-org my-company
# View inventory
agent-discovery inventory -o summary
# Reconcile against registered agents
agent-discovery reconcile --registry-file known-agents.json
# JSON output for automation
agent-discovery scan -o json | jq '.[] | select(.agent_type == "mcp-server")'
Step 7: Write a Custom Scanner¶
Extend discovery by writing your own scanner:
from agent_discovery.scanners.base import BaseScanner, registry
from agent_discovery.models import (
ScanResult, DiscoveredAgent, Evidence, DetectionBasis
)
@registry.register
class KubernetesScanner(BaseScanner):
"""Scan Kubernetes for agent pods."""
@property
def name(self) -> str:
return "kubernetes"
async def scan(self, **kwargs) -> ScanResult:
result = ScanResult(scanner_name=self.name)
# Your K8s discovery logic here:
# - List pods with agent labels
# - Check container images for agent frameworks
# - Inspect service annotations
return result
Step 8: CI/CD Integration¶
Add agent discovery to your CI pipeline to catch new shadow agents:
# .github/workflows/agent-audit.yml
name: Agent Discovery Audit
on:
schedule:
- cron: '0 8 * * 1' # Weekly Monday 8am
workflow_dispatch:
jobs:
discover:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install agent-discovery
run: pip install agent-discovery[github]
- name: Scan repository
run: |
agent-discovery scan -s config -p . -o json > discovery.json
- name: Check for shadow agents
run: |
SHADOW_COUNT=$(agent-discovery reconcile \
--registry-file known-agents.json \
-o json | python -c "import sys,json; print(len(json.load(sys.stdin)))")
if [ "$SHADOW_COUNT" -gt "0" ]; then
echo "::warning::Found $SHADOW_COUNT shadow agents!"
fi
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: agent-discovery-report
path: discovery.json
Risk Scoring Reference¶
| Factor | Points | Description |
|---|---|---|
| No identity (DID/SPIFFE) | +30 | Agent has no cryptographic identity |
| No owner | +20 | No responsible party assigned |
| Shadow/unregistered status | +20 | Not in any governance registry |
| High-risk agent type | +15 | AutoGen, CrewAI, LangChain, OpenAI Agents |
| Medium-risk agent type | +10 | MCP Server, Semantic Kernel, PydanticAI |
| Ungoverned >30 days | +10 | Long time without governance |
| Ungoverned 7-30 days | +5 | Growing governance gap |
| Low confidence detection | -10 | May be false positive |
Risk Levels: Critical (75+) ยท High (50-74) ยท Medium (25-49) ยท Low (10-24) ยท Info (<10)
Next Steps¶
- Register shadow agents:
agentmesh identity create --name "My Agent" - Apply governance: Tutorial 01 โ Policy Engine
- Secure MCP servers: Tutorial 27 โ MCP Scan CLI
- Set up monitoring: Tutorial 13 โ Observability
Related¶
- Agent Discovery README โ Full API reference
- Trust & Identity โ Register agents with AgentMesh
- Compliance Verification โ Prove governance coverage