
Tutorial 29 – Agent Discovery: Finding Shadow AI in Your Organization

Package: agent-discovery · Time: 20 minutes · Prerequisites: Python 3.11+


What You'll Learn

  • How to scan your environment for AI agents running outside governance
  • How deduplication prevents overcounting across scanners
  • How to reconcile discovered agents against your registry
  • How risk scoring identifies the most urgent shadow agents
  • How to integrate discovery into your CI/CD pipeline

Why Agent Discovery?

The Agent Governance Toolkit excels at governing agents that register with it. But you can't govern what you can't see.

68% of enterprises have AI agents running outside IT visibility. These "shadow agents" operate without identity, without audit trails, and without policy enforcement, creating compliance risk and security exposure.

Agent Discovery closes the loop:

Discover → Inventory → Reconcile → Govern
    ↑                                  │
    └──────────────────────────────────┘
                 Continuous

Step 1: Install

pip install agent-discovery

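For GitHub scanning, install the optional extra (the quotes stop shells such as zsh from treating the brackets as a glob pattern):

pip install "agent-discovery[github]"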


Step 2: Scan Local Processes

The process scanner detects running AI agent processes by matching command-line patterns against 11 known frameworks.

import asyncio
from agent_discovery.scanners import ProcessScanner

async def main():
    scanner = ProcessScanner()
    result = await scanner.scan()

    print(f"Scanned {result.scanned_targets} processes")
    print(f"Found {result.agent_count} agents")

    for agent in result.agents:
        print(f"  🤖 {agent.name}")
        print(f"     Type: {agent.agent_type}")
        print(f"     Confidence: {agent.confidence:.0%}")
        for ev in agent.evidence:
            print(f"     Evidence: {ev.detail}")

asyncio.run(main())
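Conceptually, detection by command-line pattern amounts to testing each process's command line against a table of framework signatures. The sketch below is a simplified illustration only: `FRAMEWORK_PATTERNS`, the four frameworks it covers, and the confidence values are all hypothetical, not the scanner's real rule set for its 11 frameworks.

```python
import re
from dataclasses import dataclass

# Hypothetical signature table: agent type -> (regex, confidence).
# These patterns and weights are illustrative assumptions, not the
# actual ProcessScanner rules.
FRAMEWORK_PATTERNS: dict[str, tuple[re.Pattern[str], float]] = {
    "langchain": (re.compile(r"\blangchain\b|\blangserve\b", re.I), 0.8),
    "crewai": (re.compile(r"\bcrewai\b", re.I), 0.9),
    "autogen": (re.compile(r"\bautogen(_agentchat)?\b", re.I), 0.8),
    "mcp-server": (re.compile(r"\bmcp[-_]server\b|\bfastmcp\b", re.I), 0.7),
}


@dataclass
class Match:
    agent_type: str
    confidence: float
    detail: str  # human-readable evidence, like Evidence.detail above


def classify_cmdline(cmdline: str) -> list[Match]:
    """Return every framework signature that matches one process command line."""
    matches = []
    for agent_type, (pattern, confidence) in FRAMEWORK_PATTERNS.items():
        hit = pattern.search(cmdline)
        if hit:
            matches.append(
                Match(agent_type, confidence, f"cmdline contains '{hit.group(0)}'")
            )
    return matches


if __name__ == "__main__":
    for m in classify_cmdline("python -m crewai run --config crew.yaml"):
        print(m.agent_type, f"{m.confidence:.0%}", m.detail)
```

A real scanner would also need to weigh multiple hits and filter out false positives (for example, an editor that merely has a `crewai` file open).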

Security note: The process scanner automatically redacts API keys, tokens, and JWTs from command-line arguments. No secrets are stored.
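Redaction like this is usually pattern-based. As an illustration only, the sketch below scrubs three common secret shapes from a command line: JWTs, `sk-`-prefixed API keys, and values passed to `--token`, `--api-key`, `--secret`, or `--password` flags. These patterns are assumptions, not the process scanner's actual rules, and a production redactor would need a much broader set.

```python
import re

# Illustrative redaction rules (assumed, not the library's own list).
# JWT: three base64url segments, the first starting with "eyJ" ('{"' encoded).
_JWT = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
# OpenAI-style secret keys: "sk-" followed by a long token.
_SK_KEY = re.compile(r"\bsk-[A-Za-z0-9_-]{16,}")
# Flag values such as --token=VALUE or --api-key VALUE.
_FLAG_VALUE = re.compile(
    r"(--?(?:api[-_]?key|token|secret|password)(?:=|\s+))(\S+)", re.I
)

REDACTED = "[REDACTED]"


def redact_cmdline(cmdline: str) -> str:
    """Replace secret-looking substrings with a fixed placeholder."""
    cmdline = _JWT.sub(REDACTED, cmdline)
    cmdline = _SK_KEY.sub(REDACTED, cmdline)
    # Keep the flag itself (group 1) so the evidence still shows which flag was used.
    return _FLAG_VALUE.sub(lambda m: m.group(1) + REDACTED, cmdline)


if __name__ == "__main__":
    print(redact_cmdline("python agent.py --api-key sk-abcdefghijklmnop1234 --model gpt-4o"))
```

Redacting before anything is written to the inventory is what makes the "no secrets are stored" guarantee possible; redacting later would leave secrets in intermediate results.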


Step 3: Scan Filesystem for Config Artifacts

The config scanner walks directories looking for agent configuration files, such as agentmesh.yaml, crewai.yaml, mcp.json, and Dockerfiles that reference agent images.

import asyncio
from agent_discovery.scanners import ConfigScanner

async def main():
    scanner = ConfigScanner()
    result = await scanner.scan(
        paths=["/opt/agents", "/home/deploy/projects"],
        max_depth=5,
    )

    print(f"Found {result.agent_count} agent configurations")
    for agent in result.agents:
        print(f"  📁 {agent.name}")
        print(f"     Path: {agent.tags.get('config_file', 'N/A')}")

asyncio.run(main())
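A depth-limited walk like the one `max_depth` controls can be sketched with `os.walk` by pruning `dirnames` in place once the limit is reached. The filename set is a hypothetical subset taken from the examples above, and `find_agent_configs` is an illustrative name. The real scanner also inspects file contents (for example, Dockerfiles), which this sketch does not.

```python
import os
from pathlib import Path

# Hypothetical subset of agent config filenames, taken from the examples above.
AGENT_CONFIG_NAMES = {"agentmesh.yaml", "crewai.yaml", "mcp.json"}


def find_agent_configs(root: str, max_depth: int = 5) -> list[Path]:
    """Return matching files at most `max_depth` directory levels below `root`."""
    root_path = Path(root).resolve()
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        depth = len(Path(dirpath).relative_to(root_path).parts)
        if depth >= max_depth:
            # Emptying dirnames in place stops os.walk from descending further.
            dirnames.clear()
        # Sketch-specific choice: skip hidden and dependency directories.
        dirnames[:] = [d for d in dirnames if not d.startswith(".") and d != "node_modules"]
        for name in filenames:
            if name in AGENT_CONFIG_NAMES:
                found.append(Path(dirpath) / name)
    return sorted(found)


if __name__ == "__main__":
    for path in find_agent_configs(".", max_depth=5):
        print(path)
```

Pruning in place only works because `os.walk` defaults to top-down traversal; with `topdown=False` the subdirectories would already have been visited.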

Step 4: Build an Inventory with Deduplication

When the same agent is found by multiple scanners (e.g., it is running as a process and also has a config file), the inventory merges the detections into one logical agent:

import asyncio
from agent_discovery import AgentInventory
from agent_discovery.scanners import ProcessScanner, ConfigScanner

async def main():
    inventory = AgentInventory(storage_path="~/.agent-governance-python/agent-discovery/inventory.json")

    # Run multiple scanners
    process_result = await ProcessScanner().scan()
    config_result = await ConfigScanner().scan(paths=["."])

    # Ingest: deduplication happens automatically via fingerprints
    stats1 = inventory.ingest(process_result)
    stats2 = inventory.ingest(config_result)

    print(f"Process scan: {stats1['new']} new, {stats1['updated']} updated")
    print(f"Config scan:  {stats2['new']} new, {stats2['updated']} updated")
    print(f"Total unique agents: {inventory.count}")

    # Search and filter
    mcp_servers = inventory.search(agent_type="mcp-server")
    print(f"\nMCP Servers found: {len(mcp_servers)}")

asyncio.run(main())
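Fingerprint-based deduplication can be sketched as hashing a few stable attributes of each detection and merging records that share a hash. The attributes chosen here (a normalized agent type plus a hypothetical `location` key such as a project directory) are assumptions, since this tutorial does not document the library's actual fingerprint inputs. The `new`/`updated` counts mirror the keys printed above, but `Inventory`, `Detection`, and `fingerprint` are illustrative stand-ins.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Detection:
    agent_type: str
    location: str  # hypothetical stable key, e.g. a project directory
    scanner: str


@dataclass
class MergedAgent:
    fingerprint: str
    agent_type: str
    location: str
    scanners: set[str] = field(default_factory=set)


def fingerprint(d: Detection) -> str:
    """Hash the attributes assumed to identify one logical agent across scanners."""
    key = f"{d.agent_type.lower()}|{d.location.rstrip('/')}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]


class Inventory:
    def __init__(self) -> None:
        self.agents: dict[str, MergedAgent] = {}

    def ingest(self, detections: list[Detection]) -> dict[str, int]:
        """Merge detections by fingerprint; count first sightings vs. repeats."""
        stats = {"new": 0, "updated": 0}
        for d in detections:
            fp = fingerprint(d)
            if fp in self.agents:
                stats["updated"] += 1
            else:
                self.agents[fp] = MergedAgent(fp, d.agent_type, d.location)
                stats["new"] += 1
            # Record which scanners saw this agent (useful corroborating evidence).
            self.agents[fp].scanners.add(d.scanner)
        return stats
```

Because matching is purely on the hash, deduplication is only as good as the choice of attributes: they must be identical no matter which scanner produced the detection.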

Step 5: Reconcile Against Your Registry

The reconciler compares discovered agents against your governance registry to find shadow agents:

import asyncio
from agent_discovery import AgentInventory, Reconciler, RiskScorer
from agent_discovery.reconciler import StaticRegistryProvider
from agent_discovery.scanners import ProcessScanner, ConfigScanner

async def main():
    # Build inventory
    inventory = AgentInventory()
    inventory.ingest(await ProcessScanner().scan())
    inventory.ingest(await ConfigScanner().scan(paths=["."]))

    # Define known/registered agents
    registry = StaticRegistryProvider([
        {"did": "did:agent:prod-assistant", "name": "Production Assistant"},
        {"did": "did:agent:code-reviewer", "name": "Code Review Bot"},
        {"fingerprint": "abc123", "name": "Deploy Agent"},
    ])

    # Reconcile
    reconciler = Reconciler(inventory, registry)
    shadow_agents = await reconciler.reconcile()

    # Score risk
    scorer = RiskScorer()
    for shadow in shadow_agents:
        shadow.risk = scorer.score(shadow.agent)

        print(f"\n⚠️  SHADOW AGENT: {shadow.agent.name}")
        print(f"   Risk: {shadow.risk.level.value.upper()} ({shadow.risk.score:.0f}/100)")
        print("   Factors:")
        for factor in shadow.risk.factors:
            print(f"     - {factor}")
        print("   Actions:")
        for action in shadow.recommended_actions:
            print(f"     → {action}")

asyncio.run(main())
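Conceptually, reconciliation is a set difference: a discovered agent whose DID or fingerprint appears nowhere in the registry is a shadow agent. The sketch below shows that matching rule with plain dictionaries shaped like the `StaticRegistryProvider` entries above. The discovered-agent records, their `did`/`fingerprint` keys, and `find_shadow_agents` itself are hypothetical stand-ins for the library's models.

```python
def find_shadow_agents(discovered: list[dict], registry: list[dict]) -> list[dict]:
    """Return discovered agents that match no registry entry by DID or fingerprint."""
    # Only non-empty identifiers go into the lookup sets, so a missing DID
    # (None) can never accidentally count as "registered".
    known_dids = {e["did"] for e in registry if e.get("did")}
    known_fps = {e["fingerprint"] for e in registry if e.get("fingerprint")}
    shadows = []
    for agent in discovered:
        registered = (
            agent.get("did") in known_dids
            or agent.get("fingerprint") in known_fps
        )
        if not registered:
            shadows.append(agent)
    return shadows


if __name__ == "__main__":
    registry = [{"did": "did:agent:prod-assistant"}, {"fingerprint": "abc123"}]
    discovered = [
        {"name": "prod-assistant", "did": "did:agent:prod-assistant"},
        {"name": "notebook-crew", "did": None, "fingerprint": "zz9"},
    ]
    print([a["name"] for a in find_shadow_agents(discovered, registry)])
```

Matching on either key lets you register agents that lack a cryptographic identity (like the fingerprint-only "Deploy Agent" above) without treating them as shadows.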

Step 6: Use the CLI

For quick scans, use the CLI directly:

# Full scan with table output
agent-discovery scan

# Scan specific paths
agent-discovery scan -s config -p /opt/agents -p /home/deploy

# GitHub org scan
agent-discovery scan -s github --github-org my-company

# View inventory
agent-discovery inventory -o summary

# Reconcile against registered agents
agent-discovery reconcile --registry-file known-agents.json

# JSON output for automation
agent-discovery scan -o json | jq '.[] | select(.agent_type == "mcp-server")'
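If `jq` is not available, the same filter can be written in Python. Like the `jq` expression above, this sketch assumes that `scan -o json` emits a JSON array of objects with an `agent_type` field. The script name `filter_agents.py` and both helper functions are hypothetical.

```python
import json
import sys
from collections import Counter


def filter_by_type(agents: list[dict], agent_type: str) -> list[dict]:
    """Keep only agents whose agent_type equals the requested type."""
    return [a for a in agents if a.get("agent_type") == agent_type]


def count_by_type(agents: list[dict]) -> dict[str, int]:
    """Tally agents per agent_type; records without one count as 'unknown'."""
    return dict(Counter(a.get("agent_type", "unknown") for a in agents))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: agent-discovery scan -o json > scan.json
    #        python filter_agents.py scan.json mcp-server
    with open(sys.argv[1], encoding="utf-8") as fh:
        scanned = json.load(fh)
    json.dump(filter_by_type(scanned, sys.argv[2]), sys.stdout, indent=2)
    print()
    # The summary goes to stderr so stdout stays valid, pipeable JSON.
    print(count_by_type(scanned), file=sys.stderr)
```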

Step 7: Write a Custom Scanner

Extend discovery by writing your own scanner:

from agent_discovery.scanners.base import BaseScanner, registry
from agent_discovery.models import (
    ScanResult, DiscoveredAgent, Evidence, DetectionBasis
)

@registry.register
class KubernetesScanner(BaseScanner):
    """Scan Kubernetes for agent pods."""

    @property
    def name(self) -> str:
        return "kubernetes"

    async def scan(self, **kwargs) -> ScanResult:
        result = ScanResult(scanner_name=self.name)
        # Your K8s discovery logic here:
        # - List pods with agent labels
        # - Check container images for agent frameworks
        # - Inspect service annotations
        return result
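The `@registry.register` decorator follows the common plugin-registry pattern: a class decorator records each scanner class by name, which is presumably how names passed to `-s` (such as `config` or `github`) are resolved. The toolkit's real registry is not shown in this tutorial. The following is a self-contained illustration with hypothetical `Scanner`, `ScannerRegistry`, and `StubScanner` classes, not the library's code.

```python
import asyncio
from abc import ABC, abstractmethod


class Scanner(ABC):
    """Minimal stand-in for a scanner base class (hypothetical)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Unique scanner name."""

    @abstractmethod
    async def scan(self, **kwargs) -> list[str]:
        """Return the names of discovered agents (simplified result type)."""


class ScannerRegistry:
    def __init__(self) -> None:
        self._scanners: dict[str, type[Scanner]] = {}

    def register(self, cls: type[Scanner]) -> type[Scanner]:
        """Class decorator: store the class under its name and return it unchanged."""
        # `name` is an instance property here, so a throwaway instance is created.
        scanner_name = cls().name
        if scanner_name in self._scanners:
            raise ValueError(f"duplicate scanner name: {scanner_name}")
        self._scanners[scanner_name] = cls
        return cls

    def create(self, name: str) -> Scanner:
        """Instantiate a registered scanner by name (KeyError if unknown)."""
        return self._scanners[name]()

    def names(self) -> list[str]:
        return sorted(self._scanners)


registry = ScannerRegistry()


@registry.register
class StubScanner(Scanner):
    @property
    def name(self) -> str:
        return "stub"

    async def scan(self, **kwargs) -> list[str]:
        return ["example-agent"]


if __name__ == "__main__":
    print(registry.names())
    print(asyncio.run(registry.create("stub").scan()))
```

Rejecting duplicate names at decoration time surfaces a clash as soon as the module is imported, rather than silently letting one scanner shadow another.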

Step 8: CI/CD Integration

Add agent discovery to your CI pipeline to catch new shadow agents:

# .github/workflows/agent-audit.yml
name: Agent Discovery Audit
on:
  schedule:
    - cron: '0 8 * * 1'  # Weekly, Mondays at 08:00 UTC
  workflow_dispatch:

jobs:
  discover:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install agent-discovery
        run: pip install "agent-discovery[github]"

      - name: Scan repository
        run: |
          agent-discovery scan -s config -p . -o json > discovery.json

      - name: Check for shadow agents
        run: |
          SHADOW_COUNT=$(agent-discovery reconcile \
            --registry-file known-agents.json \
            -o json | python -c "import sys,json; print(len(json.load(sys.stdin)))")
          if [ "$SHADOW_COUNT" -gt "0" ]; then
            echo "::warning::Found $SHADOW_COUNT shadow agents!"
          fi

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: agent-discovery-report
          path: discovery.json
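The inline `python -c` one-liner in the workflow only counts entries and emits a warning. If you would rather fail the job, a small script can exit non-zero and emit one annotation per shadow agent. Like the one-liner, this sketch assumes that `reconcile -o json` prints a JSON array; the `name` field it reads, the script name `shadow_gate.py`, and the helper functions are assumptions about that output, not documented behavior.

```python
import json
import sys


def _escape(data: str) -> str:
    """Escape workflow-command data: '%' first, then CR and LF."""
    return data.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def annotate(shadows: list[dict]) -> list[str]:
    """Build one GitHub Actions ::warning:: command per shadow agent."""
    return [
        f"::warning::Shadow agent not in registry: {_escape(str(s.get('name', '<unnamed>')))}"
        for s in shadows
    ]


def gate(raw_json: str, fail_on_shadow: bool = True) -> int:
    """Print annotations and return an exit code (1 if any shadow agent was found)."""
    shadows = json.loads(raw_json) if raw_json.strip() else []
    for line in annotate(shadows):
        print(line)
    print(f"Found {len(shadows)} shadow agent(s)")
    return 1 if shadows and fail_on_shadow else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: agent-discovery reconcile --registry-file known-agents.json -o json > shadows.json
    #        python shadow_gate.py shadows.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.exit(gate(fh.read()))
```

Whether to warn or fail is a policy choice: warnings suit a first rollout, while failing the job suits teams that already maintain a complete `known-agents.json`.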

Risk Scoring Reference

| Factor | Points | Description |
|---|---|---|
| No identity (DID/SPIFFE) | +30 | Agent has no cryptographic identity |
| No owner | +20 | No responsible party assigned |
| Shadow/unregistered status | +20 | Not in any governance registry |
| High-risk agent type | +15 | AutoGen, CrewAI, LangChain, OpenAI Agents |
| Medium-risk agent type | +10 | MCP Server, Semantic Kernel, PydanticAI |
| Ungoverned >30 days | +10 | Long time without governance |
| Ungoverned 7-30 days | +5 | Growing governance gap |
| Low-confidence detection | -10 | May be a false positive |

Risk Levels: Critical (75+) · High (50-74) · Medium (25-49) · Low (10-24) · Info (<10)
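The table and level thresholds can be read as an additive score. The sketch below encodes them directly. Several details are assumptions: that the two agent-type rows and the two "ungoverned" rows are mutually exclusive, that the total is clamped to the range 0 to 100, and the lowercase type spellings in the sets. The library's `RiskScorer` may combine factors differently, and `AgentFacts`, `risk_score`, and `risk_level` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed identifiers for the framework names listed in the table.
HIGH_RISK_TYPES = {"autogen", "crewai", "langchain", "openai-agents"}
MEDIUM_RISK_TYPES = {"mcp-server", "semantic-kernel", "pydantic-ai"}


@dataclass
class AgentFacts:
    has_identity: bool      # DID or SPIFFE ID present
    has_owner: bool
    registered: bool
    agent_type: str
    days_ungoverned: int
    low_confidence: bool


def risk_score(a: AgentFacts) -> tuple[int, list[str]]:
    """Sum the factor points from the reference table; return (score, factors)."""
    parts: list[tuple[int, str]] = []
    if not a.has_identity:
        parts.append((30, "No identity (DID/SPIFFE)"))
    if not a.has_owner:
        parts.append((20, "No owner"))
    if not a.registered:
        parts.append((20, "Shadow/unregistered status"))
    if a.agent_type in HIGH_RISK_TYPES:
        parts.append((15, "High-risk agent type"))
    elif a.agent_type in MEDIUM_RISK_TYPES:
        parts.append((10, "Medium-risk agent type"))
    if a.days_ungoverned > 30:
        parts.append((10, "Ungoverned >30 days"))
    elif a.days_ungoverned >= 7:
        parts.append((5, "Ungoverned 7-30 days"))
    if a.low_confidence:
        parts.append((-10, "Low-confidence detection"))
    total = max(0, min(100, sum(p for p, _ in parts)))
    return total, [f"{reason} ({p:+d})" for p, reason in parts]


def risk_level(score: int) -> str:
    """Map a score onto the documented bands."""
    if score >= 75:
        return "critical"
    if score >= 50:
        return "high"
    if score >= 25:
        return "medium"
    if score >= 10:
        return "low"
    return "info"
```

Under these assumptions the maximum is 30 + 20 + 20 + 15 + 10 = 95. An agent with no identity, no owner, and no registration already scores 70 (High), so any one additional factor of +5 or more pushes it to Critical.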


Next Steps