Tutorial 46: Contributor Governance¶

Time: 30 minutes · Level: Intermediate · Prerequisites: Tutorial 45 (Shift-Left Governance)

Detect coordinated inauthentic behavior, product placement campaigns, and credibility farming before untrusted code or content enters your repository. This tutorial covers AGT's full contributor governance toolchain: from running a single check against a GitHub username to deploying automated CI workflows that screen every PR and issue author.

Scope: contributor reputation analysis, coordination detection, cross-project scanning Tools: contributor_check.py, credential_audit.py, cluster_detect.py, GitHub Actions Audience: OSS maintainers, security teams, DevRel teams managing open-source projects

What You'll Learn¶

Section	Topic
Why Contributor Governance?	The problem of inauthentic contributions in AI/agent repos
Quick Start	Run your first contributor check in 60 seconds
Understanding Signals	All 12 detection signals explained with real examples
The Three Scripts	contributor_check.py, credential_audit.py, cluster_detect.py
CI Integration	Deploy the contributor-check GitHub Action
Cross-Project Scanning	Scan issue authors across entire ecosystems
Tuning and False Positives	Adjusting thresholds, handling legitimate contributors
Case Studies	Real-world detection examples from AGT, A2A, and MCP Servers
Cross-Reference	Related tutorials

Why Contributor Governance?¶

Open-source AI agent projects face a unique attack surface: contributors who use the project's credibility to promote their own products. Unlike traditional supply chain attacks (malicious code in dependencies), these attacks operate through social engineering of trust:

Product placement: Filing PRs or issues that embed links, integrations, or references to the contributor's own unrelated project
Credibility farming: Forking popular awesome-lists and curated repositories to manufacture a profile that appears legitimate
Credential laundering: Getting a small PR merged, then citing that merge as a credential in issues filed across dozens of other repos
Feature cloning: Creating a near-copy of an existing project, then filing issues in related repos to promote the clone
Coordinated networks: Multiple accounts working together to cross-promote products, share forks, and amplify each other's issues

These patterns are especially prevalent in the AI agent governance space, where projects are new, standards bodies are forming, and there is strong incentive to position products as "the standard."

Quick Start¶

Prerequisites¶

Python 3.10+
A GitHub personal access token (or gh CLI authenticated)

Your First Check¶

# Set your GitHub token
export GITHUB_TOKEN="ghp_your_token_here"

# Or use gh CLI authentication (no token needed)
gh auth login

# Run a contributor check
cd agent-governance-toolkit
python scripts/contributor_check.py --username <github-handle>

Example output for a clean contributor:

Reputation Report: clean-developer
Risk: LOW
No signals detected.

Example output for a suspicious account:

Reputation Report: suspicious-account
Risk: HIGH

Signals:
  [HIGH] recent_repo_burst: 41 repos created in last 90 days
  [HIGH] cross_repo_spray: Issues filed in 72 repos within 7 days
  [HIGH] self_promotion_spray: 15 issues promoting own repos across 12 orgs
  [HIGH] thin_credibility: Repo 'my-project' (22d old, 0 stars) promoted across 28 orgs
  [HIGH] coordinated_promotion: 8 thin repos promoted to overlapping org set

JSON Output¶

For programmatic use, add --json:

python scripts/contributor_check.py --username <handle> --json

Target Repository Context¶

Add --repo to enable feature overlap detection and credential spray checks:

python scripts/contributor_check.py \
  --username <handle> \
  --repo microsoft/agent-governance-toolkit

This enables three additional signals: feature_overlap, credential_citation/ credential_laundering, and provides richer context for thin_credibility.

Understanding Signals¶

AGT's contributor check evaluates accounts across 12 behavioral signals, grouped into four categories.

Account Shape Signals¶

These signals analyze the GitHub profile itself, without fetching repos or issues.

Signal	Severity	Triggers When
`repo_velocity`	MEDIUM/HIGH	More than 0.2 repos/day with 10+ repos (MEDIUM) or 0.5/day with 15+ (HIGH)
`new_account_burst`	MEDIUM/HIGH	Account < 90 days old with 20+ repos (HIGH) or < 180 days with 30+ (MEDIUM)
`following_farming`	MEDIUM/HIGH	Following:follower ratio > 5:1 with 100+ following (MEDIUM) or > 20:1 (HIGH)
`zero_followers`	MEDIUM	0 followers despite 5+ public repos

Example: An account created 33 days ago with 27 repos and zero followers triggers new_account_burst (HIGH), repo_velocity (HIGH), and zero_followers (MEDIUM).

Repository Pattern Signals¶

These signals analyze the user's repositories for suspicious creation patterns.

Signal	Severity	Triggers When
`governance_theme_concentration`	MEDIUM	> 50% of repos are governance/security themed (with 5+ total repos)
`recent_repo_burst`	HIGH	15+ repos created in the last 90 days
`awesome_fork_burst`	HIGH	3+ awesome-list forks within 72 hours
`fork_burst`	MEDIUM	5+ general forks within 72 hours
`batch_repo_naming`	MEDIUM/HIGH	3+ repos with the same suffix (e.g., `*-mcp`) created within 48 hours, all with < 10 stars
`feature_overlap`	MEDIUM/HIGH	A non-fork repo matches 3+ (MEDIUM) or 4+ (HIGH) of 6 AGT feature buckets

Awesome fork burst explained: Credibility farming accounts fork curated "awesome" lists to pad their profile with seemingly relevant repos. A burst of 3+ awesome-list forks in 72 hours is a strong signal because legitimate developers rarely fork multiple curated lists in rapid succession.

Feature overlap explained: AGT defines six feature buckets that together characterize its unique capability set:

1. mcp_security    - MCP scanner, tool poisoning, rug pull detection
2. policy_engine   - Policy evaluation, Cedar/YAML policies, deny-by-default
3. identity_crypto - Ed25519 agent identity, zero-trust identity, agent DID
4. runtime_controls - Execution sandbox, kill switch, circuit breaker
5. audit_trust     - Audit trail, trust scoring, hash-chain logs
6. compliance      - OWASP Agentic, EU AI Act, NIST AI RMF

A repo matching 4+ of these 6 categories is likely a concept clone, especially if it is young and has few stars.

Batch repo naming explained: Accounts that mass-create repos with templated names (e.g., agent-workflow-mcp, agent-security-mcp, agent-guard-mcp all in one day) are likely producing thin placeholder projects for spam promotion. The check only flags low-star repos (< 10 stars) to avoid false-positiving on legitimate ecosystems.

Issue Spray Signals¶

These signals analyze the user's issue-filing patterns across GitHub.

Signal	Severity	Triggers When
`cross_repo_spray`	HIGH	Issues filed in 5+ distinct repos within 7 days
`cross_repo_spread`	MEDIUM	Issues filed across 8+ distinct repos total
`self_promotion_spray`	MEDIUM/HIGH	3+ issues promoting the author's own repos across 2+ orgs (MEDIUM) or 5+ issues across 3+ orgs (HIGH)
`credential_citation`	MEDIUM	Citing a target repo's merges in issues across 1+ other repos
`credential_laundering`	HIGH	Citing a target repo's merges in issues across 3+ other repos

Self-promotion spray explained: This is the key signal that separates legitimate protocol contributors from product placement accounts. Both may file issues across many repos (triggering cross_repo_spray), but only spammers' issues reference their own repos.

The check uses strong matching: - Full owner/repo references - github.com/owner/repo URLs - Distinctive repo names (4+ chars, excluding generic words like app, api, web)

Forked repos are excluded from the author's repo list to prevent false matches.

Credibility Signals¶

These signals analyze the user's project promotion patterns.

Signal	Severity	Triggers When
`thin_credibility`	MEDIUM/HIGH	A repo < 60 days old with < 5 stars is mentioned in issues filed in 1 org (MEDIUM) or 2+ orgs (HIGH)
`coordinated_promotion`	HIGH	3+ thin repos promoted to overlapping org sets (Jaccard similarity >= 0.6 on 50%+ of pairs)

Coordinated promotion explained: When an account creates multiple thin repos and promotes them all to the same set of target organizations, it indicates automated or systematic spam. The check computes pairwise Jaccard similarity between the promoted-org sets and flags when the overlap pattern is systematic.

The Three Scripts¶

AGT provides three complementary contributor governance scripts.

contributor_check.py¶

The primary tool. Runs all signal checks and produces a risk assessment.

# Basic profile check
python scripts/contributor_check.py --username <handle>

# With target repo context (enables feature_overlap + credential checks)
python scripts/contributor_check.py --username <handle> --repo <owner/repo>

# JSON output for programmatic use
python scripts/contributor_check.py --username <handle> --json

Exit codes: - 0 = LOW risk - 1 = MEDIUM risk - 2 = HIGH risk

Risk computation: 2+ HIGH severity signals = HIGH risk. 1 HIGH or 3+ MEDIUM signals = MEDIUM risk. Otherwise LOW.

credential_audit.py¶

Deep-dive tool for investigating credential laundering. Checks if a user has merged PRs in one repo and cites those merges in issues filed against other repos.

python scripts/credential_audit.py --username <handle> --repo <target-repo>

Use this for manual investigation of HIGH-risk accounts detected by contributor_check.py.

cluster_detect.py¶

Network analysis tool for mapping coordination between accounts. Starting from a seed account, it maps shared forks, co-comments, and synchronized filing patterns.

python scripts/cluster_detect.py --seed <handle>

Note: cluster_detect.py is API-heavy and may hit GitHub's secondary rate limits when analyzing large networks. Use it for targeted investigation, not bulk scanning.

CI Integration¶

The Contributor Check GitHub Action¶

Deploy automated screening on every PR and issue:

# .github/workflows/contributor-check.yml
name: Contributor Reputation Check

on:
  pull_request_target:
    types: [opened]
  issues:
    types: [opened]

permissions:
  contents: read
  issues: write
  pull-requests: write

jobs:
  check:
    runs-on: ubuntu-latest
    if: github.actor != 'dependabot[bot]'
    steps:
      - name: Checkout AGT action
        uses: actions/checkout@v4
        with:
          repository: microsoft/agent-governance-toolkit
          sparse-checkout: |
            scripts
            .github/actions/contributor-check
          path: agt

      - name: Run contributor check
        uses: ./agt/.github/actions/contributor-check
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          checks: profile,credential
          risk-threshold: MEDIUM

What the Action Does¶

Extracts the PR/issue author's username
Runs contributor_check.py with --repo set to the current repository
Optionally runs credential_audit.py for deeper analysis
Posts a collapsible comment with findings (MEDIUM or higher)
Adds risk labels (needs-review:MEDIUM or needs-review:HIGH)
Comments are idempotent: re-runs update instead of duplicating

Security Note¶

Use pull_request_target (not pull_request) so the action has write permissions to comment and label on fork PRs. Do not run untrusted PR code before this action in the same workflow.

Cross-Project Scanning¶

For maintainers responsible for ecosystem health, contributor_check.py can scan all issue authors across a project:

Step 1: Extract Issue Authors¶

# Using gh CLI to list issue authors
gh issue list --repo <owner/repo> --state all --json author --limit 100 \
  | jq -r '.[].author.login' | sort -u > authors.txt

Step 2: Batch Scan¶

# Scan each author with target repo context
while IFS= read -r author; do
  echo "=== $author ==="
  python scripts/contributor_check.py \
    --username "$author" \
    --repo <owner/repo> \
    --json
done < authors.txt > scan_results.jsonl

Step 3: Filter HIGH Risk¶

# Extract HIGH risk accounts
cat scan_results.jsonl | \
  python -c "
import sys, json
for line in sys.stdin:
    if line.startswith('{'):
        d = json.loads(line)
        if d['risk'] == 'HIGH':
            print(f\"{d['username']}: {len(d['signals'])} signals\")
"

Cross-Ecosystem Scanning¶

For scanning across multiple repositories (as we did with AAIF, A2A, and MCP Servers), maintain a list of target repos and deduplicate authors:

repos=("aaif/project-proposals" "google/A2A" "modelcontextprotocol/servers")
for repo in "${repos[@]}"; do
  gh issue list --repo "$repo" --state all --json author --limit 100 \
    | jq -r '.[].author.login'
done | sort -u > all_authors.txt

This approach revealed shared bad actors across ecosystems. For example, the same accounts were found filing product placement issues in AAIF, Google A2A, and MCP Servers simultaneously.

Tuning and False Positives¶

Common False Positive: Protocol/Spec Contributors¶

Legitimate protocol contributors (e.g., people working on HTTP specs, A2A protocol, MCP standards) file issues across many repos as part of their work. This triggers cross_repo_spray but should not flag as suspicious.

How AGT handles this: The self_promotion_spray signal differentiates between "filing issues about protocol topics" and "promoting your own repos." A spec contributor filing HTTP/3 tracking issues across 15 repos will trigger cross_repo_spray (MEDIUM overall) but not self_promotion_spray, keeping them at MEDIUM rather than HIGH.

Common False Positive: Active Open-Source Contributors¶

Developers who genuinely contribute across many repos may trigger cross_repo_spray. Without self-promotion or thin-credibility signals, they remain at MEDIUM.

Adjusting the Risk Threshold¶

In the GitHub Action, set risk-threshold to control when comments are posted:

risk-threshold: HIGH - Only flag the most suspicious accounts (fewer false positives, may miss some bad actors)
risk-threshold: MEDIUM - Flag anything suspicious (more false positives, catches more bad actors)

Excluding Known Accounts¶

For known legitimate contributors who trigger false positives, add them to an allow list in the workflow:

- name: Run contributor check
  if: >
    github.actor != 'dependabot[bot]' &&
    github.actor != 'known-legit-contributor'
  uses: ./agt/.github/actions/contributor-check

Case Studies¶

Case Study 1: Credibility Farming Network¶

Detected: A network of 5 linked accounts operating across AAIF, Google A2A, and AGT. The primary account created 46 repos, sprayed issues across 72 repos in 7 days, and promoted thin-credibility projects across 28 organizations.

Signals fired: - recent_repo_burst (41 repos in 90 days) - cross_repo_spray (72 repos in 7 days) - self_promotion_spray (promoting own repos across 12+ orgs) - thin_credibility (multiple 0-star repos promoted across 28 orgs) - coordinated_promotion (8 thin repos targeting overlapping org sets) - credential_citation (citing AGT merges as credentials elsewhere)

Action taken: 33 files and 5,544 lines of product placement content removed from AGT. Tracking issues created for legitimate feature gaps identified during cleanup.

Case Study 2: Quiet Feature Clone¶

Detected: An account that cloned AGT's entire feature set into a 22-day-old repo with 1 star, then filed a project proposal in AAIF.

Why existing checks missed it: No spray pattern (only 1 issue), no following farming, no unusual account age. Traditional signals were clean.

New signals that caught it: - awesome_fork_burst (5 awesome-list forks in 72 hours) - feature_overlap (6/6 AGT feature buckets matched) - thin_credibility (22-day-old, 1-star repo promoted in aaif)

This case drove the development of the feature_overlap and fork_burst signals.

Case Study 3: MCP Server Spam Ring¶

Detected: Two accounts mass-creating thin MCP server repos (0 stars, < 5 days old) and promoting them identically across modelcontextprotocol, punkpeye, aimcp, and chatmcp organizations.

Signals fired: - batch_repo_naming (22 repos ending in -mcp created in one batch) - thin_credibility (22 thin repos promoted across 3-4 orgs each) - coordinated_promotion (all repos targeting the same org set) - new_account_burst (141-day account with 35 repos, 0 followers)

This case drove the development of the batch_repo_naming and coordinated_promotion signals.

Cross-Reference¶

Concept	Tutorial
Shift-left governance overview	Tutorial 45 -- Shift-Left Governance
CI/CD security tooling	Tutorial 25 -- Security Hardening
Trust and identity fundamentals	Tutorial 02 -- Trust & Identity
Advanced trust and behavior	Tutorial 17 -- Advanced Trust
Plugin marketplace governance	Tutorial 10 -- Plugin Marketplace
SBOM and supply chain	Tutorial 26 -- SBOM & Signing

Source Files¶

Component	Location
Contributor check script	`scripts/contributor_check.py`
Credential audit script	`scripts/credential_audit.py`
Cluster detection script	`scripts/cluster_detect.py`
Contributor check tests	`scripts/tests/test_contributor_check.py`
GitHub Action workflow	`.github/workflows/contributor-check.yml`
Composite action	`.github/actions/contributor-check/`

Next Steps¶

Deploy the contributor-check action to your repository using the workflow template in CI Integration
Run a baseline scan of your existing contributors to identify any historical product placement content
Set up periodic re-scans as new accounts and patterns emerge
Read Tutorial 45 (Shift-Left Governance) for the complete shift-left pipeline that contributor governance fits into
Read Tutorial 17 (Advanced Trust) for runtime trust scoring that complements build-time contributor checks