Tutorial 46: Contributor Governance¶
Time: 30 minutes · Level: Intermediate · Prerequisites: Tutorial 45 (Shift-Left Governance)
Detect coordinated inauthentic behavior, product placement campaigns, and credibility farming before untrusted code or content enters your repository. This tutorial covers AGT's full contributor governance toolchain: from running a single check against a GitHub username to deploying automated CI workflows that screen every PR and issue author.
Scope: contributor reputation analysis, coordination detection, cross-project scanning Tools: contributor_check.py, credential_audit.py, cluster_detect.py, GitHub Actions Audience: OSS maintainers, security teams, DevRel teams managing open-source projects
What You'll Learn¶
| Section | Topic |
|---|---|
| Why Contributor Governance? | The problem of inauthentic contributions in AI/agent repos |
| Quick Start | Run your first contributor check in 60 seconds |
| Understanding Signals | All 12 detection signals explained with real examples |
| The Three Scripts | contributor_check.py, credential_audit.py, cluster_detect.py |
| CI Integration | Deploy the contributor-check GitHub Action |
| Cross-Project Scanning | Scan issue authors across entire ecosystems |
| Tuning and False Positives | Adjusting thresholds, handling legitimate contributors |
| Case Studies | Real-world detection examples from AGT, A2A, and MCP Servers |
| Cross-Reference | Related tutorials |
Why Contributor Governance?¶
Open-source AI agent projects face a unique attack surface: contributors who use the project's credibility to promote their own products. Unlike traditional supply chain attacks (malicious code in dependencies), these attacks operate through social engineering of trust:
- Product placement: Filing PRs or issues that embed links, integrations, or references to the contributor's own unrelated project
- Credibility farming: Forking popular awesome-lists and curated repositories to manufacture a profile that appears legitimate
- Credential laundering: Getting a small PR merged, then citing that merge as a credential in issues filed across dozens of other repos
- Feature cloning: Creating a near-copy of an existing project, then filing issues in related repos to promote the clone
- Coordinated networks: Multiple accounts working together to cross-promote products, share forks, and amplify each other's issues
These patterns are especially prevalent in the AI agent governance space, where projects are new, standards bodies are forming, and there is strong incentive to position products as "the standard."
Quick Start¶
Prerequisites¶
- Python 3.10+
- A GitHub personal access token (or
ghCLI authenticated)
Your First Check¶
# Set your GitHub token
export GITHUB_TOKEN="ghp_your_token_here"
# Or use gh CLI authentication (no token needed)
gh auth login
# Run a contributor check
cd agent-governance-toolkit
python scripts/contributor_check.py --username <github-handle>
Example output for a clean contributor:
Example output for a suspicious account:
Reputation Report: suspicious-account
Risk: HIGH
Signals:
[HIGH] recent_repo_burst: 41 repos created in last 90 days
[HIGH] cross_repo_spray: Issues filed in 72 repos within 7 days
[HIGH] self_promotion_spray: 15 issues promoting own repos across 12 orgs
[HIGH] thin_credibility: Repo 'my-project' (22d old, 0 stars) promoted across 28 orgs
[HIGH] coordinated_promotion: 8 thin repos promoted to overlapping org set
JSON Output¶
For programmatic use, add --json:
Target Repository Context¶
Add --repo to enable feature overlap detection and credential spray checks:
python scripts/contributor_check.py \
--username <handle> \
--repo microsoft/agent-governance-toolkit
This enables three additional signals: feature_overlap, credential_citation/ credential_laundering, and provides richer context for thin_credibility.
Understanding Signals¶
AGT's contributor check evaluates accounts across 12 behavioral signals, grouped into four categories.
Account Shape Signals¶
These signals analyze the GitHub profile itself, without fetching repos or issues.
| Signal | Severity | Triggers When |
|---|---|---|
repo_velocity | MEDIUM/HIGH | More than 0.2 repos/day with 10+ repos (MEDIUM) or 0.5/day with 15+ (HIGH) |
new_account_burst | MEDIUM/HIGH | Account < 90 days old with 20+ repos (HIGH) or < 180 days with 30+ (MEDIUM) |
following_farming | MEDIUM/HIGH | Following:follower ratio > 5:1 with 100+ following (MEDIUM) or > 20:1 (HIGH) |
zero_followers | MEDIUM | 0 followers despite 5+ public repos |
Example: An account created 33 days ago with 27 repos and zero followers triggers new_account_burst (HIGH), repo_velocity (HIGH), and zero_followers (MEDIUM).
Repository Pattern Signals¶
These signals analyze the user's repositories for suspicious creation patterns.
| Signal | Severity | Triggers When |
|---|---|---|
governance_theme_concentration | MEDIUM | > 50% of repos are governance/security themed (with 5+ total repos) |
recent_repo_burst | HIGH | 15+ repos created in the last 90 days |
awesome_fork_burst | HIGH | 3+ awesome-list forks within 72 hours |
fork_burst | MEDIUM | 5+ general forks within 72 hours |
batch_repo_naming | MEDIUM/HIGH | 3+ repos with the same suffix (e.g., *-mcp) created within 48 hours, all with < 10 stars |
feature_overlap | MEDIUM/HIGH | A non-fork repo matches 3+ (MEDIUM) or 4+ (HIGH) of 6 AGT feature buckets |
Awesome fork burst explained: Credibility farming accounts fork curated "awesome" lists to pad their profile with seemingly relevant repos. A burst of 3+ awesome-list forks in 72 hours is a strong signal because legitimate developers rarely fork multiple curated lists in rapid succession.
Feature overlap explained: AGT defines six feature buckets that together characterize its unique capability set:
1. mcp_security - MCP scanner, tool poisoning, rug pull detection
2. policy_engine - Policy evaluation, Cedar/YAML policies, deny-by-default
3. identity_crypto - Ed25519 agent identity, zero-trust identity, agent DID
4. runtime_controls - Execution sandbox, kill switch, circuit breaker
5. audit_trust - Audit trail, trust scoring, hash-chain logs
6. compliance - OWASP Agentic, EU AI Act, NIST AI RMF
A repo matching 4+ of these 6 categories is likely a concept clone, especially if it is young and has few stars.
Batch repo naming explained: Accounts that mass-create repos with templated names (e.g., agent-workflow-mcp, agent-security-mcp, agent-guard-mcp all in one day) are likely producing thin placeholder projects for spam promotion. The check only flags low-star repos (< 10 stars) to avoid false-positiving on legitimate ecosystems.
Issue Spray Signals¶
These signals analyze the user's issue-filing patterns across GitHub.
| Signal | Severity | Triggers When |
|---|---|---|
cross_repo_spray | HIGH | Issues filed in 5+ distinct repos within 7 days |
cross_repo_spread | MEDIUM | Issues filed across 8+ distinct repos total |
self_promotion_spray | MEDIUM/HIGH | 3+ issues promoting the author's own repos across 2+ orgs (MEDIUM) or 5+ issues across 3+ orgs (HIGH) |
credential_citation | MEDIUM | Citing a target repo's merges in issues across 1+ other repos |
credential_laundering | HIGH | Citing a target repo's merges in issues across 3+ other repos |
Self-promotion spray explained: This is the key signal that separates legitimate protocol contributors from product placement accounts. Both may file issues across many repos (triggering cross_repo_spray), but only spammers' issues reference their own repos.
The check uses strong matching: - Full owner/repo references - github.com/owner/repo URLs - Distinctive repo names (4+ chars, excluding generic words like app, api, web)
Forked repos are excluded from the author's repo list to prevent false matches.
Credibility Signals¶
These signals analyze the user's project promotion patterns.
| Signal | Severity | Triggers When |
|---|---|---|
thin_credibility | MEDIUM/HIGH | A repo < 60 days old with < 5 stars is mentioned in issues filed in 1 org (MEDIUM) or 2+ orgs (HIGH) |
coordinated_promotion | HIGH | 3+ thin repos promoted to overlapping org sets (Jaccard similarity >= 0.6 on 50%+ of pairs) |
Coordinated promotion explained: When an account creates multiple thin repos and promotes them all to the same set of target organizations, it indicates automated or systematic spam. The check computes pairwise Jaccard similarity between the promoted-org sets and flags when the overlap pattern is systematic.
The Three Scripts¶
AGT provides three complementary contributor governance scripts.
contributor_check.py¶
The primary tool. Runs all signal checks and produces a risk assessment.
# Basic profile check
python scripts/contributor_check.py --username <handle>
# With target repo context (enables feature_overlap + credential checks)
python scripts/contributor_check.py --username <handle> --repo <owner/repo>
# JSON output for programmatic use
python scripts/contributor_check.py --username <handle> --json
Exit codes: - 0 = LOW risk - 1 = MEDIUM risk - 2 = HIGH risk
Risk computation: 2+ HIGH severity signals = HIGH risk. 1 HIGH or 3+ MEDIUM signals = MEDIUM risk. Otherwise LOW.
credential_audit.py¶
Deep-dive tool for investigating credential laundering. Checks if a user has merged PRs in one repo and cites those merges in issues filed against other repos.
Use this for manual investigation of HIGH-risk accounts detected by contributor_check.py.
cluster_detect.py¶
Network analysis tool for mapping coordination between accounts. Starting from a seed account, it maps shared forks, co-comments, and synchronized filing patterns.
Note: cluster_detect.py is API-heavy and may hit GitHub's secondary rate limits when analyzing large networks. Use it for targeted investigation, not bulk scanning.
CI Integration¶
The Contributor Check GitHub Action¶
Deploy automated screening on every PR and issue:
# .github/workflows/contributor-check.yml
name: Contributor Reputation Check
on:
pull_request_target:
types: [opened]
issues:
types: [opened]
permissions:
contents: read
issues: write
pull-requests: write
jobs:
check:
runs-on: ubuntu-latest
if: github.actor != 'dependabot[bot]'
steps:
- name: Checkout AGT action
uses: actions/checkout@v4
with:
repository: microsoft/agent-governance-toolkit
sparse-checkout: |
scripts
.github/actions/contributor-check
path: agt
- name: Run contributor check
uses: ./agt/.github/actions/contributor-check
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
checks: profile,credential
risk-threshold: MEDIUM
What the Action Does¶
- Extracts the PR/issue author's username
- Runs contributor_check.py with
--reposet to the current repository - Optionally runs credential_audit.py for deeper analysis
- Posts a collapsible comment with findings (MEDIUM or higher)
- Adds risk labels (
needs-review:MEDIUMorneeds-review:HIGH) - Comments are idempotent: re-runs update instead of duplicating
Security Note¶
Use pull_request_target (not pull_request) so the action has write permissions to comment and label on fork PRs. Do not run untrusted PR code before this action in the same workflow.
Cross-Project Scanning¶
For maintainers responsible for ecosystem health, contributor_check.py can scan all issue authors across a project:
Step 1: Extract Issue Authors¶
# Using gh CLI to list issue authors
gh issue list --repo <owner/repo> --state all --json author --limit 100 \
| jq -r '.[].author.login' | sort -u > authors.txt
Step 2: Batch Scan¶
# Scan each author with target repo context
while IFS= read -r author; do
echo "=== $author ==="
python scripts/contributor_check.py \
--username "$author" \
--repo <owner/repo> \
--json
done < authors.txt > scan_results.jsonl
Step 3: Filter HIGH Risk¶
# Extract HIGH risk accounts
cat scan_results.jsonl | \
python -c "
import sys, json
for line in sys.stdin:
if line.startswith('{'):
d = json.loads(line)
if d['risk'] == 'HIGH':
print(f\"{d['username']}: {len(d['signals'])} signals\")
"
Cross-Ecosystem Scanning¶
For scanning across multiple repositories (as we did with AAIF, A2A, and MCP Servers), maintain a list of target repos and deduplicate authors:
repos=("aaif/project-proposals" "google/A2A" "modelcontextprotocol/servers")
for repo in "${repos[@]}"; do
gh issue list --repo "$repo" --state all --json author --limit 100 \
| jq -r '.[].author.login'
done | sort -u > all_authors.txt
This approach revealed shared bad actors across ecosystems. For example, the same accounts were found filing product placement issues in AAIF, Google A2A, and MCP Servers simultaneously.
Tuning and False Positives¶
Common False Positive: Protocol/Spec Contributors¶
Legitimate protocol contributors (e.g., people working on HTTP specs, A2A protocol, MCP standards) file issues across many repos as part of their work. This triggers cross_repo_spray but should not flag as suspicious.
How AGT handles this: The self_promotion_spray signal differentiates between "filing issues about protocol topics" and "promoting your own repos." A spec contributor filing HTTP/3 tracking issues across 15 repos will trigger cross_repo_spray (MEDIUM overall) but not self_promotion_spray, keeping them at MEDIUM rather than HIGH.
Common False Positive: Active Open-Source Contributors¶
Developers who genuinely contribute across many repos may trigger cross_repo_spray. Without self-promotion or thin-credibility signals, they remain at MEDIUM.
Adjusting the Risk Threshold¶
In the GitHub Action, set risk-threshold to control when comments are posted:
risk-threshold: HIGH- Only flag the most suspicious accounts (fewer false positives, may miss some bad actors)risk-threshold: MEDIUM- Flag anything suspicious (more false positives, catches more bad actors)
Excluding Known Accounts¶
For known legitimate contributors who trigger false positives, add them to an allow list in the workflow:
- name: Run contributor check
if: >
github.actor != 'dependabot[bot]' &&
github.actor != 'known-legit-contributor'
uses: ./agt/.github/actions/contributor-check
Case Studies¶
Case Study 1: Credibility Farming Network¶
Detected: A network of 5 linked accounts operating across AAIF, Google A2A, and AGT. The primary account created 46 repos, sprayed issues across 72 repos in 7 days, and promoted thin-credibility projects across 28 organizations.
Signals fired: - recent_repo_burst (41 repos in 90 days) - cross_repo_spray (72 repos in 7 days) - self_promotion_spray (promoting own repos across 12+ orgs) - thin_credibility (multiple 0-star repos promoted across 28 orgs) - coordinated_promotion (8 thin repos targeting overlapping org sets) - credential_citation (citing AGT merges as credentials elsewhere)
Action taken: 33 files and 5,544 lines of product placement content removed from AGT. Tracking issues created for legitimate feature gaps identified during cleanup.
Case Study 2: Quiet Feature Clone¶
Detected: An account that cloned AGT's entire feature set into a 22-day-old repo with 1 star, then filed a project proposal in AAIF.
Why existing checks missed it: No spray pattern (only 1 issue), no following farming, no unusual account age. Traditional signals were clean.
New signals that caught it: - awesome_fork_burst (5 awesome-list forks in 72 hours) - feature_overlap (6/6 AGT feature buckets matched) - thin_credibility (22-day-old, 1-star repo promoted in aaif)
This case drove the development of the feature_overlap and fork_burst signals.
Case Study 3: MCP Server Spam Ring¶
Detected: Two accounts mass-creating thin MCP server repos (0 stars, < 5 days old) and promoting them identically across modelcontextprotocol, punkpeye, aimcp, and chatmcp organizations.
Signals fired: - batch_repo_naming (22 repos ending in -mcp created in one batch) - thin_credibility (22 thin repos promoted across 3-4 orgs each) - coordinated_promotion (all repos targeting the same org set) - new_account_burst (141-day account with 35 repos, 0 followers)
This case drove the development of the batch_repo_naming and coordinated_promotion signals.
Cross-Reference¶
| Concept | Tutorial |
|---|---|
| Shift-left governance overview | Tutorial 45 -- Shift-Left Governance |
| CI/CD security tooling | Tutorial 25 -- Security Hardening |
| Trust and identity fundamentals | Tutorial 02 -- Trust & Identity |
| Advanced trust and behavior | Tutorial 17 -- Advanced Trust |
| Plugin marketplace governance | Tutorial 10 -- Plugin Marketplace |
| SBOM and supply chain | Tutorial 26 -- SBOM & Signing |
Source Files¶
| Component | Location |
|---|---|
| Contributor check script | scripts/contributor_check.py |
| Credential audit script | scripts/credential_audit.py |
| Cluster detection script | scripts/cluster_detect.py |
| Contributor check tests | scripts/tests/test_contributor_check.py |
| GitHub Action workflow | .github/workflows/contributor-check.yml |
| Composite action | .github/actions/contributor-check/ |
Next Steps¶
- Deploy the contributor-check action to your repository using the workflow template in CI Integration
- Run a baseline scan of your existing contributors to identify any historical product placement content
- Set up periodic re-scans as new accounts and patterns emerge
- Read Tutorial 45 (Shift-Left Governance) for the complete shift-left pipeline that contributor governance fits into
- Read Tutorial 17 (Advanced Trust) for runtime trust scoring that complements build-time contributor checks