
# Code Review

The code review system provides two complementary review passes that run before you open a pull request. A functional review catches logic errors, edge cases, and error handling gaps. A standards review enforces project-defined coding conventions through dynamically loaded skills. An orchestrator agent combines both into a single merged report.

Most review feedback arrives after a PR is already open, when context switching and rework costs are highest. Running these agents on a local branch before pushing catches issues while the code is still fresh.

## Why Pre-PR Code Review?

| Benefit | Description |
| --- | --- |
| Earlier defect detection | Catches functional bugs on the branch, before reviewers spend time on a PR |
| Consistent standards coverage | Every diff gets the same skill-based analysis regardless of which reviewer picks up the PR |
| Extensible language support | Teams add their own skills without modifying the review agents |
| Actionable output | Every finding includes file paths, line numbers, current code, and a suggested fix |

> **TIP**
>
> New to hve-core code review? Start with the functional review prompt on your current branch to see the output format, then move to the full orchestrated review once you are comfortable with the workflow.

## Architecture

```mermaid
flowchart TD
    subgraph Prompts
        P1["code-review-functional<br/>.prompt.md"]
        P2["code-review-full<br/>.prompt.md"]
    end

    subgraph Agents
        AF["Code Review<br/>Functional"]
        AS["Code Review<br/>Standards"]
        AO["Code Review Full<br/>(Orchestrator)"]
    end

    subgraph "Shared Protocols"
        D["Diff Computation<br/>Protocol"]
        R["Review Artifacts<br/>Protocol"]
        PR["pr-reference<br/>Skill"]
    end

    subgraph Skills
        S1["python-foundational"]
        S2["(future language<br/>skills)"]
        S3["Enterprise<br/>custom skills"]
    end

    subgraph Templates
        T1["Standards Output<br/>Format"]
        T2["Full Review<br/>Output Format"]
        T3["Engineering<br/>Fundamentals"]
    end

    P1 -->|"invokes"| AF
    P2 -->|"invokes"| AO
    AO -->|"Step 1"| D
    AO -->|"Step 1"| PR
    AO -->|"Step 2 parallel"| AF & AS
    AF -->|"follows"| R
    AS -->|"follows"| R
    AS -->|"loads at runtime"| S1 & S2 & S3
    AS -->|"formats with"| T1
    AS -->|"applies"| T3
    AO -->|"Step 3 merge"| T2
```

The orchestrator computes the diff once in Step 1 using the pr-reference skill, writes a shared `diff-state.json`, then dispatches both subagents in parallel in Step 2. Each subagent writes structured JSON findings to disk. In Step 3, the orchestrator reads both findings files and merges them into a single deduplicated report using the full review output format template.
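
A minimal sketch of the shared state file, assuming illustrative field names (the actual schema is owned by the orchestrator; per the steps described below it carries branch metadata, the file list, and the T-shirt size):

```json
{
  "base_branch": "origin/main",
  "head_commit": "abc123...",
  "files": ["src/main.py", "src/utils.py"],
  "extensions": [".py"],
  "tshirt_size": "S"
}
```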

## The Three Agents

> **NOTE**
>
> The Functional and Standards agents are dual-mode: they operate independently when invoked from the Chat panel and run as subagents with lane boundaries when the orchestrator dispatches them. This differs from the separate-file subagent pattern used elsewhere in the repo (e.g., RPI's dedicated subagent files). The dual-mode design avoids duplicating agent definitions while supporting both standalone and orchestrated use.

### Code Review Functional

Analyzes branch diffs for functional correctness across five focus areas:

| Focus Area | What It Catches |
| --- | --- |
| Logic | Incorrect control flow, wrong boolean conditions, off-by-one errors |
| Edge Cases | Unhandled boundaries, missing null checks, empty collection handling |
| Error Handling | Uncaught exceptions, swallowed errors, resource cleanup gaps |
| Concurrency | Race conditions, deadlock potential, shared mutable state without synchronization |
| Contract | API misuse, type mismatches at boundaries, violated preconditions |

Findings are severity-ordered (Critical, High, Medium, Low) with concrete code fixes. The agent applies false-positive mitigation filters to keep noise low.

### Code Review Standards

Enforces project-defined coding standards through dynamically loaded skills. The agent is language-agnostic: it scans the workspace for `**/SKILL.md` files, matches them against the languages in the diff, and loads up to 8 relevant skills per review.
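
The discovery pass is roughly the following shell one-liner (an approximation for illustration; the agent performs the scan itself and you never run this):

```bash
# Approximate equivalent of the skill discovery scan (illustrative only).
find . -type f -name "SKILL.md" -not -path "*/node_modules/*"
```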

Skills provide the domain-specific checklists. The standards agent provides the review protocol, output format, and verdict logic. See Language Skills for details on the built-in skills and how to create your own.

### Code Review Full (Orchestrator)

Runs both agents in parallel and produces a merged report:

| Step | What happens |
| --- | --- |
| Compute Diff | Generates a structured XML diff via the pr-reference skill, captures working-tree changes, and writes `diff-state.json` with branch metadata, file list, and T-shirt size classification |
| Parallel Dispatch | Dispatches Functional and Standards subagents simultaneously with lane directives that prevent overlapping findings (functional correctness vs skill-backed standards) |
| Merge Report | Reads both subagents' structured JSON findings, applies transformation rules (deduplication, severity sorting, source tagging), and writes a merged `review.md` plus `metadata.json` |

The orchestrator classifies review size into T-shirt tiers (XS through XL) and adapts its strategy accordingly. Small reviews dispatch a single pair of subagents. Large reviews (50+ files) split the file list into batches with one Functional + Standards pair per batch.

Lane directives in the dispatch prompts tell each subagent what to focus on and what to skip, reducing duplicate findings in the merged output. The Functional agent covers logic, edge cases, and contract violations. The Standards agent covers skill-backed coding conventions.

The merged report includes severity-tagged findings from both sources, a unified changed files table, combined testing recommendations, and acceptance criteria coverage when a story reference is provided.

## How the Orchestrated Review Works

```mermaid
flowchart LR
    subgraph "Step 1"
        A1["pr-reference<br/>skill"] --> A2["diff-state.json<br/>+ T-shirt size"]
    end

    subgraph "Step 2 (parallel)"
        B1["Functional<br/>agent"]
        B2["Standards<br/>agent"]
    end

    subgraph "Step 3"
        C1["Merge findings<br/>+ write report"]
    end

    A2 --> B1 & B2
    B1 --> C1
    B2 --> C1
```

The orchestrator's three steps are visible to the user through progress announcements emitted at each stage:

| Step | Announcement | What happens |
| --- | --- | --- |
| 1 | Diff computed | pr-reference generates a structured XML diff; the orchestrator writes `diff-state.json` with file list, extensions, and T-shirt size |
| 2a | Reviews dispatched | Both agents start in parallel with lane directives |
| 2b | Reviews complete | Both agents have written JSON findings to disk |
| 3 | Merged report | Findings are deduplicated, severity-sorted, source-tagged, and written as `review.md` + `metadata.json` |

### T-Shirt Size Classification

The orchestrator classifies each review to choose the right dispatch strategy:

| Size | Files | Diff Lines | Strategy |
| --- | --- | --- | --- |
| XS | <5 | <100 | Single parallel pair |
| S | 5-19 | 100-399 | Single parallel pair |
| M | 20-49 | 400-999 | Single parallel pair |
| L | 50-99 | 1,000-2,999 | Batches of 30 files per pair |
| XL | 100+ | 3,000+ | Multi-round batches, high-risk first |

When files and lines fall in different tiers, the orchestrator uses the smaller tier to avoid over-batching. For example, a branch touching 60 files (L) with only 800 diff lines (M) is classified as M and handled by a single parallel pair.

## Usage

### Full Orchestrated Review

Run both reviews in a single pass:

```text
/code-review-full
```

Pass a work item reference to enable acceptance criteria coverage:

```text
/code-review-full story=AB#456
```

The orchestrator passes the story reference to the Standards agent, which includes an Acceptance Criteria Coverage table in its report.

### Functional Review

Run the functional review prompt from the Copilot Chat panel:

```text
/code-review-functional
```

Optionally specify a base branch:

```text
/code-review-functional baseBranch=origin/develop
```

The review defaults to `origin/main` when no base branch is specified.
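
Conceptually, the review scope is the branch's diff against that base. Assuming a standard Git setup, you can preview which files will be in scope with plain git (an approximation only; the actual XML diff, including working-tree changes, comes from the pr-reference skill):

```bash
# Rough preview of the review scope (the real diff is produced by the
# pr-reference skill, not by this command).
git diff --stat origin/main...HEAD
```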

### Standards Review

The standards review does not have a standalone prompt. Invoke the Code Review Standards agent directly from the Copilot Chat panel and describe what you want reviewed. The agent detects the diff automatically using the diff computation protocol.

## Review Output

Both agents produce severity-ordered findings. Each finding includes:

- A descriptive title and severity level (Critical, High, Medium, Low)
- The file path and line range where the issue appears
- The current code from the diff that has the issue
- A suggested fix with replacement code
- The category and (for standards findings) the skill that surfaced the finding
- A source tag (`[Functional]` or `[Standards]`) in orchestrated mode

### Structured JSON Contracts

When running under the orchestrator, subagents write findings as structured JSON rather than markdown. This enables deterministic merging without LLM re-parsing. The JSON schema is defined in the Full Review Output Format template, which both the orchestrator and subagents reference as the authoritative data contract.
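
As a rough illustration, a single entry in `functional-findings.json` might look like the following (field names here are assumptions for this sketch; the Full Review Output Format template is the authoritative contract):

```json
{
  "title": "Missing null check on API response",
  "severity": "high",
  "category": "edge-cases",
  "file": "src/utils.py",
  "lines": "42-48",
  "current_code": "items = response.json()['items']",
  "suggested_fix": "items = (response.json() or {}).get('items', [])",
  "source": "functional"
}
```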

The data flow through the orchestrator:

```text
diff-state.json                (orchestrator writes, subagents read)
        ↓
functional-findings.json       (functional subagent writes)
standards-findings.json        (standards subagent writes)
        ↓
review.md + metadata.json      (orchestrator merges and writes)
```

### Lane Separation

The orchestrator's dispatch prompts include lane directives that tell each subagent what to focus on and what to skip:

| Subagent | In-lane | Out-of-lane |
| --- | --- | --- |
| Functional | Logic errors, edge cases, error handling, concurrency, contract violations | Coding style, naming conventions, skill-backed standards |
| Standards | Skill-backed coding standards violations | Logic errors, edge cases, behavioral bugs (unless a skill explicitly covers them) |

This reduces duplicate findings in the merged report and keeps each subagent focused on its domain.

### Verdict Scale

| Condition | Verdict |
| --- | --- |
| Any Critical or High findings | Request changes |
| Only Medium or Low findings | Approve with comments |
| No findings | Approve |

The orchestrator uses the stricter verdict when merging: if either subagent would request changes, the merged report requests changes.
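
A minimal sketch of that rule, assuming the three verdict values from the metadata contract (illustrative shell; the orchestrator applies this logic internally):

```bash
# Rank verdicts from most permissive to strictest, then keep the stricter one.
rank() {
  case "$1" in
    approve) echo 0 ;;
    approve_with_comments) echo 1 ;;
    request_changes) echo 2 ;;
  esac
}

functional="approve_with_comments"   # example subagent verdicts
standards="request_changes"

if [ "$(rank "$functional")" -ge "$(rank "$standards")" ]; then
  merged="$functional"
else
  merged="$standards"
fi
echo "$merged"   # prints: request_changes
```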

### Artifact Persistence

Review artifacts are saved to `.copilot-tracking/reviews/code-reviews/{branch-slug}/` with two files:

- `review.md`: the full review report in the standards output format
- `metadata.json`: a machine-readable summary for automation

The `metadata.json` file contains fields that CI pipelines, pre-commit hooks, and custom scripts can consume:

```json
{
  "schema_version": "1",
  "branch": "feat/my-feature",
  "head_commit": "abc123...",
  "reviewed_at": "2026-03-28T15:30:00Z",
  "verdict": "request_changes",
  "files_changed": ["src/main.py", "src/utils.py"],
  "findings_count": {
    "critical": 0,
    "high": 2,
    "medium": 1,
    "low": 0
  },
  "reviewer": "code-review-full"
}
```

The `verdict` field holds one of three values: `approve`, `approve_with_comments`, or `request_changes`. A pre-commit hook can read this file and block commits when the verdict is `request_changes`, ensuring review findings are addressed before code leaves the local branch. For example:

```bash
# Block the commit whenever the saved review requested changes.
verdict=$(jq -r '.verdict' .copilot-tracking/reviews/code-reviews/*/metadata.json 2>/dev/null)
if [ "$verdict" = "request_changes" ]; then
  echo "Code review requires changes. Fix findings before committing."
  exit 1
fi
```
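
To enable the check, save the snippet as `.git/hooks/pre-commit` and make it executable with `chmod +x .git/hooks/pre-commit`. Note that the wildcard matches every branch's review directory, so you may want to narrow the path to the current branch's slug in a real hook.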

## What You Need

| Requirement | Details |
| --- | --- |
| VS Code + Copilot | GitHub Copilot Chat with agent mode enabled |
| Git branch | A local branch with commits ahead of the base branch |
| hve-core collection | The coding-standards or hve-core-all collection installed |
| pr-reference skill | Included in the coding-standards collection; generates the XML diff |

The agents work with any programming language. Standards enforcement requires skills that match the languages in your diff. If no matching skills are found, the standards agent notes the gap and restricts its verdict.

## Extending with Custom Skills

The standards agent discovers skills dynamically at review time. You extend coverage by adding `SKILL.md` files to your repository without modifying the agent itself, as sketched below. See Language Skills for the full guide on built-in skills, skill stacking, and authoring enterprise-specific standards.
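
As a rough sketch, assuming a hypothetical skill name and location (any path works, since discovery scans the whole workspace; see the Language Skills guide for the actual authoring conventions):

```bash
# Hypothetical example: add an enterprise skill anywhere in the repo and the
# standards agent will discover it on the next review.
mkdir -p skills/terraform-standards
$EDITOR skills/terraform-standards/SKILL.md
```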

🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.