Evaluation Criteria

“Trade-offs, not solutions.”

This module is the scoring rubric for your architecture. Before you commit to a technology, you must evaluate the project against four hard constraints: Architectural Load (Complexity), Delivery Capabilities (Skills), Economic Model (Budget), and Risk Profile (Governance). Two further checks, Time to Production and Scale & Performance, confirm that the choice fits your runway and your load.

Use this page to turn abstract requirements into defensible engineering decisions.

1. Complexity Assessment (Architectural Load)
2. Skills & Resources (Delivery Team)
3. Budget Assessment
4. Time to Production (The Runway)
5. Governance & Compliance (The Security Perimeter)
  1. The Trust Boundary Decision
  2. Action Safety & Content Safety
    1. The Action Safety Guardrail Playbook
6. Scale & Performance (The Envelope)
Evaluation Checklist
Next Steps

1. Complexity Assessment (Architectural Load)

The Trade-off: The “Inverse Law of Control.” As you adopt higher-level abstractions (SaaS), you gain velocity but lose granular control over the runtime and orchestration.

The Analogy: Think of this like Construction.

Tier 1 (Furnished Condo): You live in someone else’s building. You can change the decor (System Prompt), but you can’t move the walls.
Tier 2 (Prefab Home): You assemble modular rooms. You pick the appliances (Connectors) and the layout, but the foundation is pre-poured.
Tier 3 (Custom Build): You pour the concrete. You choose the wiring, HVAC, and security. If the roof leaks, you fix it.
Tier 4 (Skyscraper): Complex engineering. Requires steel frameworks, wind tunnel testing (Evals), and a specialized crew to operate.

Question: What is the structural complexity of the solution?

Complexity Tier	System Characteristics	Architecture Pattern	Recommended Technology
Tier 1: Informational (Read-Only)	Single knowledge domain. No state retention. Answers questions based on grounded data.	RAG (Retrieval Augmented Generation)	M365 Copilot. Leverage the Graph; no custom orchestration.
Tier 2: Transactional (Deterministic)	Linear workflows (“If X, then Y”). Defined API calls. Human approval steps required.	Orchestrated Workflow	Copilot Studio. Visual flow designer with Power Platform connectors.
Tier 3: Reasoning (Probabilistic)	Multi-step planning. The agent must decide which tool to use. Latency-critical. Custom vector stores.	ReAct / Plan-and-Execute	M365 Agents SDK (if M365-hosted), Microsoft Foundry (if Azure-hosted), or Microsoft Foundry + Copilot Studio.
Tier 4: Autonomous (Nondeterministic)	Multi-agent collaboration (Swarm). Recursive self-correction. Long-running asynchronous tasks.	Multi-Agent Systems (MAS)	Microsoft Foundry, Agent Framework, or both. Full code-first control if needed.
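
The tier table can be read as a decision ladder: test for the most demanding characteristic first and fall through to the simpler tiers. The following is a minimal sketch of that reading, not an official scoring tool. The `SolutionProfile` shape, its field names, and `classifyComplexity` are hypothetical and do not belong to any Microsoft SDK.

```typescript
// Hypothetical profile of the solution; field names are illustrative assumptions.
interface SolutionProfile {
  multiAgent: boolean;           // agents collaborate or self-correct recursively
  dynamicToolSelection: boolean; // the agent decides which tool to call
  definedWorkflow: boolean;      // linear "if X, then Y" steps with defined API calls
}

interface TierRecommendation {
  tier: 1 | 2 | 3 | 4;
  pattern: string;
  technology: string;
}

// Check the most demanding characteristic first, then fall through.
function classifyComplexity(profile: SolutionProfile): TierRecommendation {
  if (profile.multiAgent) {
    return { tier: 4, pattern: "Multi-Agent Systems (MAS)", technology: "Microsoft Foundry and/or Agent Framework" };
  }
  if (profile.dynamicToolSelection) {
    return { tier: 3, pattern: "ReAct / Plan-and-Execute", technology: "M365 Agents SDK or Microsoft Foundry" };
  }
  if (profile.definedWorkflow) {
    return { tier: 2, pattern: "Orchestrated Workflow", technology: "Copilot Studio" };
  }
  // No workflow, no tool selection, no multi-agent behaviour: read-only Q&A.
  return { tier: 1, pattern: "RAG", technology: "M365 Copilot" };
}
```

A solution that follows a defined workflow but also lets the agent pick its own tools lands in Tier 3, because the higher tier always wins.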

2. Skills & Resources (Delivery Team)

The Trade-off: Abstraction vs. Rigor. Low-code tools abstract away infrastructure but constrain the developer’s workflow (ALM). Pro-code tools offer infinite flexibility but require you to manage the plumbing (Identity, Networking, State).

Question: What is the composition of the delivery team?

Team Profile	Optimization	Recommended Path
Makers / Fusion	Velocity. Focus on subject matter expertise and business logic.	Copilot Studio + AI Builder. Rapid prototyping without infrastructure overhead.
Pro Developers	Lifecycle. Focus on CI/CD, unit testing, and version control.	M365 Agents SDK, Foundry, Foundry Agent Service, Agent Framework. Code-first approaches that fit into existing developer skills.
Data Scientists	Precision. Technical experts focused on data, grounding, and model behavior (not necessarily app dev).	Foundry Agent Service or Copilot Studio. Allows technical non-developers to deploy custom models as agents without managing full-stack infrastructure or writing application code.

The Convergence Principle: Do not treat “Pro Code” and “Low Code” as binary. A Principal Architect often uses Copilot Studio to handle authentication and UI state, while delegating complex reasoning to a Foundry Azure Function.

3. Budget Assessment

Architects must speak the language of finance. You need to capture Total Cost of Ownership (TCO): licensing, consumption, and the engineering hours required to build and operate the solution.

Question: Is your budget model based on Capital Expenditure (Pre-paid seats) or Operating Expenditure (Consumption)?

Economic Model	Typical Entry Cost	Best Fit For…
Entitlement (Sunk Cost)	$0 / Included	M365 Copilot Chat. If users are already licensed, maximizing usage here is the highest ROI baseline. Zero incremental cost.
Per-User Licensing	< $500/mo (Starts at $30/user)	M365 Copilot. Best for equipping knowledge workers. Costs scale linearly with headcount ($30 -> $60 -> $90). Predictable OpEx.
Capacity Packs	$200 - $10k/mo	Copilot Studio. Best for internal tools. You buy “blocks” of messages (e.g., 25k messages/month). Predictable billing that doesn’t spike if one user goes rogue.[^copilot-cost]
Metered Consumption	$0 start -> Scale to $10k+	Microsoft Foundry / Agent Service. Best for B2C apps or high-volume automation. You pay per token/hour. Low barrier to entry, but requires strict Quota Management to prevent cost overruns.[^foundry-cost]
Unified Pre-Purchase (P3)	$19k+ / year (tiered)	Copilot Studio + Microsoft Foundry combined. Buy Agent Commit Units (ACUs) in a single pool that covers both platforms. 1 ACU ≈ $1 retail usage. Eliminates the need to choose between platforms at the procurement level. Best for organizations running workloads across both layers.¹
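
To compare the models on one axis, it can help to reduce each row to a monthly figure. The sketch below does that under stated assumptions: the $30 per-user price comes from the table, while the pack price ($200) and pack size (25,000 messages) simply combine the table's lower bound with its example block size and are not a published price list. The metered unit price is left to the caller. All function names are hypothetical.

```typescript
// Per-user licensing: cost scales linearly with headcount ($30/user from the table).
function perUserMonthlyCost(users: number, pricePerUser: number = 30): number {
  return users * pricePerUser;
}

// Capacity packs: whole blocks only, so partial usage rounds up to the next pack.
// packSize and packPrice are ASSUMPTIONS for illustration; check current pricing.
function capacityPackMonthlyCost(messages: number, packSize: number = 25000, packPrice: number = 200): number {
  const packs = Math.ceil(messages / packSize);
  return packs * packPrice;
}

// Metered consumption: pay only for units used; unitPrice is supplied by the caller.
function meteredMonthlyCost(units: number, unitPrice: number): number {
  return units * unitPrice;
}
```

Three users cost $90 per month under per-user licensing, matching the $30 -> $60 -> $90 progression in the table. Under the assumed pack size, 60,000 messages need three packs, because the 10,000 messages beyond the second pack still require a full block.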

Procurement Simplification: The Agent Pre-Purchase Plan (P3) removes the “OR” from budget conversations. If your architecture spans Copilot Studio and Foundry (the “Better Together” pattern), P3 lets one pool of ACUs follow the workload across both platforms. See Agent Pre-Purchase Plan.

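Consumption models require Quota Management. An ungoverned autonomous agent can burn through a monthly token budget in hours if it enters a loop. Always configure budgets and alerts in Azure Cost Management, and pair them with deployment-level quotas, because a budget alert notifies you of overspend but does not by itself stop consumption.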
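
Platform-level budgets are the backstop; a code-first agent can also defend itself inside the loop. The following is a minimal sketch of a per-run budget that refuses further work once a token or step ceiling is reached. `createRunBudget`, `runAgentLoop`, and the limits used are hypothetical names and values chosen for illustration.

```typescript
// Tracks cumulative token spend and step count for a single agent run.
interface RunBudget {
  tryCharge(tokens: number): boolean; // true if the step is allowed and recorded
  remainingTokens(): number;
}

function createRunBudget(maxTokens: number, maxSteps: number): RunBudget {
  let tokensUsed = 0;
  let steps = 0;
  return {
    tryCharge(tokens: number): boolean {
      // Refuse the step if either ceiling would be exceeded.
      if (steps >= maxSteps || tokensUsed + tokens > maxTokens) {
        return false;
      }
      steps += 1;
      tokensUsed += tokens;
      return true;
    },
    remainingTokens(): number {
      return maxTokens - tokensUsed;
    },
  };
}

// A loop that would otherwise run forever stops as soon as the budget refuses.
function runAgentLoop(budget: RunBudget, tokensPerStep: number): number {
  let completedSteps = 0;
  while (budget.tryCharge(tokensPerStep)) {
    completedSteps += 1; // a real agent would call the model and a tool here
  }
  return completedSteps;
}
```

With a 1,000-token ceiling and 300 tokens per step, the loop completes three steps and stops with 100 tokens unspent. The step ceiling catches the other failure mode, where each iteration is cheap but the loop never terminates.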

4. Time to Production (The Runway)

The Trade-off: Convenience vs. Customization. You can launch a standard pattern in days, or a bespoke architecture in months.

Timeline	Capability	Why
Days	M365 Copilot / Declarative Agents	No infrastructure to deploy. Content is already indexed by Graph.
Weeks	Copilot Studio	Visual canvas allows rapid iteration, but requires testing approval flows and connectors.
Months	Foundry / Agents SDK	Requires establishing Azure Landing Zones, VNets, and CI/CD pipelines before the first agent runs.

5. Governance & Compliance (The Security Perimeter)

This is the “Go/No-Go” gate. You must define your Data Boundary and Action Safety.

The Trust Boundary Decision

Question: Does the data stay in the SaaS tenant or move to your Azure subscription?

M365 Trust Boundary: Data remains within the Microsoft 365 tenant boundary. Adheres to existing tenant certifications (ISO, SOC, HIPAA). No model training on customer data. -> Use M365 Copilot.
Power Platform Boundary: Inherits compliance from the Connectors used. If you connect to a non-compliant third-party API, the agent inherits that risk. -> Use Copilot Studio.
Azure Landing Zone: Data moves into your customer-controlled subscription. You own the VNet injection, Private Links, and Encryption Keys (CMK). -> Use Microsoft Foundry.
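
The Power Platform point is a weakest-link rule: one non-compliant connector is enough for the agent to inherit the risk. A minimal sketch of that check follows. The `Connector` shape and the `compliant` flag are assumptions; in practice the flag would come from your own compliance review, not from any platform API.

```typescript
// Hypothetical record of a connector and the outcome of your own compliance review.
interface Connector {
  name: string;
  compliant: boolean;
}

// Names of the connectors that failed review.
function nonCompliantConnectors(connectors: Connector[]): string[] {
  return connectors.filter((c) => !c.compliant).map((c) => c.name);
}

// The agent inherits risk as soon as any single connector is non-compliant.
function agentInheritsRisk(connectors: Connector[]): boolean {
  return nonCompliantConnectors(connectors).length > 0;
}
```

Listing the offending connectors by name, instead of returning only a boolean, gives the reviewer something concrete to remediate before the Go/No-Go decision.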

Action Safety & Content Safety

Question: What is the “Blast Radius” of a mistake?

Read-Only Risk: Hallucination/Grounding errors. (Mitigation: RAG + Citations).
Destructive Risk: Data modification/Deletion. (Mitigation: Human-in-the-loop).

The Action Safety Guardrail Playbook

Use this rubric to design approval checkpoints before promoting an agent to production.

Risk Level	Definition	Guardrail Requirement
Low (Read)	Search, lookup, summarize.	Audit Log. Log the query and response for post-hoc analysis.
Medium (Write)	Create draft, update status.	User Confirmation. The agent presents a draft/plan; the user must explicitly click “Execute.”
High (Destructive)	Delete, transfer funds, change permissions.	Middleware Interception. The agent triggers a request; a Service Owner must approve via a separate channel.[^humanreview]

Implementation Example (Pro-Code): For high-risk actions in code-first agents, implement middleware that intercepts the specific tool call:

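// Assumes isDestructive, requestHumanApproval, and executeTool are defined elsewhere in the agent.
async function executeToolWithApproval(toolName: string, params: Record<string, unknown>) {
  // Only destructive tools are gated; all other calls pass straight through.
  if (isDestructive(toolName)) {
    // Wait for a reviewer's decision before the tool is allowed to run.
    const approval = await requestHumanApproval(toolName, params);
    if (!approval.approved) {
      return { error: "Action rejected by reviewer" };
    }
  }
  return await executeTool(toolName, params);
}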
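
The same idea can be extended to cover all three rows of the playbook. The sketch below is self-contained and uses injected dependencies so that each guardrail can be replaced by a real service. Every name in it (`GuardrailDeps`, `executeWithGuardrail`, the demo tool names) is hypothetical, and the in-memory stand-ins exist only to make the example runnable.

```typescript
type RiskLevel = "low" | "medium" | "high";
type ToolParams = Record<string, unknown>;
type ToolResult = { ok: true; output: string } | { ok: false; error: string };

// Hypothetical dependencies; swap each one for a real implementation.
interface GuardrailDeps {
  riskOf(toolName: string): RiskLevel;
  auditLog(entry: string): void;
  confirmWithUser(toolName: string, params: ToolParams): Promise<boolean>;
  requestOwnerApproval(toolName: string, params: ToolParams): Promise<boolean>;
  executeTool(toolName: string, params: ToolParams): Promise<string>;
}

async function executeWithGuardrail(deps: GuardrailDeps, toolName: string, params: ToolParams): Promise<ToolResult> {
  const risk = deps.riskOf(toolName);
  deps.auditLog(`${risk}:${toolName}`); // every call is logged, whatever its risk
  if (risk === "medium" && !(await deps.confirmWithUser(toolName, params))) {
    return { ok: false, error: "Action cancelled by user" };
  }
  if (risk === "high" && !(await deps.requestOwnerApproval(toolName, params))) {
    return { ok: false, error: "Action rejected by reviewer" };
  }
  return { ok: true, output: await deps.executeTool(toolName, params) };
}

// In-memory stand-ins: the user always confirms, the service owner always rejects.
const auditTrail: string[] = [];
const demoRisk: Record<string, RiskLevel> = {
  searchDocs: "low",
  updateStatus: "medium",
  deleteRecord: "high",
};
const demoDeps: GuardrailDeps = {
  riskOf: (toolName) => demoRisk[toolName] ?? "high", // unknown tools get the strictest level
  auditLog: (entry) => { auditTrail.push(entry); },
  confirmWithUser: async () => true,
  requestOwnerApproval: async () => false,
  executeTool: async (toolName) => `${toolName} done`,
};
```

Defaulting unknown tools to `"high"` is a deliberate design choice in this sketch: a tool that nobody has classified is treated as destructive until someone says otherwise.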

6. Scale & Performance (The Envelope)

Question: What are the latency and throughput requirements?

Scale	Volume	Copilot Studio Fit	Microsoft Foundry Fit
Departmental	Low Volume	✅ Ideal. Instant scaling, managed limits.	⚠️ Overhead. High setup time for low volume.
Enterprise	Medium Volume	✅ Strong. Requires Capacity Planning.	✅ Strong. Standard Pay-As-You-Go models.
High-Scale	High Volume (1M+ req/day)	⚠️ Throttling Risk. Check Environment RPM limits.	✅ Ideal. Use Provisioned Throughput Units (PTU) for guaranteed latency and throughput.[^1]
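
Daily volumes and per-minute limits are in different units, so a quick conversion helps when reading this table. The sketch below converts requests per day into average and peak requests per minute. The `peakFactor` parameter (busiest-minute traffic divided by the daily average) is an assumption you would need to measure for your own workload, and the function names are hypothetical.

```typescript
const MINUTES_PER_DAY = 24 * 60; // 1,440

// Average load if traffic were spread evenly across the day.
function averageRpm(requestsPerDay: number): number {
  return requestsPerDay / MINUTES_PER_DAY;
}

// peakFactor is an ASSUMPTION: ratio of the busiest minute to the daily average.
function peakRpm(requestsPerDay: number, peakFactor: number): number {
  return averageRpm(requestsPerDay) * peakFactor;
}

// True if the estimated peak stays at or below the given per-minute limit.
function fitsWithinLimit(requestsPerDay: number, peakFactor: number, rpmLimit: number): boolean {
  return peakRpm(requestsPerDay, peakFactor) <= rpmLimit;
}
```

At 1M requests per day the average is roughly 694 RPM, which looks comfortable. Real traffic is bursty, though: with an assumed peak factor of 12 the same workload reaches about 8,333 RPM at its busiest minute, so size against the peak and not the average.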

Key Limits to Watch:

Copilot Studio: Generative AI requests consume “Copilot Credits” at a different rate than standard messages. Limits are often around 8,000 requests per minute (RPM) per environment, but vary by license.[^2]
Azure OpenAI: TPM (Tokens Per Minute) quotas are regional. For mission-critical scale, design for multi-region failover and model redundancy.
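
Multi-region failover can be as simple as trying deployments in priority order. The following is a minimal sketch under that assumption. The `RegionalDeployment` interface and its `call` method are hypothetical wrappers around whatever client you use, and a production version would typically add retry delays and only fail over on throttling or availability errors.

```typescript
// Hypothetical wrapper around one regional model deployment.
// call() is expected to reject when the region is throttled or unavailable.
interface RegionalDeployment {
  region: string;
  call(prompt: string): Promise<string>;
}

interface FailoverResult {
  region: string;
  output: string;
}

// Try each deployment in priority order and return the first success.
async function callWithFailover(deployments: RegionalDeployment[], prompt: string): Promise<FailoverResult> {
  const failures: string[] = [];
  for (const deployment of deployments) {
    try {
      const output = await deployment.call(prompt);
      return { region: deployment.region, output };
    } catch (err) {
      // Record the failure and move on to the next region.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${deployment.region}: ${reason}`);
    }
  }
  throw new Error(`All regions failed (${failures.join("; ")})`);
}
```

Because quotas are allocated per region, a second region gives both a fallback when the first is throttled and additional headroom during peaks.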

Evaluation Checklist

Before moving to the Implementation Patterns, confirm you have scored the scenario:

Architecture:

Complexity Level (Config vs. Engineering) identified?
Data boundaries mapped?

Resources:

Team capability (Maker vs. Dev vs. DS) aligned to tool?
ALM/DevOps requirements defined?

Governance:

Trust boundary defined (M365 vs. Azure)?
Action Safety Guardrails defined?

Budget:

Cost model selected (License vs. Metered)?
Estimated monthly spend band identified?
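
If you track evaluations across several scenarios, the checklist can be captured as a typed record so that gaps are found mechanically. The sketch below is one possible shape, not a prescribed schema. `EvaluationScorecard`, its field names, and `missingItems` are hypothetical.

```typescript
// One optional field per checklist question; undefined means "not yet scored".
interface EvaluationScorecard {
  complexityTier?: 1 | 2 | 3 | 4;
  dataBoundariesMapped?: boolean;
  teamProfile?: "maker" | "developer" | "data-scientist";
  almRequirementsDefined?: boolean;
  trustBoundary?: "m365" | "azure";
  guardrailsDefined?: boolean;
  costModel?: "license" | "metered";
  monthlySpendBand?: string;
}

const REQUIRED_ITEMS: (keyof EvaluationScorecard)[] = [
  "complexityTier",
  "dataBoundariesMapped",
  "teamProfile",
  "almRequirementsDefined",
  "trustBoundary",
  "guardrailsDefined",
  "costModel",
  "monthlySpendBand",
];

// An item is missing if it was never set or was explicitly answered "false".
function missingItems(card: EvaluationScorecard): string[] {
  return REQUIRED_ITEMS.filter((key) => card[key] === undefined || card[key] === false);
}
```

An empty result from `missingItems` means every question has an answer and the scenario is ready to move on to pattern selection.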

Next Steps

Feature comparison: → Feature Comparison

Visual guidance: → Visual Framework

Real examples: → Scenarios

Architecture patterns: → Implementation Patterns

Last Updated: February 5, 2026

Next: Implementation Patterns - Apply the scoring outcomes to pick execution patterns

¹ Optimize Microsoft Foundry and Copilot Credit costs with Microsoft Agent pre-purchase plan, Microsoft Learn. Updated 2026-01-15. https://learn.microsoft.com/en-us/azure/cost-management-billing/reservations/agent-pre-purchase
