Evaluation Criteria

Use this module as the scoring rubric before selecting Microsoft AI technologies.

Table of contents

  1. Purpose
  2. Evaluation Framework
    1. Technical Complexity Assessment
    2. Skills & Resources
    3. Budget Assessment
    4. Time to Production
    5. Governance & Compliance
    6. Memory, Analytics & Conversation History
    7. Scale & Performance
  3. Decision Matrix
  4. Evaluation Checklist
  5. Next Steps

Purpose

This module anchors the broader Microsoft AI Decision Framework by teaching how to score readiness across complexity, skills, budget, and governance before you dive into detailed technology choices.

Treat the scenarios and recommendations below as illustrative patterns that reinforce the concepts explained in the rest of the repo—not turnkey prescriptions. The goal is to make your tradeoffs explicit (speed vs. control, governance vs. flexibility) so the final platform choice is easier to defend and operate.


Evaluation Framework

Technical Complexity Assessment

Question: How complex is your use case?

Complexity Level Characteristics Recommended Technologies
Low Simple Q&A, existing knowledge base, single data source M365 Copilot, Graph Connectors, Declarative Agents
Medium Multiple data sources, workflow automation, approvals Copilot Studio, AI Builder, Power Automate
High Custom models, multi-agent orchestration, custom evaluation Azure AI Foundry, M365 Agents SDK, Agent Framework, Agent Framework + AG-UI (Preview)1
Very High Multi-step reasoning, complex state management Azure AI Foundry + Agent Framework, custom pipelines

Key Indicators:

  • Data sources: 1 (low) vs 5+ (high)
  • Workflow: Linear (low) vs branching/parallel (high)
  • Reasoning: Simple lookup (low) vs multi-step inference (high)
  • Evaluation: User feedback (low) vs custom metrics (high)

Skills & Resources

Question: What skills do you have?

Skill Level Resources Time Recommended
Makers No devs, 1-3 people 10-20 hrs/week Copilot Studio, AI Builder
Makers + Dev Occasional dev help 20-30 hrs/week Studio + custom actions
Pro Developers Full dev team 40+ hrs/week M365 SDK, Azure AI Foundry
Data Scientists ML expertise 40+ hrs/week Azure AI Foundry, custom models

Budget Assessment

Budget decisions need to capture total cost of ownership—per-user Microsoft 365 licensing, Copilot Studio credits, Azure consumption, and the engineering hours required to build, test, and operate the stack. Use the groupings in Decision Framework – Question 2 (Build Style) and the blueprints in Implementation Patterns to determine which layers you will combine before picking a band.234

Mix-and-match approaches (for example, a Copilot Studio front door that delegates orchestration to Azure AI Foundry or Agent Service) let you shift costs between licensing and Azure meters as the solution matures—plan for that runway instead of assuming a static stack.5

Budget Band Typical Stack Primary Cost Drivers Mix & Match Considerations
Free / Included / < $500/mo Free (included) Microsoft 365 Copilot Chat + Graph Connectors + declarative agents Baseline Microsoft 365 licensing (Copilot Chat included, zero incremental license cost), tenant governance time, light maker effort Keep workloads inside Layer 1–2 to prove value quickly; reuse connectors later in Copilot Studio or Azure AI Foundry as needs grow
$500-$2K/mo Microsoft 365 Copilot add-on pilots, Copilot Studio Lite/Full, AI Builder, Power Automate approvals Per-user Microsoft 365 Copilot licenses (Graph grounding, in-app copilots), Copilot Credits for generative answers/flows, connector usage, part-time maker/developer capacity3 Layer Microsoft 365 Copilot on top of Copilot Chat for hero experiences; add Studio and targeted Azure APIs only when custom logic is required; budget for approval flows and DLP policy design
$2K-$10K/mo Copilot Studio front door + Azure AI Foundry/Agent Service, M365 Agents SDK pilots Azure OpenAI tokens, App Service or Functions hosting, vector stores, CI/CD + observability effort24 Split workloads so Studio handles UX/governance while Azure hosts orchestration; shift spend toward Azure as more automation moves out of Studio
$10K+/mo Full Azure AI stack (Agent Framework, Logic Apps, Foundry Agent Service) with enterprise data plane Dedicated AI/ML teams, Azure landing zone hardening, custom evaluations, agent telemetry pipelines4 Expect blended spend: Studio or M365 endpoints for front doors plus Azure services for workflows, memory, and safety tooling

Note: These dollar ranges illustrate relative investment levels rather than contractual pricing. Always reconcile them with your actual licensing agreements, Copilot Credit allocations, Azure consumption forecasts, and staffing models.


Time to Production

Time-to-production is also multi-track: it is common to land a Copilot Studio or Microsoft 365 Copilot pilot in days while Azure AI Foundry foundations (data ingestion, landing zones, evaluations) progress in parallel. Blend these tracks to shrink overall runway rather than treating each technology choice as mutually exclusive.54

Timeline Representative Path Key Activities Levers & Tradeoffs
Days Enable free (included) Microsoft 365 Copilot Chat + Graph Connectors pilot Confirm eligible licensing, scope data sources, run adoption enablement Fastest path to value (no incremental license spend) but limited action automation; great for validating demand before deeper investment
1-2 Weeks Turn on Microsoft 365 Copilot add-on for a focused group and/or launch Copilot Studio template with governed connectors and approvals Approve premium licenses, configure Copilot Credits capacity, attach Power Automate approvals, enforce DLP policies Low-code build speed; adding Microsoft 365 Copilot unlocks work-grounded chat and in-app copilots while Studio handles hosting and security3
1-2 Months Keep Copilot Studio as the front door while adding custom actions, BYOM prompts, or Azure AI Foundry orchestration Build APIs or Foundry Agent Service skills, set up CI/CD, connect to private data via gateways Perfect for hybrid stacks—Studio pilots stay live while Azure capabilities roll in; schedule time for testing, observability, and managed gateway setup54
3-6 Months Productionize pro-code agents (M365 Agents SDK, Agent Framework, Azure AI Foundry) with enterprise landing zone controls Provision VNets/private endpoints, implement tool guardrails, run evaluations, harden telemetry + governance Highest control and autonomy; longer runway driven by infra changes, compliance evidence, and multi-environment DevOps. Parallel pilots in Studio/M365 keep users engaged during the build

Note: These timeframes are directional examples spanning pilots through production handoff. Actual schedules depend on compliance reviews, procurement, existing Azure landing zones, and how many layers (M365, Copilot Studio, Azure) you run in parallel.


Governance & Compliance

Level Constraints Approach Why
High M365 tenant boundary only, strict DLP, no training on data M365 Copilot M365 trust boundary, strictest governance
Medium-High Data sovereignty, RBAC, external connector controls, gateway for private access Copilot Studio + DLP Configurable, external connectors inherit compliance
High (Azure) VNet isolation, private endpoints, CMK, no public egress Azure AI Foundry/Agent Service Azure landing zone controls, sovereign data
High (Custom) Self-hosted, custom VNet, air-gapped support M365 Agents SDK Customer controls all networking
Low Minimal, hosting platform dependent Agent Framework Maximum flexibility

Critical Considerations - Data Boundary:

  • M365 Copilot: Data NEVER leaves M365 tenant boundary. No training on tenant data. Inherits M365 compliance (GDPR, HIPAA, FedRAMP). Microsoft Purview for AI (Preview) extends DLP and sensitivity labels to prevent agents from summarizing “Internal Only” documents.
  • Copilot Studio: ⚠️ External connectors (custom APIs, non-Microsoft systems) inherit external system’s compliance posture. Web search leaves enterprise boundary (NOT covered by DPA). Compliance certifications: HIPAA, FedRAMP, SOC, ISO, PCI DSS. Power Platform DLP policies (environment/tenant-level) control connector access. Purview DSPM (Preview) now audits unauthenticated user activity.
  • Azure AI Foundry/Agent Service: Full Azure controls (VNet, private endpoints, CMK, Azure Policy). Governed by YOUR Azure landing zone. Microsoft Defender for AI (Preview) provides runtime threat protection. Azure AI Content Safety provides Prompt Shields (GA) and Spotlighting (Preview) to block jailbreaks and injection attacks.

Critical Considerations - Network Isolation:

  • Azure AI Foundry/Agent Service: ✅ Full private networking support. Standard Setup with Private Networking = no public egress by default. Supports air-gapped environments.
  • Copilot Studio: ⚠️ Does NOT execute in customer VNet. Requires on-premises data gateway (for on-prem systems) OR VNet data gateway (GA, for Azure resources). Not suitable for fully air-gapped without gateway setup.
  • M365 Agents SDK: ✅ Self-hosted = customer controls networking. Supports VNet integration, private endpoints, air-gapped Azure deployments. Full responsibility for network security.
  • M365 Copilot: ❌ No custom VNet support (fully managed SaaS). Requires gateway architecture for on-prem data access.

Critical Considerations - Permissions & Identity Model:

  • M365 Copilot: ✅ Always user-scoped. “It only sees what I can see” = TRUE (architecturally guaranteed). Actions attributed to individual users.
  • Copilot Studio: ⚠️ Dual auth mode: User authentication (user-scoped) OR Agent author authentication (service account). Service accounts can exceed individual user permissions. Critical for actions that write data.
  • Azure AI Foundry: ⚠️ API key (bypasses user identity) OR Entra ID (per-user RBAC, managed identities). Microsoft Entra Agent ID (Preview) assigns unique identities to agents for granular permission management and auditability.
  • M365 Agents SDK: ⚠️ Custom auth design: Delegated permissions (user-scoped) OR Application permissions (service principal, tenant-wide scope). Requires documented auth architecture for audit.

Action Safety & Content Safety

Key considerations:

  • M365 Copilot: ✅ User-in-the-loop always (drafts only, user executes). Cannot take destructive actions. ✅ Content moderation + prompt injection defenses built-in.
  • Copilot Studio: ⚠️ Actions can execute (Power Automate flows, custom connectors). No built-in human approval. Add approval workflows for destructive actions. ✅ Content moderation blocks malicious prompts.
  • Azure AI Foundry/Agent Service: ⚠️ Tool calling with autonomous planning loops. No built-in approval. Implement human-in-the-loop + OpenTelemetry tracing. ✅ Content safety via Azure AI Content Safety (Prompt Shields GA, Spotlighting Preview).
  • M365 Agents SDK: ⚠️ Custom design (developer responsibility). No built-in action safety or approval. Implement custom guardrails. Content safety depends on implementation.

Action Safety Guardrail Playbook

Why it matters: These controls operationalize the Action Safety score. Use them to design approval checkpoints, logging, and escalation workflows before promoting an agent to production.

Guardrail Workflow:

  1. Classify actions by risk (read-only, write, destructive) and document examples.
  2. Decide approval paths for medium/high risk actions (manager approval, security desk, or service owner).
  3. Enforce boundaries by limiting which tools can execute autonomously and which require explicit human confirmation.
  4. Monitor and trace every execution (OpenTelemetry, Purview, Application Insights) so auditors can reconstruct the decision path.

Action Risk Classification

Risk Level Examples Approval Required Recommended Treatment
Low (Read) Search, lookup, reporting No Allow automatically; log only
Medium (Write) Create record, update status, send notification Optional Approval or post-execution review in regulated domains
High (Destructive) Delete, disable, approve spend, security changes Yes Always require human-in-the-loop before execution

Guardrails by Platform

M365 Copilot (built-in experiences)

  • User always executes the final action; drafts remain user-scoped. No extra guardrails required beyond Purview auditing.
  • Reference: Microsoft 365 Copilot security (Updated: 2024-10-10)

Copilot Studio (SaaS)

  • Add Start and wait for an approval inside Power Automate flows before any destructive action.
  • Validate parameters with condition blocks (e.g., amount thresholds) and environment-level DLP policies.
  • Enable Dataverse audit logs so transcripts/actions are reviewable by compliance teams.
  • Reference: Business applications integrations with Copilot Studio (Updated: 2024-10-05)

Azure AI Foundry / Agent Service (PaaS)

  • Implement human-in-the-loop middleware for high-risk tool calls; store approvals in durable storage.
  • Use OpenTelemetry tracing and Azure Monitor to capture full tool execution chains.
  • Apply Azure Policy/API Management allowlists so agents can only call approved endpoints.
  • Reference: Foundry Agent Service transparency note (Updated: 2024-10-15)

M365 Agents SDK (custom code)

  • Build explicit approval middleware that routes destructive actions to a reviewer before execution.
  • Classify actions in code (read/write/destructive) and enforce deny-by-default for destructive categories.
  • Log every action with initiating user identity via Application Insights or custom telemetry.
  • Reference: Microsoft 365 Agents SDK security (Updated: 2024-10-22)

Architecture Recipes

  • Approval Workflow (Power Automate): User Request → Agent Plans Action → [If Destructive] → Trigger Approval Flow → Manager Approves/Rejects → Execute or Cancel
  • Human-in-the-Loop Middleware (custom code):
      async function executeToolWithApproval(toolName: string, params: any) {
        if (isDestructive(toolName)) {
          const approval = await requestHumanApproval(toolName, params);
          if (!approval.approved) {
            return { error: "Action rejected by reviewer" };
          }
        }
        return await executeTool(toolName, params);
      }
    
  • Read-Only Core with Delegated Execution: Agent → Generates plan → Presents to user → User clicks "Execute" → Authenticated API call runs under user's identity

Action Safety Decision Matrix

Technology Built-in Safety Approval Mechanism Tracing Best For
M365 Copilot ✅ User-in-the-loop N/A (draft drafts only) Purview audit logs Maximum safety
Copilot Studio ⚠️ Actions can execute Power Automate approvals Dataverse audit logs Low-code with approvals
Azure AI Foundry / Agent Service ⚠️ Autonomous execution Custom middleware OpenTelemetry, Azure Monitor Pro-code with tracing
M365 Agents SDK ❌ Custom design Custom implementation Custom telemetry Full custom control

Documentation tip: capture the guardrail decisions in your change log so audit reviewers know why specific approvals and rate limits exist.

Critical Considerations - Proactive Capabilities (Question 9):

  • Reactive only: M365 Copilot, Copilot Studio declarative agents (user-initiated interactions only).67
  • Proactive capable: Copilot Studio custom engine agents (Power Automate triggers), Logic Apps (event-driven), Azure AI Foundry (Functions/triggers), M365 Agents SDK (custom event listeners).78910

Memory, Analytics & Conversation History

Question for Legal/Audit Teams:

Understanding where conversation data is stored and who can access it is critical for regulated industries (healthcare, finance, legal).

Key Questions to Answer:

  • Where is conversation history stored?
  • How long is it retained?
  • Who can access it?
  • Can we scrub PII from transcripts?
  • Does this meet our compliance requirements (HIPAA, GDPR, SOX)?

Technology Comparison: See Quick Reference - Memory & Analytics for detailed comparison table.

Critical Distinctions:

  • M365 Copilot: Grounding ONLY (NO per-user memory extractable by admins). Activity history in user mailbox (Purview-governed).
  • Copilot Studio: Grounding + Dataverse memory variables. Full transcript access for admins (critical for regulated review).
  • Foundry Agent Service: BYO thread storage (customer Cosmos DB). Customer responsible for retention, PII scrubbing.
  • M365 Agents SDK: Custom implementation (developer implements everything).

💡 Cross-reference: See Decision Framework Q3 - Grounding vs Memory vs Analytics distinction


Scale & Performance

Understanding Performance Trade-Offs:

Performance optimization varies by platform and use case. Choose based on your priorities:

Dimension Copilot Studio Azure AI Foundry Decision Driver
Development Velocity Days to weeks (low-code platform) Weeks to months (custom code) Speed to market vs full customization
Connector Ecosystem 1,400+ Power Platform connectors Custom API integrations Pre-built integrations vs custom control
Runtime Latency <1s (managed orchestration) <100ms (direct API access) Response time requirements
Operational Overhead Low (Microsoft-managed SaaS) Higher (self-managed PaaS) Operational preferences
Context Window 400k (platform-managed)11 400k (full model access) Context vs convenience trade-off
Throughput Scaling Requests per minute (RPM)-based (message billing) Tokens per minute (TPM)-based (provisioned capacity) Cost model preferences

Scenario-Based Selection:

Choose Copilot Studio when:

  • Speed to market is critical (rapid iteration, quick delivery)
  • Leveraging Power Platform’s 1,400+ built-in connectors and triggers
  • Prefer managed infrastructure (SaaS convenience)
  • Response times <1s are acceptable
  • Built-in governance and ALM are priorities
  • Team includes makers and developers
  • Testing: Power CAT Copilot Studio Kit for automated agent validation (response match, topic match, generative answers, multi-turn scenarios)

Choose Azure AI Foundry when:

  • Latency <100ms is required (real-time applications)
  • Need full architectural control and optimization
  • High-throughput scenarios (PTU provisioning)
  • Testing: Prompt flow evaluations with built-in metrics (groundedness, relevance, coherence) or custom evaluators
  • Custom patterns beyond platform capabilities
  • Custom integrations outside standard connector ecosystem
  • Team has Azure and AI engineering expertise
  • Pair with the AG-UI protocol (Preview) when you need SSE streaming, shared state, or human approvals surfaced directly in custom web or mobile clients.1

Complementary Use:

Many organizations use both platforms - Copilot Studio for rapid deployment with rich connector integration, Azure AI Foundry for performance-critical custom applications. The platforms serve complementary needs, not competitive choices.

Scale Planning by User Volume

Scale Users Requests/Day Copilot Studio Fit Azure AI Foundry Fit
Small <100 <1K ✅ Ideal (rapid deployment) ⚠️ May be over-engineered
Medium 100-1K 1K-50K ✅ Strong (with capacity planning) ✅ Strong (with cost optimization)
Large 1K-10K 50K-500K ⚠️ Monitor quotas carefully ✅ Ideal (PTU provisioning)
Enterprise 10K+ 500K+ ⚠️ Requires premium capacity ✅ Ideal (enterprise scale)

Rate Limits & Cost Model Trade-Offs

Copilot Studio:

  • Environment-level quotas:
    • General messages: 8,000 requests per minute (RPM) per Dataverse environment
    • Generative AI: 50-100+ RPM / 1,000-2,000+ requests per hour (RPH) (scales with capacity packs: 50 RPM/1K RPH for 1-10 packs, up to 100 RPM/2K RPH for 51+ packs or PAYG)
  • Shared across agents: All agents in environment share quota pool
  • Channel limits: Includes support for Teams, Microsoft 365, SharePoint, Web (Direct Line), Mobile, Azure Bot Service, and numerous third-party channels
  • Power Platform requests: 250K/24h for standard subscription (flows triggered by agents)
  • Cost predictability: Prepaid packs or PAYG ($0.01/Copilot Credit)
  • Increase limits: Allowed 25% overage of pooled credits; purchase additional credit packs or enable pay-as-you-Go for overages
  • Best for: Predictable workloads, call centers, internal use cases with capacity planning

Azure AI Foundry / Agent Service:

  • Tokens per minute (TPM) quotas: Per region/model (request increases available)
  • Variable cost: Per-token billing scales with traffic
  • Cost optimization: Model routing, provisioned throughput units (PTU) reservations, caching strategies
  • Best for: Variable traffic patterns, public-facing channels with guardrails

M365 Agents SDK:

  • Custom auto-scaling: Developer controls rate limiting and scaling logic
  • Flexible cost: Hosting + token-based (if using Azure OpenAI)
  • Full control: Custom traffic management and optimization
  • Best for: Complex custom requirements, existing Azure infrastructure

Sources:

Last Updated: 2025-11-10


💡 Cross-reference: See Decision Framework Q6


Decision Matrix

Situation Start Path
Makers, low budget, M365 data M365 + Graph Connectors → Studio
Makers, workflows Copilot Studio + AI Builder → Add BYOK
Devs, multi-channel M365 SDK + Studio → Azure AI Foundry
Devs, custom models Azure AI Foundry → Add Agent Framework
Data scientists Foundry + Agent Framework → BYOM

Evaluation Checklist

Technical:

  • Identified data sources
  • Assessed data quality
  • Determined integrations
  • Evaluated constraints

Skills:

  • Confirmed skill levels
  • Identified team size
  • Assessed learning curve
  • Planned knowledge transfer

Budget:

  • Estimated setup costs
  • Projected ongoing costs
  • Set realistic timeline
  • Planned contingency

Governance:

  • Reviewed data sovereignty
  • Confirmed compliance needs
  • Assessed audit requirements
  • Planned DLP policies

Scale:

  • Estimated users
  • Projected volume
  • Assessed performance needs
  • Planned monitoring

Next Steps

Feature comparison:
Feature Comparison

Visual guidance:
Visual Framework

Real examples:
Scenarios

Architecture patterns:
Implementation Patterns


Last Updated: November 10, 2025
Next: Implementation Patterns - Apply the scoring outcomes to pick execution patterns

  1. AG-UI integration with Agent Framework, Microsoft Learn. Preview, Updated 2025-11-11.  2

  2. Cost considerations for extending Microsoft 365 Copilot, Microsoft Learn. Updated 2025-05-19.  2

  3. Copilot Studio requirements, billing, and Copilot Credits, Microsoft Learn. Updated 2025-11-05.  2 3

  4. Manage costs in Azure AI Foundry, Microsoft Learn. Updated 2025-10-17.  2 3 4 5

  5. Bring your own Azure AI Foundry models to Copilot Studio prompts, Microsoft Learn. Updated 2025-11-13.  2 3

  6. Privacy and protections, Microsoft Learn. Updated 2025-08-15. 

  7. Create automated copilots triggered by events, Microsoft Learn. GA 2025-03-24.  2

  8. Trigger an agent by using Logic Apps (preview), Microsoft Learn. Updated 2025-06-30. 

  9. What’s new in Azure AI Foundry Agent Service, Microsoft Learn. Updated 2025-05-15. 

  10. Bring your agents into Microsoft 365 Copilot, Microsoft Learn. Updated 2025-09-12. 

  11. Effective context in Copilot Studio varies by configuration; full model context available in Azure AI Foundry when you manage infrastructure and evaluation pipelines. 


Back to top

Copyright © 2025. This documentation is based on official Microsoft sources and best practices.