🎉 Lab Complete: Predictive Maintenance (TireForge Industries)

Congratulations — you've built, instrumented, evaluated, and deployed a production-ready multi-agent AI system from scratch. Here's what you accomplished.

Recap

#	Challenge	What You Built
0	Setup	Provisioned a Microsoft Foundry Resource, project, GPT model deployment, Log Analytics workspace, and Application Insights instance via a single `deploy.sh` script
1	Build Agents	Created an Anomaly Detection Agent (reads live sensor telemetry — temperature, vibration, pressure — and identifies machines operating outside safe thresholds) and a Fault Diagnosis Agent (determines root cause and recommends maintenance actions per machine type)
2	Monitor	Enabled OpenTelemetry GenAI tracing — every model call, tool invocation, and token count is captured as a distributed trace in Application Insights
3	Evaluate	Ran systematic LLM-as-judge evaluations across the full sensor dataset, producing repeatable coherence and fluency scores you can version-track across prompt changes
4	Production Workflow	Wired both agents into an orchestrated pipeline in the Foundry portal — a stable, testable endpoint with run history that plant operators can inspect
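
At its core, the anomaly check from Challenge 1 compares each sensor reading against a safe range. The following is a minimal Python sketch of that idea, not the lab's actual tool code: the field names, machine IDs, and threshold values are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical safe ranges (min, max) per metric; the lab's real thresholds may differ.
SAFE_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (20.0, 180.0),
    "vibration": (0.0, 5.0),
    "pressure": (1.0, 12.0),
}


def find_anomalies(readings: List[dict]) -> List[dict]:
    """Return one record per metric that falls outside its safe range."""
    anomalies = []
    for reading in readings:
        for metric, (low, high) in SAFE_RANGES.items():
            value = reading.get(metric)
            if value is None:
                continue  # metric not reported for this machine
            if not (low <= value <= high):
                anomalies.append({
                    "machine_id": reading["machine_id"],
                    "metric": metric,
                    "value": value,
                    "safe_range": (low, high),
                })
    return anomalies


# Made-up sample readings, for illustration only.
sample = [
    {"machine_id": "press-01", "temperature": 195.0, "vibration": 2.1, "pressure": 8.0},
    {"machine_id": "mixer-02", "temperature": 90.0, "vibration": 1.0, "pressure": 6.5},
]
flagged = find_anomalies(sample)
```

Exposing a deterministic function like this as a tool keeps the threshold logic out of the prompt, so the agent explains the flagged readings rather than doing the numeric comparison itself.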

Skills you practiced

Designing agent system prompts with clear role boundaries and constraints
Grounding agents in real sensor telemetry via tool calls (function calling)
Distributed tracing for AI systems with OpenTelemetry
LLM-as-judge evaluation with the Azure AI Evaluation SDK
Multi-agent orchestration in the Foundry portal

Next Steps

Want to take the TireForge system further? Here are some directions:

Add more agents: a Parts Inventory agent that checks whether replacement components are in stock before recommending maintenance, or a Scheduling agent that finds the earliest maintenance window with minimal production impact
Connect real data: replace the static sensor_data.json with a live Azure IoT Hub or Azure Event Hubs stream
Improve evaluation: add task-specific evaluators (e.g., "did the agent correctly identify a Curing Press failure from a combination of elevated temperature and abnormal pressure?") alongside the generic coherence and fluency scores
Set up CI/CD: run your evaluation dataset automatically on every prompt change using GitHub Actions, and fail the build if quality scores drop below a threshold
Explore fine-tuning: use your traced fault diagnoses as training data to fine-tune a smaller, cheaper model for the initial anomaly detection step
Try another scenario: the Claims and Call Center scenarios cover insurance processing and customer support using the same lifecycle
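
The task-specific evaluator and CI threshold ideas in the list above can be sketched in plain Python. Everything here is an assumption for illustration: the `CuringPressEvaluator` class, its input fields, the limits, the keyword match, and the 0.8 pass threshold are all hypothetical. A callable that returns a dict of scores is one common shape for a custom evaluator, but check the SDK documentation for the exact contract it expects.

```python
from typing import Dict, List, Tuple


class CuringPressEvaluator:
    """Hypothetical evaluator: did the agent flag a Curing Press failure
    exactly when temperature was elevated AND pressure was abnormal?"""

    def __init__(self, max_temp: float, pressure_range: Tuple[float, float]):
        self.max_temp = max_temp
        self.pressure_range = pressure_range

    def __call__(self, *, temperature: float, pressure: float, response: str) -> Dict[str, float]:
        low, high = self.pressure_range
        should_flag = temperature > self.max_temp and not (low <= pressure <= high)
        # Naive keyword match; a real evaluator would parse structured output.
        did_flag = "curing press failure" in response.lower()
        return {"curing_press_correct": 1.0 if should_flag == did_flag else 0.0}


def gate(scores: List[float], threshold: float) -> bool:
    """Return True when the mean score meets the threshold."""
    return bool(scores) and sum(scores) / len(scores) >= threshold


evaluator = CuringPressEvaluator(max_temp=180.0, pressure_range=(1.0, 12.0))

# Made-up evaluation rows, for illustration only.
rows = [
    {"temperature": 195.0, "pressure": 14.0,
     "response": "Likely Curing Press failure: overheating with abnormal pressure."},
    {"temperature": 150.0, "pressure": 8.0, "response": "No fault detected."},
    {"temperature": 200.0, "pressure": 15.0, "response": "Operating normally."},
]
scores = [evaluator(**row)["curing_press_correct"] for row in rows]
passed = gate(scores, threshold=0.8)
```

In a GitHub Actions step, a script like this would typically end by exiting with a nonzero status when `gate` returns False (for example via `sys.exit(1)`), which is what causes the build to fail.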

Clean Up Azure Resources

Important: The resources deployed in Challenge 0 incur Azure costs while they exist. Delete them when you're done.

What gets deleted

The resource group foundry-hackathon-rg-<suffix> and everything inside it:
Microsoft Foundry Resource + project
GPT model deployment
Log Analytics workspace
Application Insights instance

Option 1: Script

Run the cleanup script from the repo root:

bash factory/cleanup.sh

The script reads the .env file written by deploy.sh so it knows exactly which resource group to target. It asks for confirmation before deleting.
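
To see which resource group would be targeted before running the script, one option is to read the .env file yourself. The sketch below assumes the file uses plain KEY=VALUE lines and that the resource group is stored under a key such as `RESOURCE_GROUP`. That key name is a guess, so check your own .env for the real one.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes if the value was written as "..." or '...'.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Made-up example content; a real file would be read with open(".env").read().
example = '# sample .env content\nRESOURCE_GROUP="foundry-hackathon-rg-example"\n'
env = parse_env(example)
resource_group = env.get("RESOURCE_GROUP")  # hypothetical key name
```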

Option 2: Azure Portal

1. Go to portal.azure.com
2. Search for Resource groups
3. Find foundry-hackathon-rg-<suffix>
4. Click Delete resource group and confirm

Option 3: Azure CLI

# Replace <suffix> with the value shown in your .env file
az group delete --name foundry-hackathon-rg-<suffix> --yes --no-wait