Challenge 05 - Monitoring Best Practices
Introduction
You’re now sending telemetry to New Relic, but raw data alone isn’t enough. Without dashboards and alerts, you have to check traces manually, you won’t notice problems until customers complain, and you can’t track performance trends over time.
In this challenge, you’ll learn industry best practices for monitoring AI-driven applications by creating custom dashboards, setting up alerts, defining Service Level Objectives (SLOs), tracking deployment changes, and collecting key metrics that matter for your travel planning service.
Description
Your goal is to build a comprehensive monitoring solution for WanderAI that includes dashboards, alerts, SLOs, deployment tracking, and meaningful metrics collection.
Part 1: Enhanced Metrics Collection
Add custom metrics to your application using the OpenTelemetry meter:
- Request Counter - Track total number of travel plan requests per destination
- Error Counter - Track the total number of errors; think about which error types are worth distinguishing (for example, timeouts vs. validation failures)
- Tool Call Counter - Track how often each tool is called
Emit these metrics from your application code at appropriate points (request handlers, error handlers, tool functions). Check the mock application code for examples and hints on how to create and record metrics with the get_meter() helper function.
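A minimal sketch of the three counters, assuming the mock application's get_meter() helper returns a standard OpenTelemetry Meter (the code relies only on its create_counter() method and on Counter.add()). The class TravelPlanMetrics, the function classify_error, the metric name travel_plan.tool_calls.total, the error categories, and the attribute keys destination, error.type, source, tool.name and tool.status are illustrative choices, not part of the starter code.

```python
import functools
from collections.abc import Callable
from typing import Any, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def classify_error(exc: BaseException) -> str:
    """Map an exception to a small, fixed set of error.type values (illustrative categories)."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "upstream_unavailable"
    if isinstance(exc, ValueError):
        return "validation"
    return "internal"


class TravelPlanMetrics:
    """Hypothetical wrapper that owns the three Part 1 counters."""

    def __init__(self, meter: Any) -> None:
        # `meter` is assumed to be an OpenTelemetry Meter, such as the one get_meter() returns.
        self.requests = meter.create_counter(
            "travel_plan.requests.total", unit="1",
            description="Travel plan requests by destination")
        self.errors = meter.create_counter(
            "travel_plan.errors.total", unit="1",
            description="Errors by error type")
        self.tool_calls = meter.create_counter(
            "travel_plan.tool_calls.total", unit="1",
            description="Tool calls by tool name and outcome")

    def record_request(self, destination: str) -> None:
        self.requests.add(1, {"destination": destination})

    def record_error(self, exc: BaseException, source: str) -> None:
        self.errors.add(1, {"error.type": classify_error(exc), "source": source})

    def track_tool(self, func: F) -> F:
        """Decorator that counts every call of a tool, tagged with success or error."""

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            attributes = {"tool.name": func.__name__}
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Count the failed call, record the error, then let it propagate.
                self.tool_calls.add(1, {**attributes, "tool.status": "error"})
                self.record_error(exc, source="tool")
                raise
            self.tool_calls.add(1, {**attributes, "tool.status": "success"})
            return result

        return wrapper  # type: ignore[return-value]
```

In the application you would create one instance from the meter that get_meter() returns, call record_request(destination) when a request arrives, call record_error(exc, source="handler") in the error path, and decorate each tool function with @metrics.track_tool (async tools would need an async variant of the wrapper). Keeping error.type to a handful of fixed categories keeps attribute cardinality low, and the tool.status attribute is what later lets you compute a tool success rate.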
Part 2: Create a New Relic Dashboard
Build a dashboard in New Relic called “WanderAI Agent Performance” that visualizes:
- Request rate over time, for example:
  SELECT rate(sum(travel_plan.requests.total), 1 minute) FROM Metric TIMESERIES SINCE today
- Error rate over time
- Average response time
- Tool usage breakdown by tool name
Use New Relic Query Language (NRQL) queries to power your dashboard widgets.
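One possible set of NRQL queries for the widgets, written as a small Python helper so they can be reviewed and versioned alongside the code. It assumes counters named travel_plan.requests.total, travel_plan.errors.total and travel_plan.tool_calls.total carrying error.type and tool.name attributes, plus a response-time histogram named travel_plan.response_time in seconds; apart from the request counter these names are hypothetical, so adjust them to what your instrumentation actually emits. Counters are aggregated with sum() rather than count(*), because each Metric data point stores the accumulated increments for an interval, so count(*) would count data points rather than requests. The function name dashboard_queries is also illustrative.

```python
def dashboard_queries(since: str = "today") -> dict[str, str]:
    """One NRQL query per widget of the "WanderAI Agent Performance" dashboard."""
    return {
        # sum() adds up the counter increments stored in each Metric data point.
        "Request rate (per minute)": (
            "SELECT rate(sum(travel_plan.requests.total), 1 minute) "
            f"FROM Metric TIMESERIES SINCE {since}"
        ),
        # Arithmetic on two aggregates gives errors per 100 requests.
        "Errors per 100 requests": (
            "SELECT sum(travel_plan.errors.total) * 100 / sum(travel_plan.requests.total) "
            f"FROM Metric TIMESERIES SINCE {since}"
        ),
        "Errors by type (per minute)": (
            "SELECT rate(sum(travel_plan.errors.total), 1 minute) "
            f"FROM Metric FACET error.type TIMESERIES SINCE {since}"
        ),
        # Assumes a histogram metric; average() works on distribution metrics.
        "Average response time (s)": (
            "SELECT average(travel_plan.response_time) "
            f"FROM Metric TIMESERIES SINCE {since}"
        ),
        "Tool usage by tool name": (
            "SELECT sum(travel_plan.tool_calls.total) "
            f"FROM Metric FACET tool.name SINCE {since}"
        ),
    }
```

Each string can be pasted into a widget's query editor; passing a different since value (for example "1 week ago") makes it easy to build a trend page from the same queries.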
Part 3: Set Up Alerts
Configure alerts in New Relic to notify your team when:
- Error rate exceeds a threshold (e.g., 5 errors in 5 minutes)
- Response times are slow (e.g., p95 latency exceeds 25 seconds)
- Specific tools are failing repeatedly
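The three conditions can be written down as NRQL alert queries. The AlertCondition dataclass below is only a hypothetical way to record them (in New Relic you create conditions in the UI or through its API, not from this class), and the failing-tool threshold and window are illustrative numbers. Alert queries omit SINCE and TIMESERIES because New Relic supplies the evaluation window. The helper sliding_window_breaches replays historical per-minute error counts against a count threshold, which is one way to see how often a candidate rule like "5 errors in 5 minutes" would have fired before you enable it; it treats reaching the threshold as a breach and only applies to count-style conditions, not to the p95 latency condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertCondition:
    """Illustrative record of a NRQL alert condition (not a New Relic API object)."""
    name: str
    nrql: str           # no SINCE/TIMESERIES: the alert engine evaluates it per window
    threshold: float
    window_minutes: int


CONDITIONS = [
    AlertCondition("Error burst",
                   "SELECT sum(travel_plan.errors.total) FROM Metric", 5, 5),
    AlertCondition("Slow responses (p95)",
                   "SELECT percentile(travel_plan.response_time, 95) FROM Metric", 25, 5),
    AlertCondition("Failing tool",
                   "SELECT sum(travel_plan.tool_calls.total) FROM Metric "
                   "WHERE tool.status = 'error' FACET tool.name", 3, 10),
]


def sliding_window_breaches(per_minute: list[float], threshold: float, window: int) -> list[int]:
    """Return the minute indexes at which the trailing `window`-minute sum reaches `threshold`."""
    breaches = []
    running = 0.0
    for i, value in enumerate(per_minute):
        running += value
        if i >= window:
            # Drop the minute that just slid out of the window.
            running -= per_minute[i - window]
        if running >= threshold:
            breaches.append(i)
    return breaches
```

For example, replaying a day of per-minute error counts through sliding_window_breaches with threshold=5 and window=5 shows how many times the error-burst condition would have opened an incident, which helps you start conservatively and tune from there.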
Part 4: Define Service Level Objectives (SLOs)
SLOs shift your monitoring mindset from reactive (“something broke, let’s fix it”) to proactive (“are we meeting our promises to users?”). Define Service Level Indicators (SLIs) and their corresponding objectives for WanderAI:
- Availability SLO — Define an SLI based on successful (non-5xx) responses. Set an objective such as 99.5% over a rolling 7-day window.
- Latency SLO — Define an SLI based on response time. For example, 95% of requests should complete in under 10 seconds over a rolling 7-day window.
- AI Quality SLO (stretch) — If you instrumented an AI quality score in earlier challenges, define an SLI for it (e.g., 90% of responses score above a quality threshold).
Create these SLOs in New Relic using the Service Level Management UI. Then:
- Add an SLO summary widget to your dashboard showing remaining error budget.
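- Configure an alert on SLO burn rate so you’re notified when you’re consuming error budget too fast (e.g., a fast-burn alert that fires when the burn rate exceeds 10, meaning the budget is being spent ten times faster than the pace that would use it up exactly at the end of the SLO window).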
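New Relic's Service Level Management computes SLI attainment, remaining error budget and burn rate for you, but the arithmetic behind those numbers is short enough to sketch. With a 99.5% objective over 7 days, 0.5% of valid requests may fail: out of 10,000 requests that is 50 failures, so 20 failures leaves 60% of the budget. A burn rate of 10 corresponds to a 5% error rate, and at that pace a full 7-day (168-hour) budget is gone in 16.8 hours, which is why a fast-burn alert needs a quick response. The function names below are hypothetical; the same sli() applies to the latency SLO if "good" means a request finished in under 10 seconds.

```python
def sli(good_events: int, valid_events: int) -> float:
    """Fraction of valid events that were good (e.g. non-5xx, or completed in under 10 s)."""
    if valid_events == 0:
        return 1.0  # no traffic in the window: nothing violated the objective
    return good_events / valid_events


def error_budget_remaining(good_events: int, valid_events: int, objective: float) -> float:
    """Share of the error budget left: 1.0 untouched, 0.0 spent, negative overspent."""
    allowed_bad = (1 - objective) * valid_events
    actual_bad = valid_events - good_events
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1 - actual_bad / allowed_bad


def burn_rate(observed_error_rate: float, objective: float) -> float:
    """How many times faster than 'exactly on budget' the error budget is being spent."""
    if not 0 < objective < 1:
        raise ValueError("objective must be between 0 and 1, exclusive")
    return observed_error_rate / (1 - objective)


def hours_to_exhaustion(rate: float, window_hours: float = 7 * 24) -> float:
    """Hours until a full budget is gone at a constant burn rate."""
    return window_hours / rate
```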
Part 5: Change Tracking & Deployment Markers
When performance degrades, the first question is always: “Did we deploy something?” Deployment markers let you correlate regressions with specific changes.
- Record a deployment marker using the New Relic Change Tracking API. Include attributes like version, commit SHA, and deployer.
- Visualize deployments on your dashboard — Add a billboard or timeline widget that shows recent deployments alongside your performance charts.
- Correlate a change — After recording a marker, trigger a few requests and confirm you can see the deployment event overlaid on your metrics charts in New Relic.
A simple example using curl:
```shell
# commit and user record the commit SHA and the deployer alongside the version.
curl -X POST "https://api.newrelic.com/graphql" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_USER_API_KEY" \
  -d '{
    "query": "mutation { changeTrackingCreateDeployment(deployment: {version: \"1.0.1\", entityGuid: \"YOUR_ENTITY_GUID\", commit: \"YOUR_COMMIT_SHA\", user: \"YOUR_NAME\", description: \"Added custom metrics and SLOs\"}) { entityGuid deploymentId } }"
  }'
```
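To send the same marker from a CI/CD job without extra dependencies, the mutation can be posted with Python's standard library. This sketch assumes the deployment input accepts commit and user fields alongside version, entityGuid and description; confirm the field names in the NerdGraph API explorer before relying on them. String values are escaped with json.dumps, whose escape sequences are also valid in GraphQL string literals. The helper names are hypothetical, and the endpoint is the US one from the curl example (EU-region accounts use a different host).

```python
import json
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"  # US endpoint, as in the curl example


def build_deployment_payload(entity_guid: str, version: str, commit: str,
                             user: str, description: str) -> dict:
    """Build the GraphQL request body for a change-tracking deployment marker."""
    fields = {
        "version": version,
        "entityGuid": entity_guid,
        "commit": commit,
        "user": user,
        "description": description,
    }
    # json.dumps quotes and escapes each value so quotes in descriptions stay valid GraphQL.
    args = ", ".join(f"{key}: {json.dumps(value)}" for key, value in fields.items())
    query = ("mutation { changeTrackingCreateDeployment(deployment: {" + args + "}) "
             "{ entityGuid deploymentId } }")
    return {"query": query}


def send_deployment_marker(payload: dict, api_key: str) -> dict:
    """POST the payload to NerdGraph and return the created marker's identifiers."""
    request = urllib.request.Request(
        NERDGRAPH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.loads(response.read().decode("utf-8"))
    # GraphQL reports failures in an "errors" list even when the HTTP status is 200.
    if body.get("errors"):
        raise RuntimeError(f"NerdGraph returned errors: {body['errors']}")
    return body["data"]["changeTrackingCreateDeployment"]
```

In a pipeline you might call send_deployment_marker(build_deployment_payload(...), api_key) with the User API key read from a secret environment variable; on GitHub Actions, the built-in GITHUB_SHA and GITHUB_ACTOR variables can supply the commit and deployer.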
Key Metrics to Monitor
For a travel planning agent, focus on:
| Metric | Why It Matters | Target |
|--------|----------------|--------|
| Response Time (p95) | Speed affects user experience | < 3 seconds |
| Error Rate | Reliability | < 1% |
| Token Usage (avg) | Cost per request | < 500 tokens |
| Tool Success Rate | Accuracy | > 95% |
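To sanity-check a window of raw observations against these targets, each metric reduces to a one-line calculation. This sketch uses the nearest-rank method for p95; New Relic's percentile() may compute its estimate differently, so small discrepancies are expected. WindowStats, p95 and check_targets are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowStats:
    latencies_s: list[float]   # response time of each request, in seconds
    errors: int                # requests that failed
    tokens: list[int]          # tokens consumed by each request
    tool_calls: int
    tool_failures: int


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("p95 of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def check_targets(s: WindowStats) -> dict[str, bool]:
    """Compare one window of observations with the targets in the table."""
    if not s.latencies_s or not s.tokens or s.tool_calls == 0:
        raise ValueError("window needs requests, token counts and tool calls")
    requests = len(s.latencies_s)
    return {
        "p95 response time < 3 s": p95(s.latencies_s) < 3.0,
        "error rate < 1%": s.errors / requests < 0.01,
        "avg tokens < 500": sum(s.tokens) / len(s.tokens) < 500,
        "tool success rate > 95%": (s.tool_calls - s.tool_failures) / s.tool_calls > 0.95,
    }
```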
Success Criteria
To complete this challenge successfully, you should be able to:
Learning Resources
Tips
- Use consistent, hierarchical naming for metrics (e.g., travel_plan.requests.total, travel_plan.errors.total)
- Establish baselines: know what “normal” looks like before problems occur
- Start with conservative alert thresholds and tune as you learn your system’s behavior
- Keep dashboards focused; don’t overwhelm viewers with too many charts
- Consider adding environment attributes to distinguish between dev/staging/production
Advanced Challenges (Optional)
- Define an AI Quality SLO based on evaluation scores from your agent responses
- Set up notification channels (Slack, PagerDuty) for your alerts
- Build a dashboard that shows trends over time to identify gradual degradation
- Automate deployment marker creation in a CI/CD pipeline (e.g., GitHub Actions) so every deploy is tracked automatically
- Create a “Change Impact” dashboard page that compares error rate and latency before vs. after each deployment
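For the Change Impact page, NRQL's COMPARE WITH clause can overlay an earlier time window on a chart. The same before/after comparison around a single deployment is sketched below in plain Python, assuming you have per-request observations (timestamp, latency, error flag) exported from New Relic or your logs. Observation and change_impact are hypothetical names, and the one-hour window on each side of the deployment is an arbitrary default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Observation:
    timestamp: datetime
    latency_s: float
    is_error: bool


def change_impact(observations: list[Observation], deployed_at: datetime,
                  window: timedelta = timedelta(hours=1)) -> dict[str, dict[str, float]]:
    """Summarize error rate and mean latency in equal windows before and after a deployment."""
    before = [o for o in observations if deployed_at - window <= o.timestamp < deployed_at]
    after = [o for o in observations if deployed_at <= o.timestamp < deployed_at + window]

    def summarize(group: list[Observation]) -> dict[str, float]:
        if not group:
            return {"requests": 0, "error_rate": 0.0, "avg_latency_s": 0.0}
        return {
            "requests": len(group),
            "error_rate": sum(o.is_error for o in group) / len(group),
            "avg_latency_s": sum(o.latency_s for o in group) / len(group),
        }

    return {"before": summarize(before), "after": summarize(after)}
```

Comparing equal-length windows keeps the two summaries on the same footing; a large jump in error_rate or avg_latency_s right after a marker is the signal worth investigating first.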