Challenge 05 - Monitoring Best Practices


Introduction

You’re now sending telemetry to New Relic, but raw data alone isn’t enough. Without dashboards and alerts, you have to check traces by hand, you won’t notice problems until customers complain, and you can’t track performance trends over time.

In this challenge, you’ll learn industry best practices for monitoring AI-driven applications by creating custom dashboards, setting up alerts, defining Service Level Objectives (SLOs), tracking deployment changes, and collecting key metrics that matter for your travel planning service.

Description

Your goal is to build a comprehensive monitoring solution for WanderAI that includes dashboards, alerts, SLOs, deployment tracking, and meaningful metrics collection.

Part 1: Enhanced Metrics Collection

Add custom metrics to your application using the OpenTelemetry meter:

Emit these metrics from your application code at appropriate points (request handlers, error handlers, tool functions). Check the mock application code for examples and hints on how to create and record metrics using the get_meter() helper function.
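
As a rough illustration of emitting metrics at those points, the sketch below creates two counters and two histograms and records into them once per request. The metric names (wanderai.requests and so on), the AgentInstruments class, and the record_request() function are hypothetical, not part of the mock application. The only assumption about the meter is that it exposes the standard OpenTelemetry create_counter() and create_histogram() methods.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class AgentInstruments:
    """Handles to the instruments the request path records into."""
    requests: Any     # counter: total requests handled
    errors: Any       # counter: requests that failed
    duration_ms: Any  # histogram: end-to-end latency in milliseconds
    tokens: Any       # histogram: tokens consumed per request


def build_instruments(meter: Any) -> AgentInstruments:
    """Create the instruments on an OpenTelemetry-style meter.

    All metric names here are hypothetical; use your own naming scheme.
    """
    return AgentInstruments(
        requests=meter.create_counter(
            "wanderai.requests", unit="1", description="Travel plan requests"),
        errors=meter.create_counter(
            "wanderai.errors", unit="1", description="Failed travel plan requests"),
        duration_ms=meter.create_histogram(
            "wanderai.request.duration", unit="ms", description="Request latency"),
        tokens=meter.create_histogram(
            "wanderai.tokens.used", unit="1", description="Tokens per request"),
    )


def record_request(inst: AgentInstruments, duration_ms: float, tokens: int,
                   ok: bool, destination: str = "unknown") -> None:
    """Record one finished request; call this from a request handler."""
    attrs = {"destination": destination}
    inst.requests.add(1, attrs)
    if not ok:
        inst.errors.add(1, attrs)
    inst.duration_ms.record(duration_ms, attrs)
    inst.tokens.record(tokens, attrs)
```

If get_meter() returns a standard OpenTelemetry Meter (an assumption worth checking against the mock application), build_instruments(get_meter()) could run once at startup and record_request() once per request.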

Part 2: Create a New Relic Dashboard

Build a dashboard in New Relic called “WanderAI Agent Performance” that visualizes:

Use New Relic Query Language (NRQL) queries to power your dashboard widgets.
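
To give a feel for what widget queries might look like, the sketch below keeps a few NRQL strings in a Python dictionary and appends a time window. The metric names are hypothetical placeholders and the queries have not been run against a real account, so treat them as starting points to adapt in the query builder.

```python
# Hypothetical metric names; replace them with the names your application emits.
WIDGET_QUERIES = {
    "Response time (p95)":
        "SELECT percentile(wanderai.request.duration, 95) FROM Metric",
    "Error rate (%)":
        "SELECT sum(wanderai.errors) / sum(wanderai.requests) * 100 FROM Metric",
    "Average tokens per request":
        "SELECT average(wanderai.tokens.used) FROM Metric",
}


def widget_nrql(title: str, since: str = "1 hour ago",
                timeseries: bool = True) -> str:
    """Return the NRQL for a widget, with a SINCE clause and optional TIMESERIES."""
    query = f"{WIDGET_QUERIES[title]} SINCE {since}"
    return f"{query} TIMESERIES" if timeseries else query


if __name__ == "__main__":
    for title in WIDGET_QUERIES:
        print(f"{title}: {widget_nrql(title)}")
```

Keeping the queries in one place like this makes it easier to reuse the same expression for a dashboard widget and for an alert condition.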

Part 3: Set Up Alerts

Configure alerts in New Relic to notify your team when:

Part 4: Define Service Level Objectives (SLOs)

SLOs shift your monitoring mindset from reactive (“something broke, let’s fix it”) to proactive (“are we meeting our promises to users?”). Define Service Level Indicators (SLIs) and their corresponding objectives for WanderAI:

Create these SLOs in New Relic using the Service Level Management UI. Then:
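
As background for reasoning about an objective, the sketch below works through the error-budget arithmetic behind a success-rate SLO. The 99% target is only an example chosen to mirror the < 1% error-rate target under Key Metrics, and the function names are illustrative.

```python
def error_budget(target_pct: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    return total_requests * (100.0 - target_pct) / 100.0


def budget_remaining(target_pct: float, total_requests: int,
                     bad_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(target_pct, total_requests)
    if budget == 0:
        raise ValueError("a 100% target leaves no error budget")
    return 1.0 - bad_requests / budget


if __name__ == "__main__":
    # Example: 99% target, 10,000 requests in the window, 25 failures so far.
    print(error_budget(99.0, 10_000))          # failures allowed in the window
    print(budget_remaining(99.0, 10_000, 25))  # share of the budget left
```

With a 99% target over 10,000 requests the budget is 100 failures, so 25 failures leave three quarters of the budget unspent.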

Part 5: Change Tracking & Deployment Markers

When performance degrades, the first question is always: “Did we deploy something?” Deployment markers let you correlate regressions with specific changes.

A simple example using curl:

curl -X POST "https://api.newrelic.com/graphql" \
  -H "Content-Type: application/json" \
  -H "API-Key: YOUR_USER_API_KEY" \
  -d '{
    "query": "mutation { changeTrackingCreateDeployment(deployment: {version: \"1.0.1\", entityGuid: \"YOUR_ENTITY_GUID\", description: \"Added custom metrics and SLOs\"}) { entityGuid deploymentId } }"
  }'
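
The same request can be built from Python, which avoids hand-escaping quotes inside the JSON body. This is a minimal sketch using only the standard library. The endpoint, header names, and mutation are copied from the curl example above, while the function names are illustrative and create_deployment_marker() has not been exercised against a live account.

```python
import json
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"


def deployment_payload(entity_guid: str, version: str, description: str) -> bytes:
    """Build the JSON body for the changeTrackingCreateDeployment mutation."""
    # json.dumps quotes and escapes each value so it can sit inside the
    # GraphQL document as a string literal.
    mutation = (
        "mutation { changeTrackingCreateDeployment(deployment: {"
        f"version: {json.dumps(version)}, "
        f"entityGuid: {json.dumps(entity_guid)}, "
        f"description: {json.dumps(description)}"
        "}) { entityGuid deploymentId } }"
    )
    return json.dumps({"query": mutation}).encode("utf-8")


def build_request(api_key: str, payload: bytes) -> urllib.request.Request:
    """Wrap the payload in a POST request carrying the user API key."""
    return urllib.request.Request(
        NERDGRAPH_URL,
        data=payload,
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )


def create_deployment_marker(api_key: str, entity_guid: str,
                             version: str, description: str) -> dict:
    """Send the mutation and return the decoded JSON response."""
    request = build_request(
        api_key, deployment_payload(entity_guid, version, description))
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A deployment script could call create_deployment_marker() right after a release, reading the API key and entity GUID from environment variables rather than hard-coding them.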

Key Metrics to Monitor

For a travel planning agent, focus on:

Metric               Why It Matters                  Target
Response Time (p95)  Speed affects user experience   < 3 seconds
Error Rate           Reliability                     < 1%
Token Usage (avg)    Cost per request                < 500 tokens
Tool Success Rate    Accuracy                        > 95%
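
To make the targets concrete, the sketch below checks a batch of sample measurements against the four thresholds in the table. It uses a simple nearest-rank percentile, which may differ slightly from the approximation New Relic computes. The function names and input shapes are assumptions made for illustration.

```python
from typing import Dict, Sequence


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def check_targets(durations_s: Sequence[float], errors: int, requests: int,
                  tokens: Sequence[int], tool_ok: int,
                  tool_calls: int) -> Dict[str, bool]:
    """Return, per metric, whether the sample meets the target in the table."""
    return {
        "response_time_p95": p95(durations_s) < 3.0,         # < 3 seconds
        "error_rate": errors / requests < 0.01,              # < 1%
        "token_usage_avg": sum(tokens) / len(tokens) < 500,  # < 500 tokens
        "tool_success_rate": tool_ok / tool_calls > 0.95,    # > 95%
    }


if __name__ == "__main__":
    report = check_targets(
        durations_s=[1.0] * 19 + [4.0], errors=1, requests=200,
        tokens=[400, 450], tool_ok=98, tool_calls=100,
    )
    for metric, met in report.items():
        print(f"{metric}: {'ok' if met else 'MISSED'}")
```

Note that one slow request out of twenty does not move the p95 here, which is why percentile targets are less sensitive to single outliers than a maximum would be.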

Success Criteria

To complete this challenge successfully, you should be able to:

Learning Resources

Tips

Advanced Challenges (Optional)