You can’t just ship AI without testing. What if the agent returns a non-existent destination? What if the itinerary is way too long or short? What if recommendations are unsafe (war zones, extreme weather)? What if the response includes toxicity or negativity?
In this challenge, you’ll build an automated quality gate for your AI agents using New Relic’s AI Monitoring platform. Quality gates ensure that only high-quality travel plans reach your customers.
Your goal is to implement a comprehensive evaluation and quality assurance system for your AI-generated travel plans. This involves several layers of evaluation working together.
OpenTelemetry defines an Event as a LogRecord with a non-empty EventName. Custom Events are a core signal in the New Relic platform. However, despite using the same name, OpenTelemetry Events and New Relic Custom Events are not identical concepts:
EventNames do not share the same format or semantics as Custom Event types. OpenTelemetry Event names are fully qualified with a namespace and follow lower snake case, e.g. com.acme.my_event. Custom Event types are pascal case, e.g. MyEvent.
EventName acts as an unambiguous signal of the class / type of event which occurred. Custom Events are treated as an entirely new event type, accessible via NRQL with SELECT * FROM MyEvent.
Because of these differences, OpenTelemetry Events are ingested as New Relic Logs since most of the time, OpenTelemetry Events are closer in similarity to New Relic Logs than New Relic Custom Events.
However, you can explicitly signal that an OpenTelemetry LogRecord should be ingested as a Custom Event by adding an entry to LogRecord.attributes following the form: newrelic.event.type=<EventType>.
For example, a LogRecord with attribute newrelic.event.type=MyEvent will be ingested as a Custom Event with type=MyEvent, and accessible via NRQL with: SELECT * FROM MyEvent.
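As a minimal sketch of the attribute described above, the snippet below logs a record whose `extra` fields carry `newrelic.event.type`. It assumes that, in a real service, an OpenTelemetry logging handler is attached to the logger and forwards `extra` fields as `LogRecord.attributes`; here a small capturing handler stands in for it so the example runs on its own. The names `emit_custom_event`, `CapturingHandler`, and `plan_id` are illustrative, not part of any New Relic API.

```python
import logging


class CapturingHandler(logging.Handler):
    """Stand-in for an OpenTelemetry logging handler (assumption): it only
    stores records so the attributes can be inspected locally."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def emit_custom_event(logger, event_type, message, **attributes):
    """Log a record flagged for ingestion as a New Relic Custom Event.

    Attribute names must not collide with built-in LogRecord fields
    such as `name`, `msg`, or `message`.
    """
    extra = {"newrelic.event.type": event_type, **attributes}
    logger.info(message, extra=extra)


logger = logging.getLogger("custom_event_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = CapturingHandler()
logger.addHandler(handler)

# Hypothetical event: would be queryable with SELECT * FROM MyEvent.
emit_custom_event(logger, "MyEvent", "something happened", plan_id="abc-123")
```

Without the `newrelic.event.type` entry, the same record would be ingested as a regular New Relic Log.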
The foundation of enterprise AI evaluation is capturing AI interactions as structured events. New Relic’s AI Monitoring uses a special attribute newrelic.event.type that automatically populates:


Quality Evaluation - Detect issues like toxicity and safety concerns
New Relic LLM Evaluation


You need to emit three custom events after each LLM interaction:
LlmChatCompletionMessage for the user prompt (role: “user”, sequence: 0)
  newrelic.event.type - LlmChatCompletionMessage
  appName - Service name
  duration - duration of the interaction
  host - hostname of the service
  id - user ID (if available)
  request_id - unique ID for the request (e.g., UUID)
  span_id - OpenTelemetry span ID for trace correlation
  trace_id - Links feedback to the specific AI interaction
  response.model - model used for the response
  token_count - number of tokens in the prompt
  vendor - LLM vendor used (e.g., “openai”, “azure”, “anthropic”)
  ingest_source - “Python” (or your language of choice)
  content - the user prompt text
  role - “user” for the prompt
  sequence - 0 for user prompt
  is_response - boolean indicating if this event is a user prompt (False) or an LLM response (True)
  completion_id - unique ID for the LLM completion (e.g., UUID)
  user_id (optional) - If available
LlmChatCompletionMessage for the LLM response (role: “assistant”, sequence: 1)
  newrelic.event.type - LlmChatCompletionMessage
  appName - Service name
  duration - duration of the interaction
  host - hostname of the service
  id - user ID (if available)
  request_id - unique ID for the request (e.g., UUID)
  span_id - OpenTelemetry span ID for trace correlation
  trace_id - Links feedback to the specific AI interaction
  response.model - model used for the response
  token_count - number of tokens in the response
  vendor - LLM vendor used (e.g., “openai”, “azure”, “anthropic”)
  ingest_source - “Python” (or your language of choice)
  content - the LLM response text
  role - “assistant” for the response
  sequence - 1
  is_response - boolean indicating if this event is a user prompt (False) or an LLM response (True)
  completion_id - unique ID for the LLM completion (e.g., UUID)
  user_id (optional) - If available
LlmChatCompletionSummary for the summary of the interaction
  newrelic.event.type - LlmChatCompletionSummary
  appName - Service name
  duration - duration of the interaction
  host - hostname of the service
  id - user ID (if available)
  request_id - unique ID for the request (e.g., UUID)
  span_id - OpenTelemetry span ID for trace correlation
  trace_id - Links feedback to the specific AI interaction
  request.model - model used for the request
  response.model - model used for the response
  token_count - number of tokens (input + output)
  vendor - LLM vendor used (e.g., “openai”, “azure”, “anthropic”)
  ingest_source - “Python” (or your language of choice)
Implement deterministic checks against business rules:
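One possible shape for such deterministic checks is sketched below. It covers the failure modes raised at the start of this challenge (unknown destination, itinerary too long or too short, unsafe content). All thresholds, the destination allow-list, the blocked terms, and the function name `check_travel_plan` are hypothetical placeholders to replace with your own business rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical thresholds and lists; tune them to your own business rules.
MIN_CHARS = 200
MAX_CHARS = 8000
KNOWN_DESTINATIONS = {"paris", "tokyo", "lisbon"}
BLOCKED_TERMS = {"war zone", "active conflict"}


@dataclass
class RuleResult:
    passed: bool
    failures: list = field(default_factory=list)


def check_travel_plan(destination: str, requested_days: int, plan_text: str) -> RuleResult:
    """Run rule-based checks and collect the name of every rule that failed."""
    failures = []

    # Rule 1: the destination must be one we know exists.
    if destination.strip().lower() not in KNOWN_DESTINATIONS:
        failures.append("unknown_destination")

    # Rule 2: the plan must be neither too short nor too long.
    if len(plan_text) < MIN_CHARS:
        failures.append("too_short")
    elif len(plan_text) > MAX_CHARS:
        failures.append("too_long")

    # Rule 3: the number of distinct "Day N" headings must match the request.
    days = {int(n) for n in re.findall(r"\bDay\s+(\d+)\b", plan_text, flags=re.IGNORECASE)}
    if len(days) != requested_days:
        failures.append("day_count_mismatch")

    # Rule 4: reject plans that mention blocked safety terms.
    lowered = plan_text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        failures.append("unsafe_content")

    return RuleResult(passed=not failures, failures=failures)
```

Returning the list of failed rule names, rather than a bare boolean, makes it straightforward to attach the failures to a log record or event for later analysis.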
Integrate the evaluation system into your Flask application:
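A minimal sketch of what the integration could look like: after each LLM call in a route handler, build the three events listed earlier and log them. The helper names `build_llm_events` and `emit_llm_events` are hypothetical, and the sketch assumes the logger has an OpenTelemetry handler attached that forwards `extra` fields as LogRecord attributes. In a real Flask route, `trace_id` and `span_id` would come from the active OpenTelemetry span; here they are passed in as plain strings, and the model name is a placeholder.

```python
import logging
import socket
import uuid


def build_llm_events(*, app_name, prompt, response, model, vendor, trace_id,
                     span_id, duration_ms, prompt_tokens, response_tokens,
                     user_id=None):
    """Return [prompt message, response message, summary] as attribute dicts."""
    completion_id = str(uuid.uuid4())
    common = {
        "appName": app_name,
        "duration": duration_ms,
        "host": socket.gethostname(),
        "request_id": str(uuid.uuid4()),
        "span_id": span_id,
        "trace_id": trace_id,
        "response.model": model,
        "vendor": vendor,
        "ingest_source": "Python",
    }
    if user_id is not None:
        common["id"] = user_id  # the attribute list above uses `id` for the user ID

    def message(content, role, sequence, tokens):
        event = {
            **common,
            "newrelic.event.type": "LlmChatCompletionMessage",
            "content": content,
            "role": role,
            "sequence": sequence,
            "is_response": role == "assistant",
            "completion_id": completion_id,
            "token_count": tokens,
        }
        if user_id is not None:
            event["user_id"] = user_id
        return event

    summary = {
        **common,
        "newrelic.event.type": "LlmChatCompletionSummary",
        "request.model": model,
        "token_count": prompt_tokens + response_tokens,
    }
    return [
        message(prompt, "user", 0, prompt_tokens),
        message(response, "assistant", 1, response_tokens),
        summary,
    ]


def emit_llm_events(logger, events):
    """Log each event; the attributes travel in `extra`."""
    for event in events:
        logger.info(event["newrelic.event.type"], extra=event)


# Example values only; IDs and model name are placeholders.
events = build_llm_events(
    app_name="travel-planner", prompt="Plan 2 days in Paris",
    response="Day 1: ...", model="gpt-4o-mini", vendor="openai",
    trace_id="0af7651916cd43dd8448eb211c80319c", span_id="b7ad6b7169203331",
    duration_ms=1250, prompt_tokens=12, response_tokens=340,
)
```

Calling `emit_llm_events(logger, events)` right after the agent returns keeps all three events tied to the same `trace_id`, which is what later feedback correlation relies on.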
Capture real user feedback to measure actual satisfaction with AI-generated travel plans:
The trace_id from the agent interaction in the feedback log record
A log record with newrelic.event.type: 'LlmFeedbackMessage' containing:
  newrelic.event.type - LlmFeedbackMessage
  appName - Service name
  trace_id - Links feedback to the specific AI interaction
  feedback - User’s feedback (e.g., “positive”, “negative”, “neutral”)
  rating - User’s thumbs up (1) or thumbs down (-1)
  vendor - LLM vendor used (e.g., “openai”, “azure”, “anthropic”)
  user_id (optional) - If available
This feedback data will help you:
Use another LLM to evaluate responses for:
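A rough sketch of the LLM-as-judge idea follows. It assumes a caller-supplied function `call_llm(prompt) -> str` that wraps whichever model client you use; that function, the prompt wording, the JSON schema, and the `max_toxicity` threshold are all hypothetical choices for illustration, not a New Relic feature.

```python
import json

# Hypothetical judge prompt; doubled braces are literal braces for str.format.
JUDGE_PROMPT = """You are a strict reviewer of AI-generated travel plans.
Rate the plan below and reply with JSON only, in the form
{{"toxicity": <0-1>, "negativity": <0-1>, "safety_concern": <true|false>, "reason": "<short text>"}}

Plan:
{plan}
"""


def judge_plan(plan_text, call_llm, max_toxicity=0.2):
    """Ask a second model to score the plan and turn its reply into a verdict.

    `call_llm` is any callable that takes a prompt string and returns the
    judge model's raw text reply.
    """
    raw = call_llm(JUDGE_PROMPT.format(plan=plan_text))
    try:
        verdict = json.loads(raw)
        toxicity = float(verdict["toxicity"])
        unsafe = bool(verdict["safety_concern"])
    except (ValueError, KeyError, TypeError):
        # Fail closed: an unreadable judge reply does not pass the gate.
        return {"passed": False, "reason": "unparseable_judge_output"}

    return {
        "passed": toxicity <= max_toxicity and not unsafe,
        "toxicity": toxicity,
        "safety_concern": unsafe,
        "reason": verdict.get("reason", ""),
    }
```

Failing closed on malformed judge output is a deliberate choice here: a quality gate that passes plans whenever the judge misbehaves would defeat its purpose.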
Once you emit the custom events, you can access New Relic’s curated AI Monitoring experience:
Hint: You may need to pin the “AI Monitoring” section in New Relic’s sidebar via “All capabilities” to see it.

To complete this challenge successfully, you should be able to:
Custom events (LlmChatCompletionMessage, LlmChatCompletionSummary) are being sent to New Relic
LlmFeedbackMessage events with trace_id correlation are sent to New Relic
trace_id
LlmChatCompletionMessage
LlmChatCompletionSummary
LlmFeedbackMessage
trace_id from the agent response in your frontend so it can be sent back with user feedback
SELECT * FROM LlmFeedbackMessage WHERE trace_id = 'xxx' to correlate feedback with interactions
FROM LlmChatCompletionSummary, LlmFeedbackMessage WHERE trace_id = trace_id
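The feedback correlation described in this challenge can be sketched as follows: the frontend sends back the `trace_id` it received with the agent response, and the backend logs a feedback record carrying that same `trace_id`. The helper name `build_feedback_event` and the example values are hypothetical, and the sketch assumes the logger has an OpenTelemetry handler attached that forwards `extra` fields as LogRecord attributes.

```python
import logging


def build_feedback_event(*, app_name, trace_id, thumbs_up, vendor, user_id=None):
    """Build the attributes for an LlmFeedbackMessage event."""
    event = {
        "newrelic.event.type": "LlmFeedbackMessage",
        "appName": app_name,
        "trace_id": trace_id,  # must match the trace_id of the AI interaction
        "feedback": "positive" if thumbs_up else "negative",
        "rating": 1 if thumbs_up else -1,
        "vendor": vendor,
    }
    if user_id is not None:
        event["user_id"] = user_id
    return event


def emit_feedback(logger, event):
    """Log the feedback; the attributes travel in `extra`."""
    logger.info("user feedback received", extra=event)


# Example values only; the trace_id would come from the frontend request.
feedback_event = build_feedback_event(
    app_name="travel-planner",
    trace_id="0af7651916cd43dd8448eb211c80319c",
    thumbs_up=False,
    vendor="openai",
)
```

Once such records are ingested, the query `SELECT * FROM LlmFeedbackMessage WHERE trace_id = 'xxx'` quoted above returns the feedback for a single interaction.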