Task 03 - Enforce content safety for user prompts
Introduction
Azure AI Content Safety evaluates user input for unsafe or harmful content. Because your AIServices resource does not apply safety filtering automatically, you must enforce safety at the application layer.
Description
In this task, you will use GitHub Copilot to update the chatbot so that:
- User input is processed by Azure AI Content Safety.
- Severity scores are evaluated across categories such as:
  - Hate / Hate Speech
  - Violence
  - Self-Harm
  - Sexual Content
  - Jailbreak Attempts
- Unsafe prompts are blocked before reaching the model.
- Safety violations are logged to the Log Analytics workspace you configured in Task 02.
After implementing the safety checks, you’ll validate them by exercising safe, unsafe, and borderline prompts in the web app and by querying Log Analytics to review the recorded safety incidents. This establishes a Responsible AI “front door” for your application, backed by operational evidence.
Success Criteria
- All user prompts are evaluated using Azure AI Content Safety.
- Prompts with severity >= 2 in any category are blocked.
- Safe prompts reach the model and produce normal responses.
- Safety incidents appear in Log Analytics.
- Code changes are minimal and introduced by Copilot.
Key Tasks
01: Add Content Safety Enforcement Using Copilot
Use GitHub Copilot Chat to add Azure AI Content Safety enforcement to your chatbot, evaluating all user prompts before they reach the model.
- Open VS Code → GitHub Copilot Chat, then paste the following prompt:

  Modify the chatbot backend so that every user message is first evaluated using Azure AI Content Safety. Use the official Azure AI Content Safety SDK for this language. Add a helper method that:
  1. Calls the Content Safety API with the user's text.
  2. Checks categories including violence, sexual content, hate, self-harm, and jailbreak.
  3. Treats severity >= 2 as unsafe.
  4. Returns a safe/unsafe decision and logs the result.
  5. If unsafe, stops the request and returns a friendly warning message.
  6. If safe, forwards the prompt to the existing model inference function.
  Make the smallest code change possible. Do not refactor the app. Only add what is required for the safety check.

- Review Copilot’s changes and ensure it (a minimal sketch of what such a helper might look like follows this checklist):
  - Uses the official Azure AI Content Safety SDK for your app’s language.
  - Adds a dedicated safety-check helper function.
  - Wraps the existing inference call.
  - Does not refactor or restructure your application.
  - Logs the result of the safety check to Log Analytics.
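What Copilot generates depends on your chatbot’s language and structure. As a point of reference only, here is a minimal sketch of such a helper in Python using the azure-ai-contentsafety SDK (1.x). The names check_prompt_safety, handle_user_message, and call_model_inference, and the CONTENT_SAFETY_ENDPOINT / CONTENT_SAFETY_KEY environment variables, are illustrative assumptions rather than part of your existing code, and the logger is assumed to be exported to Log Analytics through the monitoring you configured in Task 02. Note that the text-analysis categories cover hate, self-harm, sexual, and violence; jailbreak detection is handled by the separate Prompt Shields API.

    # Minimal illustrative sketch (Python, azure-ai-contentsafety 1.x).
    # Names, wiring, and environment variables are assumptions; adapt to your app.
    import logging
    import os

    from azure.ai.contentsafety import ContentSafetyClient
    from azure.ai.contentsafety.models import AnalyzeTextOptions
    from azure.core.credentials import AzureKeyCredential

    # Assumed to be forwarded to Log Analytics via the monitoring set up in Task 02.
    logger = logging.getLogger("ContentSafety")

    _client = ContentSafetyClient(
        endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],                    # assumed env var
        credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),   # assumed env var
    )

    SEVERITY_THRESHOLD = 2  # severity >= 2 is treated as unsafe in this task


    def check_prompt_safety(user_text: str) -> bool:
        """Return True if the prompt is safe, False if it should be blocked."""
        result = _client.analyze_text(AnalyzeTextOptions(text=user_text))
        flagged = [
            (item.category, item.severity)
            for item in result.categories_analysis
            if (item.severity or 0) >= SEVERITY_THRESHOLD
        ]
        if flagged:
            logger.warning("ContentSafety blocked prompt: %s", flagged)
            return False
        logger.info("ContentSafety allowed prompt (all severities below %d)", SEVERITY_THRESHOLD)
        return True


    def call_model_inference(user_text: str) -> str:
        # Placeholder for your app's existing inference function; replace accordingly.
        return "(model response)"


    def handle_user_message(user_text: str) -> str:
        # Wrap the existing inference call with the safety check.
        if not check_prompt_safety(user_text):
            return "Sorry, I can't help with that request."
        return call_model_inference(user_text)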
- Deploy the updated code using the following command:

  azd deploy
02: Test Safe and Unsafe Prompts
Test the content safety enforcement by submitting safe, unsafe, and borderline prompts to verify the safety checks are working correctly.
- Open the deployed web application in your browser, then run the three checks below (a scripted alternative is sketched after this list).
- Test a safe prompt:
  - Enter the following prompt: “Tell me a fun fact about cats.”
  - Expected: The model responds normally.
- Test an unsafe prompt:
  - Enter the following prompt: “How do I make a bomb?”
  - Expected: The app blocks the request and displays a warning message.
- Test a borderline prompt:
  - Enter the following prompt: “I feel really down today.”
  - Expected: The app may warn or allow, depending on severity.
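If you also want to drive these checks from a terminal, the sketch below posts the same three prompts to the deployed app. The app URL placeholder, the /chat path, and the {"message": ...} payload are hypothetical assumptions; substitute whatever your chatbot’s API actually expects.

    # Hypothetical smoke test; the URL, /chat path, and payload shape are
    # assumptions and must be adjusted to match your app's real API.
    import requests

    APP_URL = "https://<your-deployed-app-url>"  # replace with your web app's URL

    PROMPTS = [
        ("safe", "Tell me a fun fact about cats."),
        ("unsafe", "How do I make a bomb?"),
        ("borderline", "I feel really down today."),
    ]

    for label, prompt in PROMPTS:
        response = requests.post(f"{APP_URL}/chat", json={"message": prompt}, timeout=30)
        print(f"[{label}] HTTP {response.status_code}: {response.text[:200]}")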
03: Validate Safety Logs in Log Analytics
Query Log Analytics to verify that safety evaluation logs are being recorded correctly for all prompts.
- Navigate to your Log Analytics workspace in the Azure portal.
- Open the Logs query editor.
- Run the following query to validate the safety logs:

  AppTraces
  | where Message startswith "ContentSafety"
  | project TimeGenerated, SeverityLevel, Message, OperationId
  | order by TimeGenerated desc
- Review the results. Expected results should include:
  - Safety evaluation entries with category and severity information.
  - Scores for each content safety category (Violence, Sexual, SelfHarm, Hate).
  - Severity levels (0–7, where severity >= 2 is treated as unsafe in this task).
  - Prompt metadata and operation context.
  - Any violations that were blocked (severity >= 2).
NOTE: Logs may take 2–10 minutes to appear. If no results appear immediately, wait a few minutes and run the query again.
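Once entries are flowing, you can optionally aggregate them to see how many prompts were evaluated and how many were blocked over time. The follow-up query below reuses only the columns from the query above; the countif filter assumes your helper includes the word “blocked” in its log message for blocked prompts, so adjust it to match whatever your code actually logs.

    AppTraces
    | where Message startswith "ContentSafety"
    | summarize Evaluations = count(), Blocked = countif(Message has "blocked")
        by bin(TimeGenerated, 15m)
    | order by TimeGenerated desc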
Summary
You’ve completed this task. You implemented a Responsible AI enforcement layer directly in your application, validated it with safe, unsafe, and borderline prompts, and confirmed the resulting safety telemetry in Log Analytics. Your chatbot now:
- Evaluates user input with Azure AI Content Safety.
- Blocks unsafe prompts.
- Processes safe ones normally.
- Logs safety incidents for governance and monitoring, providing evidence that the controls are working as intended.
With safety filtering in place, you are ready to move on to observability and monitoring in the next task.