Recipes Gallery

Welcome to the Presidio Recipes Gallery! This section provides curated, end-to-end examples demonstrating how to customize Microsoft Presidio for specific data privacy and de-identification scenarios.

What are Recipes?

Recipes are comprehensive, reproducible examples tailored to common data domains and use cases. Each recipe goes beyond basic documentation to provide:

Real-world Context: Focused on specific scenarios like financial chatbot conversations, clinical notes, REST API logs, or multilingual content
Synthetic Data Generation: Methods for creating realistic test data that mimics your production environment
Performance Benchmarks: Evaluation metrics (precision, recall, F₂ score, and latency) across different Presidio configurations
Progressive Complexity: Examples ranging from out-of-the-box usage to advanced customization with transformers, LLMs, or hybrid approaches

Why Use Recipes?

While Presidio's documentation covers the fundamentals, recipes bridge the gap between generic examples and production-ready implementations. They help you:

Evaluate Performance: Understand Presidio's accuracy and speed for your specific domain before deployment
Customize Effectively: Learn which recognizers, models, and configurations work best for different data types
Compare Approaches: See side-by-side comparisons of different implementation strategies
Reduce Development Time: Start with a working example close to your use case instead of building from scratch

Recipe Structure

Each recipe typically includes:

Scenario Description: The domain and data type
Data Synthesis: Methods for generating test data using Presidio Evaluator or custom methods
Configuration: Presidio setup with any custom recognizers or models
Evaluation: Performance metrics (precision, recall, F₂ score, latency)
Implementation: Jupyter notebook or Python scripts showing the end-to-end flow (see example)
Key Findings: When to use this approach and trade-offs to consider
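One simple custom method for the Data Synthesis step is template filling. The Python below fills hypothetical chatbot-style templates with fake values and records the character span of each inserted value, so the output also serves as ground truth for evaluation. It is a standard-library stand-in, not the Presidio Evaluator generator, and the templates, value pools, and helper names (`fill_template`, `generate`) are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical value pools; a real recipe would use larger, domain-specific pools.
FAKE_VALUES: Dict[str, List[str]] = {
    "PERSON": ["Maria Lopez", "John Smith", "Aiko Tanaka"],
    "PHONE_NUMBER": ["555-010-2345", "555-019-8765"],
    "EMAIL_ADDRESS": ["maria@example.com", "j.smith@example.org"],
}

# Hypothetical chatbot-style templates with {ENTITY_TYPE} placeholders.
TEMPLATES = [
    "Hi, I'm {PERSON} and my number is {PHONE_NUMBER}.",
    "Please send the statement to {EMAIL_ADDRESS}.",
    "{PERSON} asked to update the email to {EMAIL_ADDRESS}.",
]


@dataclass
class Span:
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass
class Sample:
    text: str
    spans: List[Span]


def fill_template(template: str, rng: random.Random) -> Sample:
    """Replace each {ENTITY_TYPE} placeholder and record where the value landed."""
    text, spans, pos = "", [], 0
    while True:
        open_idx = template.find("{", pos)
        if open_idx == -1:
            text += template[pos:]
            return Sample(text, spans)
        close_idx = template.index("}", open_idx)
        entity_type = template[open_idx + 1:close_idx]
        text += template[pos:open_idx]  # literal text before the placeholder
        value = rng.choice(FAKE_VALUES[entity_type])
        spans.append(Span(entity_type, len(text), len(text) + len(value)))
        text += value
        pos = close_idx + 1


def generate(n: int, seed: int = 42) -> List[Sample]:
    """Generate n annotated samples; a fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    return [fill_template(rng.choice(TEMPLATES), rng) for _ in range(n)]


if __name__ == "__main__":
    for sample in generate(3):
        print(sample.text)
        for span in sample.spans:
            print("   ", span.entity_type, repr(sample.text[span.start:span.end]))
```

Fixing the seed lets different Presidio configurations be compared on exactly the same samples. Larger and more varied value pools (for example, from a library such as Faker) generally make the synthetic data more realistic.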

For complex flows, consider breaking into multiple notebooks or scripts for better organization.

Available Recipes

The recipes gallery is currently under construction. Check back soon for recipes covering:

Financial Domain: Chat conversations, transaction logs, customer service interactions
Healthcare Domain: Clinical notes, patient records, medical reports
Retail/E-commerce: Customer data, order information, support tickets
Enterprise: REST API logs, database exports, internal communications
Multilingual: Examples for Spanish, French, German, and other languages

Recipe Performance Table (Coming Soon)

We're developing a comprehensive benchmark table that will show Presidio's performance across different domains and implementation levels. The table will include:

Domain / Scenario	Out-of-the-box (spaCy)	Augmented (+ custom recognizers)	Custom Model (ML/Transformer)	Hybrid "Best-Effort" (ensemble/LLM)
Financial (Chatbot)	Coming Soon	Coming Soon	Coming Soon	Coming Soon
Medical (Clinical Notes)	Coming Soon	Coming Soon	Coming Soon	Coming Soon
Retail (JSON REST)	Coming Soon	Coming Soon	Coming Soon	Coming Soon
Multilingual (Spanish)	Coming Soon	Coming Soon	Coming Soon	Coming Soon

Each cell will contain:

P: Precision
R: Recall
F₂: F₂ score (recall-weighted F-score)
Latency: Average processing time per sample (milliseconds)
Notebook: Link to the interactive Jupyter notebook
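The metrics in each cell follow standard definitions: P = TP / (TP + FP), R = TP / (TP + FN), and F_β = (1 + β²)·P·R / (β²·P + R), where β = 2 weights recall more heavily than precision. The Python below is a minimal sketch of how such values might be computed. It assumes the true/false positive and false negative counts were already obtained by matching predicted spans against ground truth, and the helper names are hypothetical. `analyze` can be any callable, such as `lambda t: analyzer.analyze(text=t, language="en")` for a Presidio `AnalyzerEngine`.

```python
import time
from typing import Callable, Sequence, Tuple


def precision_recall_fbeta(tp: int, fp: int, fn: int, beta: float = 2.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F-beta); beta=2 gives the recall-weighted F2 score."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


def average_latency_ms(analyze: Callable[[str], object], texts: Sequence[str]) -> float:
    """Average wall-clock processing time per sample, in milliseconds."""
    if not texts:
        raise ValueError("texts must not be empty")
    start = time.perf_counter()
    for text in texts:
        analyze(text)
    return (time.perf_counter() - start) * 1000 / len(texts)


if __name__ == "__main__":
    p, r, f2 = precision_recall_fbeta(tp=80, fp=20, fn=10)
    print(f"P={p:.3f} R={r:.3f} F2={f2:.3f}")  # P=0.800 R=0.889 F2=0.870
```

With 80 true positives, 20 false positives, and 10 false negatives, F₂ = 20/23 ≈ 0.870, while F₁ would be ≈ 0.842. F₂ rewards the higher recall, which suits de-identification, where a missed entity is usually costlier than an extra redaction.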

How to Use a Recipe

Browse the recipes to find one matching your domain or use case
Review the notebook to understand the approach and results
Run the notebook in your environment to reproduce the results
Adapt the configuration to your specific data and requirements
Evaluate performance on your own test dataset
Deploy the configuration that best meets your accuracy and performance needs

Contributing a Recipe

We welcome community contributions! See our contribution guidelines for details.

Reference Examples:

Evaluate Presidio Analyzer: Complete end-to-end evaluation workflow
Generate Synthetic Data: Presidio Evaluator data generator

Follow the pattern: Data Synthesis → Configuration → Evaluation
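The Data Synthesis → Configuration → Evaluation pattern can be sketched with the standard library alone. In this minimal sketch, a single regular expression stands in for a configured Presidio analyzer (it does not use Presidio's API), the annotated samples are hypothetical, and a prediction counts only if its (entity type, start, end) exactly matches a ground-truth span. Partial-overlap matching is a common alternative that a recipe might choose instead.

```python
import re
from typing import Iterable, List, Set, Tuple

# (entity_type, start, end) character span; end is exclusive.
Span = Tuple[str, int, int]

# Configuration (hypothetical): one regex detector standing in for an
# analyzer with custom recognizers.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def detect(text: str) -> Set[Span]:
    return {("PHONE_NUMBER", m.start(), m.end()) for m in PHONE_RE.finditer(text)}


# Data synthesis output (hypothetical): texts paired with ground-truth spans.
SAMPLES: List[Tuple[str, Set[Span]]] = [
    ("Call me at 555-010-2345.", {("PHONE_NUMBER", 11, 23)}),
    ("My account is 123-456-7890-11.", set()),  # not a phone number
    ("Reach Ana on 555 019 8765.", {("PHONE_NUMBER", 13, 25)}),
]


def evaluate(samples: Iterable[Tuple[str, Set[Span]]]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives by exact span match."""
    tp = fp = fn = 0
    for text, gold in samples:
        predicted = detect(text)
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    return tp, fp, fn


if __name__ == "__main__":
    tp, fp, fn = evaluate(SAMPLES)
    print(f"TP={tp} FP={fp} FN={fn}")  # TP=1 FP=1 FN=1
```

The three samples show both failure modes a recipe should surface: the pattern over-matches part of an account number (a false positive) and misses a space-separated phone number (a false negative). The resulting counts feed directly into precision, recall, and F₂.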

For complex flows, split the work across multiple notebooks or scripts. Focus on getting working code first; we'll help refine the documentation during review.

Related Resources

Presidio Samples: Additional usage examples and integration patterns
Tutorial Series: Step-by-step guide to Presidio features
Best Practices for Developing Recognizers: Deep dive into creating custom PII recognizers
Presidio Research Repository: Evaluation tools and research datasets
FAQ: Common questions about improving detection accuracy

Questions or Feedback?

If you have questions about recipes or suggestions for new scenarios to cover, please:

Open an issue on GitHub
Email us at presidio@microsoft.com
Join the discussion in our community channels

Note: The recipes gallery demonstrates Presidio's flexibility and customization capabilities. The goal is to show that Presidio is designed to be adapted to your specific needs, not used as a one-size-fits-all solution. Each recipe illustrates best practices for customization in different contexts.