# Contributing to Presidio Recipes Gallery
Thank you for your interest in contributing! Recipes help others learn how to customize Presidio for specific domains.
## What Makes a Good Recipe?

A good recipe should:

- **Focus on a Specific Domain**: Target a well-defined scenario (e.g., "Financial chatbot logs", "Clinical notes", "Customer support tickets")
- **Be Reproducible**: Include working code that others can run
- **Follow the Research Pattern**: Build on the end-to-end approach from presidio-research, showing:
  - **Data Synthesis**: How to generate or obtain test data
  - **Configuration**: Your Presidio setup with any custom recognizers
  - **Evaluation**: Metrics (precision, recall, F₂) showing performance
- **Keep It Simple**: Focus on getting a working example first; detailed documentation can come later.
## Recipe Format

### Follow Presidio Research Examples
Your recipe should follow the end-to-end evaluation approach from presidio-research.
Key reference notebooks:

- Evaluate Presidio Analyzer - Complete evaluation workflow example
- Generate Synthetic Data - Using the Presidio Evaluator data generator (see the sketch below)
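To give a feel for the data-synthesis step, here is a minimal sketch using presidio-evaluator's template-based generator. The names below (`PresidioDataGenerator`, `generate_fake_data`, the `.fake` attribute) match older presidio-research notebooks; recent releases renamed this API (e.g., `PresidioSentenceFaker`), so check the version you have installed:

```python
from presidio_evaluator.data_generator import PresidioDataGenerator

# Templates use Faker-style placeholders; each {{...}} becomes a labeled PII span.
templates = [
    "My name is {{name}} and I live in {{city}}.",
    "Please charge the card ending {{credit_card_number}} for {{name}}.",
]

generator = PresidioDataGenerator()
fake_records = generator.generate_fake_data(templates=templates, n_samples=10)

for record in fake_records:
    # record.fake holds the generated sentence; record.spans holds the labels
    print(record.fake)
```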
### Option 1: Single Jupyter Notebook (Recommended for Simple Cases)
For straightforward examples, use one notebook:
```
your-recipe-name/
├── recipe.ipynb    # Main notebook with data synthesis → evaluation
└── README.md       # Brief overview (optional)
```
### Option 2: Multiple Files (Recommended for Complex Flows)
For complex scenarios, break into separate files:
```
your-recipe-name/
├── 1_generate_data.ipynb    # Data synthesis (use Presidio Evaluator or custom)
├── 2_configure.ipynb        # Presidio setup with custom recognizers
├── 3_evaluate.ipynb         # Run evaluation and analysis
└── README.md                # Overview and instructions
```
Or as Python scripts:
```
your-recipe-name/
├── generate_data.py    # Generate test data
├── configure.py        # Presidio setup
├── evaluate.py         # Run evaluation
└── README.md           # Overview
```
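As an illustration of the `configure.py` step, here is a minimal sketch that registers a custom pattern recognizer with the Presidio analyzer. The `TICKET_ID` entity and its regex are hypothetical placeholders for whatever your domain needs:

```python
from presidio_analyzer import AnalyzerEngine, Pattern, PatternRecognizer


def build_analyzer() -> AnalyzerEngine:
    # Hypothetical domain entity: internal ticket IDs like "TICKET-123456"
    ticket_pattern = Pattern(name="ticket_id", regex=r"TICKET-\d{6}", score=0.6)
    ticket_recognizer = PatternRecognizer(
        supported_entity="TICKET_ID", patterns=[ticket_pattern]
    )

    # Start from the default recognizers and add the custom one on top
    analyzer = AnalyzerEngine()
    analyzer.registry.add_recognizer(ticket_recognizer)
    return analyzer


if __name__ == "__main__":
    analyzer = build_analyzer()
    results = analyzer.analyze(text="See TICKET-123456 for Jane Doe", language="en")
    for r in results:
        print(r.entity_type, r.start, r.end, f"{r.score:.2f}")
```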
## Required Components

Your recipe should include:

- **Data Synthesis**: Generate synthetic data using Presidio Evaluator or your own method
- **Presidio Configuration**: Show your setup (default, custom recognizers, or custom models)
- **Evaluation**: Measure and report precision, recall, F₂ score, and latency (see the evaluation sketch below)
- **Key Findings**: A brief summary of results and when to use this approach
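For the evaluation component, a minimal sketch using presidio-evaluator's `Evaluator` and `PresidioAnalyzerWrapper` might look like the following. The class names, the file path, and the result fields (`pii_precision`, `pii_recall`) are taken from the presidio-research evaluation notebooks and may differ between versions, so treat this as a starting point rather than a fixed API:

```python
from presidio_evaluator import InputSample
from presidio_evaluator.evaluation import Evaluator
from presidio_evaluator.models import PresidioAnalyzerWrapper

# Load a labeled dataset, e.g., one produced in the data-synthesis step
# ("generated_samples.json" is a hypothetical path)
dataset = InputSample.read_dataset_json("generated_samples.json")

model = PresidioAnalyzerWrapper()  # wraps a default AnalyzerEngine
evaluator = Evaluator(model=model)

evaluation_results = evaluator.evaluate_all(dataset)
scores = evaluator.calculate_score(evaluation_results)

print(f"Precision: {scores.pii_precision:.3f}")
print(f"Recall:    {scores.pii_recall:.3f}")
```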
## Quick Contribution Steps

1. **Fork the repo** and create a new branch
2. **Create your recipe folder** under `docs/recipes/your-recipe-name/`
3. **Use the template**: Copy `template.md` or start with a notebook
4. **Focus on working code first**: Don't worry about making it perfect
5. **Submit a PR**: We'll help refine it during review
## Evaluation Metrics

Include at minimum:

- **Precision**: Percentage of detected entities that were correct
- **Recall**: Percentage of actual PII that was detected
- **F₂ Score**: Recall-weighted F-score that emphasizes catching all PII (see the sketch below)
- **Latency**: Average processing time per sample
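If you compute these yourself rather than through presidio-evaluator, the arithmetic is straightforward: F_β = (1 + β²)·P·R / (β²·P + R), with β = 2 for F₂. A minimal sketch, where the true/false positive counts and the analyzer call are hypothetical placeholders for your own pipeline:

```python
import time


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta=2 weights recall more heavily than precision."""
    if precision + recall == 0:
        return 0.0
    b2 = beta**2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts from comparing predicted spans to ground-truth spans
tp, fp, fn = 87, 9, 13
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"precision={precision:.3f} recall={recall:.3f} f2={f_beta(precision, recall):.3f}")

# Latency: average wall-clock time per sample over the test set
texts = ["My name is Jane Doe", "Call me at 212-555-0100"]
start = time.perf_counter()
for text in texts:
    pass  # replace with analyzer.analyze(text=text, language="en")
avg_ms = (time.perf_counter() - start) / len(texts) * 1000
print(f"avg latency: {avg_ms:.2f} ms/sample")
```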
## Examples to Learn From

**Presidio Research Notebooks** (recommended starting point):

- Evaluate Presidio Analyzer - Complete end-to-end evaluation example
- Generate Synthetic Data - Presidio Evaluator data generator
- Other presidio-research notebooks - Additional examples and tools

**Additional Resources**:

- Presidio Samples - Integration patterns and usage examples
## Questions?

- Open an issue with the `recipe` label
- Email presidio@microsoft.com
- Tag @omri374 in your PR for guidance
## License
By contributing, you agree to license your work under the MIT License.