Presidio with Fabric
This folder contains guides and samples for running Presidio in Microsoft Fabric notebooks with Spark for scalable PII detection and anonymization.
Contents
- Environment Setup - How to set up your Fabric environment with the required dependencies.
- Presidio and Spark Notebook - A sample notebook that demonstrates PII detection and anonymization using Presidio with Spark.
- Sample Data - Example CSV data for testing the PII detection and anonymization workflow.
Overview
Microsoft Fabric provides a unified analytics platform, and these samples demonstrate how to leverage Presidio's PII detection and anonymization capabilities at scale using Fabric's Spark processing. The integration enables data engineers and analysts to:
- Detect PII in large datasets using distributed Spark processing
- Anonymize sensitive information in a scalable manner
- Enhance data privacy compliance for analytics workloads
What You'll Accomplish
By the end of this guide, you'll have a working Presidio implementation in Fabric that can:
- Automatically detect various types of PII in your data at scale
- Transform sensitive data while preserving its analytical value
- Scale to handle larger datasets through Spark's distributed processing
- Integrate with your existing Fabric data workflows
You can easily incorporate this solution into your Fabric pipelines for scheduled runs and integrated workflows, or extend it with custom detection rules and anonymization methods.
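As a rough sketch of how the Spark scaling can work (not necessarily how the sample notebook does it), one common pattern is to build the Presidio engines once per partition rather than once per row, because loading the NLP model is expensive. The helper names `presidio_factory` and `anonymize_partition` are hypothetical, and the example assumes `presidio-analyzer`, `presidio-anonymizer`, and a spaCy English model are installed in the Fabric environment.

```python
from typing import Callable, Iterable, Iterator, Optional


def presidio_factory() -> Callable[[str], str]:
    """Build the Presidio engines and return a text -> anonymized text function."""
    # Imported lazily so the import happens on the Spark worker, not the driver.
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()

    def anonymize(text: str) -> str:
        results = analyzer.analyze(text=text, language="en")
        return anonymizer.anonymize(text=text, analyzer_results=results).text

    return anonymize


def anonymize_partition(
    texts: Iterable[Optional[str]],
    engine_factory: Callable[[], Callable[[str], str]],
) -> Iterator[Optional[str]]:
    """Anonymize every value in a partition, building the engines only once."""
    anonymize = engine_factory()  # expensive step, done once per partition
    for text in texts:
        # Pass empty strings and nulls through unchanged.
        yield anonymize(text) if text else text
```

With a Spark DataFrame `df` that has a string column named `text` (an assumed name), this could be applied as `df.rdd.map(lambda row: row["text"]).mapPartitions(lambda it: anonymize_partition(it, presidio_factory))`.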
Results Preview
Here's what you'll achieve when running the sample notebook:
PII Detection Results
Quickly identify what types of sensitive information exist in your data, helping you understand your privacy exposure:
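A minimal sketch of this kind of summary, assuming a Presidio `AnalyzerEngine` and a hypothetical helper named `count_entity_types`, is to tally the `entity_type` of every detection across a collection of texts. The notebook's actual output may be computed differently.

```python
from collections import Counter
from typing import Iterable, Optional


def build_analyzer():
    """Create a Presidio analyzer (needs presidio-analyzer and a spaCy model)."""
    from presidio_analyzer import AnalyzerEngine  # lazy import keeps the helper below dependency-free

    return AnalyzerEngine()


def count_entity_types(
    texts: Iterable[Optional[str]], analyzer, language: str = "en"
) -> Counter:
    """Count how often each PII entity type is detected across the given texts."""
    counts: Counter = Counter()
    for text in texts:
        if not text:
            continue  # skip nulls and empty strings
        for result in analyzer.analyze(text=text, language=language):
            counts[result.entity_type] += 1
    return counts
```

For example, `count_entity_types(sample_texts, build_analyzer()).most_common()` lists the detected entity types from most to least frequent, where `sample_texts` is any list of strings you supply.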
Anonymized Data
Transform your data by masking or replacing sensitive information while keeping its structure intact:
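The choice between masking and replacing is controlled in Presidio through per-entity operators. The sketch below is illustrative only: the helper names `build_operators` and `anonymize_text`, the `<PII>` placeholder, and the choice to mask the last four characters of phone numbers are all assumptions, not settings taken from the sample notebook.

```python
from typing import Optional


def build_operators() -> dict:
    """Example operator map: replace by default, partially mask phone numbers."""
    from presidio_anonymizer.entities import OperatorConfig  # lazy import

    return {
        # Applied to any entity type without a specific entry.
        "DEFAULT": OperatorConfig("replace", {"new_value": "<PII>"}),
        # Mask the last four characters of detected phone numbers.
        "PHONE_NUMBER": OperatorConfig(
            "mask", {"masking_char": "*", "chars_to_mask": 4, "from_end": True}
        ),
    }


def anonymize_text(
    text: Optional[str],
    analyzer,
    anonymizer,
    operators: Optional[dict] = None,
    language: str = "en",
) -> Optional[str]:
    """Detect PII in `text` and return the anonymized string."""
    if not text:
        return text  # nothing to do for nulls and empty strings
    results = analyzer.analyze(text=text, language=language)
    anonymized = anonymizer.anonymize(
        text=text, analyzer_results=results, operators=operators
    )
    return anonymized.text
```

Here `analyzer` would be a `presidio_analyzer.AnalyzerEngine()` and `anonymizer` a `presidio_anonymizer.AnonymizerEngine()`. When `operators` is omitted, Presidio falls back to its default replacement behavior.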
Scale Testing
See how the solution performs with larger datasets, which helps you estimate how it will handle your production workloads:
Delta Table Output
Save your anonymized data directly to a Delta table, making it immediately available for downstream analytics while maintaining privacy:
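A minimal sketch of the write step, assuming `df` is a Spark DataFrame that already holds the anonymized data and that the notebook is attached to a lakehouse. The helper name `save_anonymized` and any table name you pass in are hypothetical, and restricting the mode to `overwrite` or `append` is a choice made for this sketch rather than a Spark limitation.

```python
def save_anonymized(df, table_name: str, mode: str = "overwrite") -> None:
    """Write an anonymized Spark DataFrame to a Delta table."""
    if mode not in {"overwrite", "append"}:
        raise ValueError(f"Unsupported write mode for this sketch: {mode!r}")
    # Standard Spark DataFrameWriter chain for a managed Delta table.
    df.write.format("delta").mode(mode).saveAsTable(table_name)
```

Calling `save_anonymized(anonymized_df, "my_anonymized_table")` would then make the result available to downstream queries under that table name.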