Getting started with image de-identification with Presidio
Presidio provides a simple way to de-identify image data by detecting and anonymizing personally identifiable information (PII). This guide shows you how to get started with image de-identification using Presidio's Python packages.
Presidio has two main modules for image de-identification: General purpose, and specifically for DICOM (medical) images.
Simple flow - Python package
-
Install presidio-image-redactor
pip install presidio-image-redactor
-
Redact PII from image
from presidio_image_redactor import ImageRedactorEngine from PIL import Image image = Image.open(path_to_image_file) redactor = ImageRedactorEngine() redactor.redact(image=image)
-
Install presidio-image-redactor
pip install presidio-image-redactor
-
Redact text PII from DICOM image
import pydicom from presidio_image_redactor import DicomImageRedactorEngine # Set input and output paths input_path = "path/to/your/dicom/file.dcm" output_dir = "./output" # Initialize the engine engine = DicomImageRedactorEngine() # Option 1: Redact from a loaded DICOM image dicom_image = pydicom.dcmread(input_path) redacted_dicom_image = engine.redact(dicom_image, fill="contrast") # Option 2: Redact from DICOM file engine.redact_from_file(input_path, output_dir, padding_width=25, fill="contrast") # Option 3: Redact from directory engine.redact_from_directory("path/to/your/dicom", output_dir, padding_width=25, fill="contrast")
Simple flow - Docker container
Presidio provides a Docker containers that you can use to de-identify image data.
- Download Docker image
docker pull mcr.microsoft.com/presidio-image-redactor
- Run container
docker run -d -p 5003:3000 mcr.microsoft.com/presidio-image-redactor
- Use the API
curl -XPOST "http://localhost:5003/redact" -H "content-type: multipart/form-data" -F "image=@img.png" -F "data=\"{'color_fill':'255'}\"" > out.png