Skip to content

Installing Presidio

Description

This document describes the installation of the entire Presidio suite using pip (as Python packages) or using Docker (As containerized services).

Using pip

Note

Consider installing the Presidio python packages in a virtual environment like venv or conda.

Supported Python Versions

Presidio is supported for the following python versions:

  • 3.7
  • 3.8
  • 3.9
  • 3.10
  • 3.11

PII anonymization on text

For PII anonymization on text, install the presidio-analyzer and presidio-anonymizer packages with at least one NLP engine (spaCy, transformers or stanza):

pip install presidio_analyzer
pip install presidio_anonymizer
python -m spacy download en_core_web_lg
pip install "presidio_analyzer[transformers]"
pip install presidio_anonymizer
python -m spacy download en_core_web_sm

Note

When using a transformers NLP engine, Presidio would still use spaCy for other capabilities, therefore a small spaCy model (such as en_core_web_sm) is required. Transformers models would be loaded lazily. To pre-load them, see: Downloading a pre-trained model

pip install "presidio_analyzer[stanza]"
pip install presidio_anonymizer

Note

Stanza models would be loaded lazily. To pre-load them, see: Downloading a pre-trained model.

PII redaction in images

For PII redaction in images

  1. Install the presidio-image-redactor package:

    pip install presidio_image_redactor
    
    # Presidio image redactor uses the presidio-analyzer
    # which requires a spaCy language model:
    python -m spacy download en_core_web_lg
    
  2. Install an OCR engine. The default version uses the Tesseract OCR Engine. More information on installation can be found here.

Using Docker

Presidio can expose REST endpoints for each service using Flask and Docker. To download the Presidio Docker containers, run the following command:

Note

This requires Docker to be installed. Download Docker.

For PII anonymization in text

For PII detection and anonymization in text, the presidio-analyzer and presidio-anonymizer modules are required.

# Download Docker images
docker pull mcr.microsoft.com/presidio-analyzer
docker pull mcr.microsoft.com/presidio-anonymizer

# Run containers with default ports
docker run -d -p 5001:3000 mcr.microsoft.com/presidio-analyzer:latest

docker run -d -p 5002:3000 mcr.microsoft.com/presidio-anonymizer:latest

For PII redaction in images

For PII detection in images, the presidio-image-redactor is required.

# Download Docker image
docker pull mcr.microsoft.com/presidio-image-redactor

# Run container with the default port
docker run -d -p 5003:3000 mcr.microsoft.com/presidio-image-redactor:latest

Once the services are running, their APIs are available. API reference and example calls can be found here.

Install from source

To install Presidio from source, first clone the repo:

  • using HTTPS
git clone https://github.com/microsoft/presidio.git
  • Using SSH
git clone git@github.com:microsoft/presidio.git

Then, build the containers locally.

Note

Presidio uses docker-compose to manage the different Presidio containers.

From the root folder of the repo:

docker-compose --build

To run all Presidio services:

docker-compose up -d

Alternatively, you can build and run individual services. For example, for the presidio-anonymizer service:

docker build ./presidio-anonymizer -t presidio/presidio-anonymizer

And run:

docker run -d -p 5002:5001 presidio/presidio-anonymizer

For more information on developing locally, refer to the setting up a development environment section.