Installing Presidio
Description
This document describes the installation of the entire Presidio suite using pip (as Python packages) or using Docker (as containerized services).
Using pip
Supported Python Versions
Presidio is supported for the following Python versions:
- 3.7
- 3.8
- 3.9
- 3.10
- 3.11
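Before installing, it can help to confirm that the active interpreter is on this list. The following is a minimal sketch, not part of Presidio: the names `SUPPORTED_VERSIONS` and `is_supported_python` are made up for illustration, and it assumes the interpreter is available on the PATH as `python`.

```shell
# Supported minor versions, copied from the list above.
SUPPORTED_VERSIONS="3.7 3.8 3.9 3.10 3.11"

# Succeeds (exit status 0) if the given "major.minor" string is in the list.
is_supported_python() {
    case " $SUPPORTED_VERSIONS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the active interpreter for its major.minor version; fall back to
# "unknown" if no interpreter named "python" is found.
current="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo "unknown")"

if is_supported_python "$current"; then
    echo "Python $current is supported"
else
    echo "Python $current is not in the supported list"
fi
```

If the check fails, creating a virtual environment with one of the listed versions before running pip is one way to proceed.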
PII anonymization on text
For PII anonymization on text, install the presidio-analyzer and presidio-anonymizer packages with at least one NLP engine (spaCy, transformers or stanza):
Using spaCy:
pip install presidio_analyzer
pip install presidio_anonymizer
python -m spacy download en_core_web_lg
Using transformers:
pip install "presidio_analyzer[transformers]"
pip install presidio_anonymizer
python -m spacy download en_core_web_sm
Note
When using a transformers NLP engine, Presidio still uses spaCy for other capabilities, so a small spaCy model (such as en_core_web_sm) is required. Transformers models are loaded lazily. To pre-load them, see: Downloading a pre-trained model.
Using stanza:
pip install "presidio_analyzer[stanza]"
pip install presidio_anonymizer
Note
Stanza models are loaded lazily. To pre-load them, see: Downloading a pre-trained model.
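After installing the packages, a short smoke test can confirm that they work together. This is a sketch under stated assumptions rather than an official check: the helper name `have_module` and the sample sentence are illustrative, and the embedded Python relies on `AnalyzerEngine()` being created with its default spaCy-based configuration, so it fits the spaCy installation path best.

```shell
# Succeeds if the given Python module can be imported by the active interpreter.
have_module() {
    python -c "import $1" >/dev/null 2>&1
}

if have_module presidio_analyzer && have_module presidio_anonymizer; then
    # Detect PII in a sample sentence, then replace it with placeholders.
    python - <<'EOF'
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "My phone number is 212-555-5555"

# With no arguments, AnalyzerEngine uses its default NLP engine configuration.
analyzer = AnalyzerEngine()
results = analyzer.analyze(text=text, language="en")

# Replace the detected spans using the anonymizer's default operator.
anonymized = AnonymizerEngine().anonymize(text=text, analyzer_results=results)
print(anonymized.text)
EOF
else
    echo "presidio_analyzer or presidio_anonymizer is not importable; check the pip install step"
fi
```

The exact output depends on the installed model and recognizers, so none is shown here.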
PII redaction in images
For PII redaction in images:
- Install the presidio-image-redactor package:
pip install presidio_image_redactor
# Presidio image redactor uses the presidio-analyzer,
# which requires a spaCy language model:
python -m spacy download en_core_web_lg
- Install an OCR engine. The default version uses the Tesseract OCR Engine. See the Tesseract documentation for installation instructions.
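Since the image redactor depends on an OCR engine being installed separately, it is worth checking that the engine is reachable before use. The sketch below assumes the default Tesseract engine, whose command-line binary is named `tesseract`; the helper name `have_cmd` is illustrative.

```shell
# Succeeds if the named command is available on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd tesseract; then
    # Print the first line of the version banner.
    tesseract --version 2>&1 | head -n 1
else
    echo "tesseract not found on PATH; install an OCR engine first"
fi
```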
Using Docker
Presidio can expose REST endpoints for each service using Flask and Docker. To download and run the Presidio Docker images, use the commands in the sections below.
Note
This requires Docker to be installed. Download Docker.
For PII anonymization in text
For PII detection and anonymization in text, the presidio-analyzer and presidio-anonymizer modules are required.
# Download Docker images
docker pull mcr.microsoft.com/presidio-analyzer
docker pull mcr.microsoft.com/presidio-anonymizer
# Run containers with default ports
docker run -d -p 5002:3000 mcr.microsoft.com/presidio-analyzer:latest
docker run -d -p 5001:3000 mcr.microsoft.com/presidio-anonymizer:latest
For PII redaction in images
For PII detection in images, the presidio-image-redactor module is required.
# Download Docker image
docker pull mcr.microsoft.com/presidio-image-redactor
# Run container with the default port
docker run -d -p 5003:3000 mcr.microsoft.com/presidio-image-redactor:latest
Once the services are running, their APIs are available. API reference and example calls can be found here.
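As an illustration of calling a running service, the sketch below posts a sample sentence to the analyzer container started on host port 5002 above. The `/analyze` path and the `text` and `language` request fields reflect my understanding of the Presidio REST API and should be checked against the API reference. The sample sentence and the variable names are illustrative.

```shell
# Host port taken from the "docker run -p 5002:3000" command above.
ANALYZER_URL="http://localhost:5002"

# Request body: the text to scan and its language.
payload='{"text": "John Smith drivers license is AC432223", "language": "en"}'

# POST the text to the analyzer. --fail makes curl return a non-zero status
# on HTTP errors, and --max-time bounds the wait if nothing is listening.
curl --silent --fail --max-time 5 -X POST "$ANALYZER_URL/analyze" \
    -H "Content-Type: application/json" \
    --data "$payload" \
    || echo "Analyzer not reachable at $ANALYZER_URL"
```

If the container is running, the response should be a JSON list of detected entities. The fallback message appears when the service is not up or the port mapping differs.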
Install from source
To install Presidio from source, first clone the repo:
- Using HTTPS
git clone https://github.com/microsoft/presidio.git
- Using SSH
git clone git@github.com:microsoft/presidio.git
Then, build the containers locally.
Note
Presidio uses docker-compose to manage the different Presidio containers.
From the root folder of the repo:
docker-compose build
To run all Presidio services:
docker-compose up -d
Alternatively, you can build and run individual services.
For example, for the presidio-anonymizer service:
docker build ./presidio-anonymizer -t presidio/presidio-anonymizer
And run:
docker run -d -p 5001:5001 presidio/presidio-anonymizer
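A container started with `-d` may need a few seconds before it accepts requests. The following sketch polls the service until it answers. It assumes the service exposes a `/health` endpoint on the mapped host port, so verify this against the API reference. The function name `wait_for_health` is hypothetical.

```shell
# Poll a URL once per second until it responds successfully or the
# number of attempts (default 10) is used up.
wait_for_health() {
    url="$1"
    tries="${2:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl --silent --fail --max-time 2 "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Host port 5001 comes from the docker run command above.
if wait_for_health "http://localhost:5001/health" 3; then
    echo "anonymizer is up"
else
    echo "anonymizer did not respond"
fi
```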
For more information on developing locally, refer to the setting up a development environment section.