Presidio Analyzer

The Presidio analyzer is a Python-based service for detecting PII entities in text.

During analysis, it runs a set of PII recognizers, each responsible for detecting one or more PII entities using its own mechanism.

Presidio analyzer comes with a set of predefined recognizers, but can easily be extended with other types of custom recognizers. Predefined and custom recognizers leverage regex, Named Entity Recognition and other types of logic to detect PII in unstructured text.
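
To make the recognizer model concrete, here is a minimal, standard-library-only sketch of several regex recognizers being run over one text. It is not Presidio's implementation: `Finding`, `RegexRecognizer`, `run_recognizers`, the patterns, and the scores are all hypothetical names and values chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    """One detected span (hypothetical type, not a Presidio class)."""
    entity_type: str
    start: int
    end: int
    score: float


class RegexRecognizer:
    """Toy recognizer: one regex that detects one entity type."""

    def __init__(self, entity_type: str, pattern: str, score: float):
        self.entity_type = entity_type
        self.regex = re.compile(pattern)
        self.score = score

    def analyze(self, text: str) -> List[Finding]:
        # Every regex match becomes a finding with a fixed confidence score.
        return [
            Finding(self.entity_type, m.start(), m.end(), self.score)
            for m in self.regex.finditer(text)
        ]


def run_recognizers(text: str, recognizers: List[RegexRecognizer]) -> List[Finding]:
    """Run every recognizer over the text and return findings in text order."""
    findings: List[Finding] = []
    for recognizer in recognizers:
        findings.extend(recognizer.analyze(text))
    return sorted(findings, key=lambda f: f.start)


# Illustrative patterns only; real-world patterns need to be more robust.
recognizers = [
    RegexRecognizer("PHONE_NUMBER", r"\b\d{3}-\d{3}-\d{4}\b", 0.4),
    RegexRecognizer("EMAIL_ADDRESS", r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", 0.5),
]
text = "Call 212-555-5555 or write to jane@example.com"
findings = run_recognizers(text, recognizers)
for f in findings:
    print(f.entity_type, text[f.start:f.end], f.score)
```

Each recognizer here knows about a single entity type and reports character offsets plus a score, which is the general shape of result the analyzer works with. Recognizers based on Named Entity Recognition or other logic would plug into the same loop.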

Analyzer Design

Installation

See Installing Presidio.

Getting started

Once the presidio-analyzer package is installed, run this simple analysis script:

from presidio_analyzer import AnalyzerEngine

# Set up the engine; this loads the NLP module (spaCy model by default) and the other PII recognizers
analyzer = AnalyzerEngine()

# Call the analyzer to get results
results = analyzer.analyze(text="My phone number is 212-555-5555",
                           entities=["PHONE_NUMBER"],
                           language="en")
print(results)

You can also run the Presidio analyzer as an HTTP server, using either a Docker container or the Python runtime.

Using a Docker container

cd presidio-analyzer
docker run -p 5002:3000 presidio-analyzer

Using the Python runtime

Note

This requires the Presidio GitHub repository to be cloned.

cd presidio-analyzer
python app.py

# From a second terminal, while the server is running:
curl -d '{"text":"John Smith drivers license is AC432223", "language":"en"}' -H "Content-Type: application/json" -X POST http://localhost:3000/analyze
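
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it mirrors the curl call above (same URL, JSON body, and header). The helper names `build_analyze_request` and `analyze` are hypothetical, and the response is simply assumed to be JSON. With the Docker command shown earlier, host port 5002 is mapped to container port 3000, so the URL would need port 5002 instead.

```python
import json
import urllib.request

# Same endpoint as the curl example; change the port if it is mapped differently.
ANALYZER_URL = "http://localhost:3000/analyze"


def build_analyze_request(text: str, language: str = "en") -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        ANALYZER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, language: str = "en"):
    """Send the request to a running analyzer server and decode the JSON reply."""
    request = build_analyze_request(text, language)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server, so it is safe to inspect offline.
request = build_analyze_request("John Smith drivers license is AC432223")
print(request.get_method(), request.full_url)
```

Once the server is running, calling `analyze("John Smith drivers license is AC432223")` sends the request and returns the decoded response.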

Creating PII recognizers

Presidio analyzer can be easily extended to support additional PII entities. See this tutorial on adding new PII recognizers for more information.

Multi-language support

Presidio can be used to detect PII entities in multiple languages. Refer to the multi-language support documentation for more information.

Outputting the analyzer decision process

Presidio analyzer has a built-in mechanism for tracing each decision it makes. This can be useful when trying to understand why a specific PII entity was detected. For more information, see the decision process documentation.

Supported entities

For a list of the currently supported entities, see Supported entities.

API reference

See the API Spec for the Analyzer REST API reference, and the Analyzer Python API for the Python API reference.

Samples

Samples illustrating the usage of the Presidio Analyzer can be found in the Python samples.