Presidio Analyzer
The Presidio analyzer is a Python-based service for detecting PII entities in text.
During analysis, it runs a set of PII recognizers, each responsible for detecting one or more PII entities using a different mechanism.
Presidio analyzer comes with a set of predefined recognizers and can be extended with custom recognizers. Both predefined and custom recognizers use regular expressions, Named Entity Recognition (NER), and other types of logic to detect PII in unstructured text.
Installation
See Installing Presidio.
Getting started
Once the presidio-analyzer package is installed, run this simple analysis script:
from presidio_analyzer import AnalyzerEngine
# Set up the engine; this loads the NLP module (spaCy model by default) and the other PII recognizers
analyzer = AnalyzerEngine()
# Call analyzer to get results
results = analyzer.analyze(text="My phone number is 212-555-5555",
                           entities=["PHONE_NUMBER"],
                           language="en")
print(results)
You can run Presidio analyzer as an HTTP server using either the Python runtime or a Docker container.
Using Docker container
cd presidio-analyzer
docker run -p 5002:3000 presidio-analyzer
Using Python runtime
Note: This requires the Presidio GitHub repository to be cloned.
cd presidio-analyzer
python app.py
curl -d '{"text":"John Smith drivers license is AC432223", "language":"en"}' -H "Content-Type: application/json" -X POST http://localhost:3000/analyze
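The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the analyzer was started with python app.py and is listening on localhost:3000, as in the curl command above. The helper names build_analyze_request and analyze_over_http are hypothetical and not part of Presidio.

```python
import json
import urllib.request

# Assumption: the analyzer server is reachable at this address,
# matching the curl command above.
ANALYZER_URL = "http://localhost:3000/analyze"


def build_analyze_request(text, language="en", url=ANALYZER_URL):
    """Build (but do not send) a JSON POST request for the /analyze endpoint."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_over_http(text, language="en", url=ANALYZER_URL):
    """Send the request and return the decoded JSON response body."""
    request = build_analyze_request(text, language, url)
    # urlopen raises URLError if no server is listening at the given address.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, analyze_over_http("John Smith drivers license is AC432223") returns the parsed JSON response. If the analyzer runs in the Docker container started above, the host port is 5002, so pass url="http://localhost:5002/analyze" instead.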
Creating PII recognizers
Presidio analyzer can be easily extended to support additional PII entities. See this tutorial on adding new PII recognizers for more information.
Multi-language support
Presidio can be used to detect PII entities in multiple languages. Refer to the multi-language support documentation for more information.
Outputting the analyzer decision process
Presidio analyzer has a built-in mechanism for tracing each decision it makes. This can be useful when trying to understand a specific PII detection. For more information, see the decision process documentation.
Supported entities
For a list of the currently supported entities, see Supported entities.
API reference
See the API Spec for the Analyzer REST API reference and the Analyzer Python API for the Python API reference.
Samples
Samples illustrating the usage of the Presidio Analyzer can be found in the Python samples.