# Install Presidio and download the spaCy English model
!pip install presidio_analyzer presidio_anonymizer
!python -m spacy download en_core_web_lg

Path to notebook: https://www.github.com/microsoft/presidio/blob/main/docs/samples/python/presidio_notebook.ipynb

from presidio_analyzer import AnalyzerEngine, PatternRecognizer
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig
import json
from pprint import pprint

Analyze Text for PII Entities

Use Presidio Analyzer to identify PII entities in a text. The analyzer ships with pre-defined entity recognizers and also lets you create custom recognizers.

The following code sample will:

1. Set up the Analyzer engine: load the NLP module (spaCy model by default) and other PII recognizers
2. Call the analyzer to get results for the "PHONE_NUMBER" entity type

text_to_anonymize = "His name is Mr. Jones and his phone number is 212-555-5555"

analyzer = AnalyzerEngine()
analyzer_results = analyzer.analyze(text=text_to_anonymize, entities=["PHONE_NUMBER"], language='en')

print(analyzer_results)
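
Each analyzer result describes a span of the input text: an entity type, start and end character offsets, and a confidence score. The sketch below shows one way to read such spans back out of the text. It does not call Presidio; `SpanResult` is a hypothetical stand-in that only mirrors the `entity_type`, `start`, `end` and `score` fields, and the score value is illustrative.

```python
from dataclasses import dataclass


@dataclass
class SpanResult:
    """Hypothetical stand-in for one analyzer result (not a Presidio class)."""
    entity_type: str
    start: int   # offset of the first character of the match
    end: int     # offset one past the last character of the match
    score: float


text = "His name is Mr. Jones and his phone number is 212-555-5555"

# Illustrative values: the offsets point at the phone number in `text`,
# and the score is a made-up confidence value.
results = [SpanResult("PHONE_NUMBER", 46, 58, 0.75)]

# Slice the original text with the offsets to recover what was matched.
found = [(r.entity_type, text[r.start:r.end], r.score) for r in results]

for entity_type, matched_text, score in found:
    print(f"{entity_type}: {matched_text!r} (score={score})")
```

With real results from `analyzer.analyze`, the same slicing by `start` and `end` should apply.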

Create Custom PII Entity Recognizers

Presidio Analyzer comes with a pre-defined set of entity recognizers. It also allows adding new recognizers without changing the analyzer base code, by creating custom recognizers. In the following example, we will create two new recognizers of type PatternRecognizer to identify titles and pronouns in the analyzed text. A PatternRecognizer is a PII entity recognizer which uses regular expressions or deny-lists.

The following code sample will:

1. Create custom recognizers
2. Add the new custom recognizers to the analyzer
3. Call the analyzer to get results from the new recognizers

titles_recognizer = PatternRecognizer(supported_entity="TITLE",
                                      deny_list=["Mr.","Mrs.","Miss"])

pronoun_recognizer = PatternRecognizer(supported_entity="PRONOUN",
                                       deny_list=["he", "He", "his", "His", "she", "She", "hers", "Hers"])

analyzer.registry.add_recognizer(titles_recognizer)
analyzer.registry.add_recognizer(pronoun_recognizer)

analyzer_results = analyzer.analyze(text=text_to_anonymize,
                            entities=["TITLE", "PRONOUN"],
                            language="en")
print(analyzer_results)
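
To make the deny-list idea concrete, here is a minimal sketch of how a list of terms could be turned into a single regular expression and matched against the text. This is an illustration in plain Python, not Presidio's internal implementation; `Match` and `deny_list_matches` are hypothetical names introduced only for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical result type: an entity label plus character offsets."""
    entity_type: str
    start: int
    end: int


def deny_list_matches(text, entity_type, deny_list):
    """Find every whole-word occurrence of a deny-list term in `text`."""
    # Try longer terms first so a longer term wins over a shorter prefix.
    terms = sorted(deny_list, key=len, reverse=True)
    # Lookarounds are used instead of \b so that terms ending in
    # punctuation, such as "Mr.", still match as whole words.
    pattern = r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    return [Match(entity_type, m.start(), m.end()) for m in re.finditer(pattern, text)]


text = "His name is Mr. Jones and his phone number is 212-555-5555"

titles = deny_list_matches(text, "TITLE", ["Mr.", "Mrs.", "Miss"])
pronouns = deny_list_matches(
    text, "PRONOUN", ["he", "He", "his", "His", "she", "She", "hers", "Hers"]
)

for m in titles + pronouns:
    print(m.entity_type, m.start, m.end, repr(text[m.start:m.end]))
```

Matching here is case-sensitive, which is why the deny-list spells out both "his" and "His".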

Call Presidio Analyzer to get results from all configured recognizers, both the default ones and the new custom ones.

analyzer_results = analyzer.analyze(text=text_to_anonymize, language='en')

analyzer_results

Anonymize Text with Identified PII Entities

Presidio Anonymizer iterates over the Presidio Analyzer results and anonymizes the identified text.
The anonymizer provides 5 types of anonymizers: replace, redact, mask, hash and encrypt. The default is replace.

The following code sample will:

1. Set up the anonymizer engine
2. Create an anonymizer request: the text to anonymize, the anonymizers to apply, and the results from the analyzer request
3. Anonymize the text

anonymizer = AnonymizerEngine()

anonymized_results = anonymizer.anonymize(
    text=text_to_anonymize,
    analyzer_results=analyzer_results,
    operators={
        "DEFAULT": OperatorConfig("replace", {"new_value": "<ANONYMIZED>"}),
        "PHONE_NUMBER": OperatorConfig("mask", {"type": "mask", "masking_char": "*", "chars_to_mask": 12, "from_end": True}),
        "TITLE": OperatorConfig("redact", {}),
    },
)

print(f"text: {anonymized_results.text}")
print("detailed response:")

pprint(json.loads(anonymized_results.to_json()))
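
The per-entity operator mapping above can be pictured with a small plain-Python sketch: each detected span is rewritten by the operator registered for its entity type, falling back to `DEFAULT`. This is an illustration under stated assumptions, not Presidio's implementation: `mask` and `apply_operators` are hypothetical helpers, the spans are written by hand (including an assumed `PERSON` span for "Jones"), pronoun spans are left out for brevity, and overlapping spans are not handled.

```python
def mask(value, masking_char="*", chars_to_mask=0, from_end=False):
    """Replace `chars_to_mask` characters of `value`, from the start or the end."""
    n = min(chars_to_mask, len(value))
    if n <= 0:
        return value
    if from_end:
        return value[:len(value) - n] + masking_char * n
    return masking_char * n + value[n:]


def apply_operators(text, spans, operators):
    """Rewrite each (entity_type, start, end) span with its operator."""
    # Work from right to left so that edits never shift the offsets
    # of spans that have not been processed yet.
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        operator = operators.get(entity_type, operators["DEFAULT"])
        text = text[:start] + operator(text[start:end]) + text[end:]
    return text


text = "His name is Mr. Jones and his phone number is 212-555-5555"

# Hand-written spans standing in for analyzer results (assumed, not computed).
spans = [("TITLE", 12, 15), ("PERSON", 16, 21), ("PHONE_NUMBER", 46, 58)]

operators = {
    "DEFAULT": lambda value: "<ANONYMIZED>",                       # replace
    "PHONE_NUMBER": lambda value: mask(value, "*", 12, from_end=True),  # mask
    "TITLE": lambda value: "",                                     # redact
}

result = apply_operators(text, spans, operators)
print(result)
```

Because the phone number is 12 characters long, masking 12 characters from the end hides it entirely; a smaller `chars_to_mask` would leave the leading digits visible.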
