Example 10: Simple anonymization
Once we have the identified PII entities, we can perform different de-identification operations on them. For more information on the supported operators, see the anonymizer documentation.
The anonymizer requires a configuration specifying the requested operation on each entity type. There's also a default operator which replaces a PII entity with the entity type name.
Each operator has a unique configuration with the parameters needed to perform the operation (redact, hash, mask, replace, encrypt etc.)
Here's an simple example of using presidio-anonymizer:
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult
# Analyzer output
analyzer_results = [
RecognizerResult(entity_type="PERSON", start=11, end=15, score=0.8),
RecognizerResult(entity_type="PERSON", start=17, end=27, score=0.8),
]
# Initialize the engine:
engine = AnonymizerEngine()
# Invoke the anonymize function with the text,
# analyzer results (potentially coming from presidio-analyzer) and
# Operators to get the anonymization output:
result = engine.anonymize(
text="My name is Bond, James Bond", analyzer_results=analyzer_results
)
print("De-identified text")
print(result.text)
To introduce additional operators, we can pass an OperatorConfig
.
In this example we:
- Mask the last 12 chars of a
PHONE_NUMBER
entity and replace them with*
- Redact a
TITLE
entity - Replace all other entities with the string
<ANONYMIZED>
.
Defining the operators:
# Define anonymization operators
operators = {
"DEFAULT": OperatorConfig("replace", {"new_value": "<ANONYMIZED>"}),
"PHONE_NUMBER": OperatorConfig(
"mask",
{
"type": "mask",
"masking_char": "*",
"chars_to_mask": 12,
"from_end": True,
},
),
"TITLE": OperatorConfig("redact", {}),
}
Full example:
from pprint import pprint
import json
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig, RecognizerResult
# Analyzer output
analyzer_results = [
RecognizerResult(entity_type="PERSON", start=11, end=15, score=0.8),
RecognizerResult(entity_type="PERSON", start=17, end=27, score=0.8),
]
text_to_anonymize = "My name is Bond, James Bond"
anonymizer = AnonymizerEngine()
# Define anonymization operators
operators = {
"DEFAULT": OperatorConfig("replace", {"new_value": "<ANONYMIZED>"}),
"PHONE_NUMBER": OperatorConfig(
"mask",
{
"type": "mask",
"masking_char": "*",
"chars_to_mask": 12,
"from_end": True,
},
),
"TITLE": OperatorConfig("redact", {}),
}
anonymized_results = anonymizer.anonymize(
text=text_to_anonymize, analyzer_results=analyzer_results, operators=operators
)
print(f"text: {anonymized_results.text}")
print("detailed result:")
pprint(json.loads(anonymized_results.to_json()))