Example 2: Regular-expression-based PII recognition
Another simple recognizer we can add is based on regular expressions. Let's assume we want to be extremely conservative and treat any number that appears in the text as PII.
from presidio_analyzer import Pattern, PatternRecognizer
# Define the regex pattern in a Presidio `Pattern` object:
numbers_pattern = Pattern(name="numbers_pattern", regex=r"\d+", score=0.5)
# Define the recognizer with one or more patterns
number_recognizer = PatternRecognizer(
    supported_entity="NUMBER", patterns=[numbers_pattern]
)
Testing the recognizer itself:
text2 = "I live in 510 Broad st."
numbers_result = number_recognizer.analyze(text=text2, entities=["NUMBER"])
print("Result:")
print(numbers_result)
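Note that `\d+` matches only the run of digits, not the whole token around it. The sketch below uses Python's standard `re` module rather than Presidio to show the difference on a made-up sentence. The token-level pattern `\b\w*\d\w*\b` is a hypothetical alternative introduced here for illustration, not something taken from the Presidio documentation.

```python
import re

# Made-up sample text; "4B" is a token that mixes a digit and a letter.
text = "I live in 510 Broad st., apt 4B."

# The pattern used in the recognizer above: matches runs of digits only.
digits_only = re.compile(r"\d+")

# Hypothetical alternative (an assumption for illustration): matches the
# whole word-like token that contains at least one digit.
token_with_digit = re.compile(r"\b\w*\d\w*\b")

digit_matches = [(m.start(), m.end(), m.group()) for m in digits_only.finditer(text)]
token_matches = [(m.start(), m.end(), m.group()) for m in token_with_digit.finditer(text)]

print("Digits only:     ", digit_matches)
print("Token with digit:", token_matches)
```

If the goal really is to flag every token containing a digit, a token-level pattern of this kind could be passed as the `regex` argument of the `Pattern` object instead; whether that is preferable depends on how the detected spans are later anonymized.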
Keep in mind that recognizers make mistakes, both false positives and false negatives, and these errors affect Presidio's overall accuracy. Consider testing each recognizer on a representative dataset before integrating it into Presidio. For more info, see the best practices for developing recognizers documentation.
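As a rough illustration of such a test, the sketch below computes span-level precision and recall with the standard library only. Everything in it is an assumption made for illustration: `predict_numbers` is a hypothetical stand-in for a recognizer, the three labeled samples are invented toy data, and exact span matching is just one of several possible scoring rules.

```python
import re
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def predict_numbers(text: str) -> Set[Span]:
    """Hypothetical stand-in for a recognizer: spans of digit runs."""
    return {(m.start(), m.end()) for m in re.finditer(r"\d+", text)}


def precision_recall(
    samples: List[Tuple[str, Set[Span]]],
    predict: Callable[[str], Set[Span]],
) -> Tuple[float, float]:
    """Score predictions against gold spans using exact span matches."""
    tp = fp = fn = 0
    for text, gold in samples:
        predicted = predict(text)
        tp += len(predicted & gold)   # spans found and labeled
        fp += len(predicted - gold)   # spans found but not labeled
        fn += len(gold - predicted)   # labeled spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data: each text is paired with the spans a human marked as PII.
samples = [
    ("I live in 510 Broad st.", {(10, 13)}),
    # The "5" in "5pm" is not labeled as PII, so flagging it is a false positive.
    ("Call me at extension 42 before 5pm.", {(21, 23)}),
    # A spelled-out number has no digits, so the regex misses it (false negative).
    ("My PIN is one two three four.", {(10, 28)}),
]

precision, recall = precision_recall(samples, predict_numbers)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A real evaluation would use a much larger labeled dataset that resembles the production text, and might score partial overlaps rather than exact spans.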