Example 2: Regular-expression-based PII recognition
Another simple recognizer we can add is based on regular expressions. Let's assume we want to be extremely conservative and treat any sequence of digits in the text as PII.
```python
from presidio_analyzer import Pattern, PatternRecognizer

# Define the regex pattern in a Presidio `Pattern` object
# (a raw string keeps the backslash in "\d" from being treated as an escape):
numbers_pattern = Pattern(name="numbers_pattern", regex=r"\d+", score=0.5)

# Define the recognizer with one or more patterns
number_recognizer = PatternRecognizer(
    supported_entity="NUMBER", patterns=[numbers_pattern]
)
```
Testing the recognizer itself:
```python
text2 = "I live in 510 Broad st."

# Run only this recognizer, asking for the NUMBER entity it supports
numbers_result = number_recognizer.analyze(text=text2, entities=["NUMBER"])

print("Result:")
print(numbers_result)
```
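Conceptually, a pattern recognizer runs each of its regexes over the text and turns every match into a scored span. The stdlib-only sketch below illustrates that idea; `SimpleResult` and `simple_pattern_analyze` are hypothetical names, not Presidio APIs, and the sketch deliberately skips result validation and any other processing Presidio's `PatternRecognizer` may perform.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for a recognizer result (not Presidio's RecognizerResult).
@dataclass(frozen=True)
class SimpleResult:
    entity_type: str
    start: int
    end: int
    score: float


def simple_pattern_analyze(text, entity_type, patterns):
    """Return one result per regex match, scored with its pattern's base score.

    `patterns` is a list of (regex, score) tuples. No validation or
    context-based score adjustment is applied.
    """
    results = []
    for regex, score in patterns:
        for match in re.finditer(regex, text):
            results.append(
                SimpleResult(entity_type, match.start(), match.end(), score)
            )
    return results


results = simple_pattern_analyze("I live in 510 Broad st.", "NUMBER", [(r"\d+", 0.5)])
print(results)
# [SimpleResult(entity_type='NUMBER', start=10, end=13, score=0.5)]
```

The span `start=10, end=13` covers the characters `510`, which is the only digit run in the sample sentence.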
It's important to note that recognizers are likely to make errors, both false positives and false negatives, which affect the overall performance of Presidio. Consider testing each recognizer on a representative dataset before integrating it into Presidio. For more information, see the best practices for developing recognizers documentation.
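One lightweight way to run such a test is to compare a recognizer's predicted spans with hand-labeled spans and compute precision (hurt by false positives) and recall (hurt by false negatives). The sketch below is only an illustration under stated assumptions: it uses exact span matching, a tiny hypothetical labeled set, and illustrative names (`LABELED_SAMPLES`, `predict_spans`, `evaluate`); it re-applies the same `\d+` pattern with Python's `re` module rather than calling Presidio.

```python
import re

# Hypothetical hand-labeled samples: (text, set of gold (start, end) spans
# that should be flagged). A real evaluation set should be far larger and
# representative of the data the recognizer will see.
LABELED_SAMPLES = [
    ("I live in 510 Broad st.", {(10, 13)}),
    ("Call me at 5551234.", {(11, 18)}),
    ("We met in 2019 for 3 hours.", set()),  # digits labeled as not PII here
    ("My ID is AB12-34.", {(9, 16)}),  # \d+ splits this ID into two partial matches
]


def predict_spans(text):
    # Same regex as the NUMBER recognizer, applied with the stdlib `re` module.
    return {(m.start(), m.end()) for m in re.finditer(r"\d+", text)}


def evaluate(samples):
    """Exact-span precision and recall over all samples."""
    tp = fp = fn = 0
    for text, gold in samples:
        predicted = predict_spans(text)
        tp += len(predicted & gold)  # spans that match a gold span exactly
        fp += len(predicted - gold)  # flagged spans that are not gold
        fn += len(gold - predicted)  # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = evaluate(LABELED_SAMPLES)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.33 recall=0.67
```

Exact span matching is a strict choice: the partial matches on `AB12-34` count as two false positives plus one false negative. A looser, overlap-based criterion would score that case differently, so pick the criterion that reflects how detection errors matter in your use case.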