# download presidio
!pip install presidio_analyzer presidio_anonymizer
!python -m spacy download en_core_web_lg

Integrating external models/services with Presidio

Presidio Analyzer is composed of a set of PII recognizers that can run locally or remotely. This notebook shows an example of integrating an external service into Presidio Analyzer.

Azure Text Analytics

Azure Text Analytics is a cloud-based service that provides advanced natural language processing over raw text. One of its main functions is Named Entity Recognition (NER), which identifies entities in text and categorizes them into predefined classes or types.

Supported entity categories in the Text Analytics API

Text Analytics supports multiple PII entity categories. The Text Analytics service runs a predictive model to identify and categorize named entities from an input document. The service's latest version includes the ability to detect personal (PII) and health (PHI) information. A list of all supported entities can be found in the official documentation.

Prerequisites

To use Text Analytics with Presidio, first create an Azure Text Analytics resource under an Azure subscription. Follow the official documentation for instructions. The key and endpoint, generated once the resource is created, should replace the placeholders <your_text_analytics_key> and <your_text_analytics_endpoint> in this notebook, respectively.
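If you would rather not paste the key into the notebook, one option is to read both values from environment variables. The following is a minimal sketch, not part of the Presidio sample: the variable names TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT and the helper load_text_analytics_settings are hypothetical.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
KEY_VAR = "TEXT_ANALYTICS_KEY"
ENDPOINT_VAR = "TEXT_ANALYTICS_ENDPOINT"


def load_text_analytics_settings(env=os.environ):
    """Return (key, endpoint) from the environment, or raise if either is missing."""
    missing = [name for name in (KEY_VAR, ENDPOINT_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so the endpoint can be joined with a path later.
    return env[KEY_VAR], env[ENDPOINT_VAR].rstrip("/")
```

The returned pair can then be passed wherever the notebook uses the two placeholders.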

Text Analytics Recognizer

In this example we will use the TextAnalyticsRecognizer sample implementation. This class extends Presidio's RemoteRecognizer to call the Text Analytics REST API. For additional information on remote recognizers, see the ExampleRemoteRecognizer sample.
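To give a rough idea of what a remote recognizer of this kind does, the sketch below builds a PII request and maps a response back to character spans. It is an illustration only and not the sample's actual code: the REST path, the header name, and the response fields (category, offset, length, confidenceScore) are assumptions based on the v3.1 Text Analytics REST API, and build_request and parse_response are hypothetical helpers. No network call is made.

```python
import json

# Assumed REST path for PII recognition; check the service documentation
# for the API version your resource supports.
PII_PATH = "/text/analytics/v3.1/entities/recognition/pii"


def build_request(endpoint, key, text, language="en"):
    """Build the URL, headers, and JSON body for a single document."""
    url = endpoint.rstrip("/") + PII_PATH
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed authentication header
        "Content-Type": "application/json",
    }
    body = json.dumps(
        {"documents": [{"id": "1", "language": language, "text": text}]}
    )
    return url, headers, body


def parse_response(payload, category_to_entity):
    """Map service entities to (entity_type, start, end, score) tuples.

    Categories missing from category_to_entity are skipped.
    """
    results = []
    for document in payload.get("documents", []):
        for entity in document.get("entities", []):
            entity_type = category_to_entity.get(entity["category"])
            if entity_type is None:
                continue
            start = entity["offset"]
            end = start + entity["length"]
            results.append((entity_type, start, end, entity["confidenceScore"]))
    return results
```

A real recognizer would send the request with an HTTP client and wrap each tuple in a Presidio result object; the mapping from service category to Presidio entity type is the part the entity definitions in the next step control.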

from presidio_analyzer import AnalyzerEngine
from text_analytics.example_text_analytics_recognizer import TextAnalyticsEntityCategory, TextAnalyticsRecognizer
  1. Define which entities to get from Text Analytics
ta_entities = [
    TextAnalyticsEntityCategory(name="Person",
                                entity_type="NAME",
                                supported_languages=["en"]),
    TextAnalyticsEntityCategory(name="Age",
                                entity_type="AGE",
                                subcategory="Age",
                                supported_languages=["en"]),
    TextAnalyticsEntityCategory(name="InternationalBankingAccountNumber",
                                entity_type="IBAN",
                                supported_languages=["en"])]

For a full list of entities: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal

  2. Instantiate the remote recognizer object (in this case, TextAnalyticsRecognizer)
text_analytics_recognizer = TextAnalyticsRecognizer(
    text_analytics_key="<your_text_analytics_key>",
    text_analytics_endpoint="<your_text_analytics_endpoint>",
    text_analytics_categories=ta_entities)
  3. Add the new recognizer to the list of recognizers and run the Presidio Analyzer
analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(text_analytics_recognizer)

results = analyzer.analyze(
    text="David is 30 years old. His IBAN: IL150120690000003111111", language="en"
)
print(results)
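
Printing the raw list shows entity types, offsets, and scores, but not the matched text. The sketch below shows one way to turn results into readable rows. So that it runs without a service call, it uses a stand-in Result tuple carrying the same four attributes as Presidio's RecognizerResult (entity_type, start, end, score); the describe helper and the sample values are hypothetical and hand-written, not actual service output.

```python
from collections import namedtuple

# Stand-in for presidio_analyzer.RecognizerResult, reduced to the four
# attributes used here so the snippet runs on its own.
Result = namedtuple("Result", ["entity_type", "start", "end", "score"])


def describe(text, results, min_score=0.0):
    """Return (entity_type, matched_text, score) rows, ordered by position."""
    kept = [r for r in results if r.score >= min_score]
    kept.sort(key=lambda r: r.start)
    return [(r.entity_type, text[r.start:r.end], r.score) for r in kept]


text = "David is 30 years old. His IBAN: IL150120690000003111111"
# Hand-written illustrative results, not actual service output.
iban_start = text.index("IL15")
sample = [
    Result("IBAN", iban_start, len(text), 0.8),
    Result("NAME", 0, 5, 0.9),
]
for row in describe(text, sample, min_score=0.5):
    print(row)
```

With real output, pass the list returned by analyzer.analyze in place of sample.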