Skip to content

Example 9: Ad-hoc recognizers

In addition to recognizers in code or in a YAML file, it is possible to create ad-hoc recognizers via the Presidio Analyzer API for regex and deny-list based logic. These recognizers, in JSON form, are added to the /analyze request and are only used in the context of this request.

Note

These ad-hoc recognizers could be useful if Presidio is already deployed, but requires additional detection logic to be added.

  • The json structure for a regex ad-hoc recognizer is the following:

    {
        "text": "John Smith drivers license is AC432223. Zip code: 10023",
        "language": "en",
        "ad_hoc_recognizers":[
            {
            "name": "Zip code Recognizer",
            "supported_language": "en",
            "patterns": [
                {
                "name": "zip code (weak)", 
                "regex": "(\\b\\d{5}(?:\\-\\d{4})?\\b)", 
                "score": 0.01
                }
            ],
            "context": ["zip", "code"],
            "supported_entity":"ZIP"
            }
        ]
    }
    
  • The json structure for deny-list based recognizers is the following:

    {
        "text": "Mr. John Smith's drivers license is AC432223",
        "language": "en",
        "ad_hoc_recognizers":[
            {
            "name": "Mr. Recognizer",
            "supported_language": "en",
            "deny_list": ["Mr", "Mr.", "Mister"],
            "supported_entity":"MR_TITLE"
            },
            {
            "name": "Ms. Recognizer",
            "supported_language": "en",
            "deny_list": ["Ms", "Ms.", "Miss", "Mrs", "Mrs."],
            "supported_entity":"MS_TITLE"
            }
        ]
    }
    

In both examples, the /analyze request is extended with a list of ad_hoc_recognizers, which could be either patterns, deny_list or both.

Example call to the /analyze service:

{
  "text": "John Smith drivers license is AC432223 and the zip code is 12345",
  "language": "en",
  "return_decision_process": false,
  "correlation_id": "123e4567-e89b-12d3-a456-426614174000",
  "score_threshold": 0.6,
  "entities": [
    "US_DRIVER_LICENSE",
    "ZIP"
  ],
  "trace": false,
  "ad_hoc_recognizers": [
    {
      "name": "Zip code Recognizer",
      "supported_language": "en",
      "patterns": [
        {
          "name": "zip code (weak)",
          "regex": "(\\b\\d{5}(?:\\-\\d{4})?\\b)",
          "score": 0.01
        }
      ],
      "context": [
        "zip",
        "code"
      ],
      "supported_entity": "ZIP"
    }
  ]
}

For more examples of deny-list recognizers, see this sample.