Presidio Analyzer API Reference
Objects at the top of the presidio-analyzer package
presidio_analyzer.AnalyzerEngine
Entry point for Presidio Analyzer.
Orchestrates the detection of PII entities and all related logic.
PARAMETER | DESCRIPTION |
---|---|
registry | Instance of type RecognizerRegistry. |
nlp_engine | Instance of type NlpEngine (for example, SpacyNlpEngine). |
app_tracer | Instance of type AppTracer, used to trace the logic applied during each request for interpretability. |
log_decision_process | Whether the decision process within the analyzer should be logged. |
default_score_threshold | Minimum confidence value for detected entities to be returned. |
supported_languages | List of languages this engine can run on. Used for loading the right NLP models and recognizers for these languages. |
context_aware_enhancer | Instance of type ContextAwareEnhancer for enhancing confidence scores based on context words (a LemmaContextAwareEnhancer is created by default if None is passed). |
METHOD | DESCRIPTION |
---|---|
get_recognizers | Return a list of PII recognizers currently loaded. |
get_supported_entities | Return a list of the entities that can be detected. |
analyze | Find PII entities in text using different PII recognizers for a given language. |
Source code in presidio_analyzer/analyzer_engine.py
get_recognizers
get_recognizers(language: Optional[str] = None) -> List[EntityRecognizer]
Return a list of PII recognizers currently loaded.
PARAMETER | DESCRIPTION |
---|---|
language | Return the recognizers supporting a given language. |
RETURNS | DESCRIPTION |
---|---|
List[EntityRecognizer] | List of [Recognizer] as a RecognizersAllResponse |
Source code in presidio_analyzer/analyzer_engine.py
get_supported_entities
get_supported_entities(language: Optional[str] = None) -> List[str]
Return a list of the entities that can be detected.
PARAMETER | DESCRIPTION |
---|---|
language | Return only entities supported in a specific language. |
RETURNS | DESCRIPTION |
---|---|
List[str] | List of entity names |
Source code in presidio_analyzer/analyzer_engine.py
analyze
analyze(
text: str,
language: str,
entities: Optional[List[str]] = None,
correlation_id: Optional[str] = None,
score_threshold: Optional[float] = None,
return_decision_process: Optional[bool] = False,
ad_hoc_recognizers: Optional[List[EntityRecognizer]] = None,
context: Optional[List[str]] = None,
allow_list: Optional[List[str]] = None,
allow_list_match: Optional[str] = "exact",
regex_flags: Optional[int] = re.DOTALL | re.MULTILINE | re.IGNORECASE,
nlp_artifacts: Optional[NlpArtifacts] = None,
) -> List[RecognizerResult]
Find PII entities in text using different PII recognizers for a given language.
Example:
```python
from presidio_analyzer import AnalyzerEngine

# Set up the engine; this loads the NLP module (spaCy model by default)
# and other PII recognizers
analyzer = AnalyzerEngine()

# Call the analyzer to get results
results = analyzer.analyze(text='My phone number is 212-555-5555', entities=['PHONE_NUMBER'], language='en')
print(results)
```
PARAMETER | DESCRIPTION |
---|---|
text | The text to analyze. |
language | The language of the text. |
entities | List of PII entities to look for in the text. If entities=None, all entities are looked for. |
correlation_id | Cross-call ID for this request. |
score_threshold | Minimum score an identified entity must have to be returned. |
return_decision_process | Whether the analysis decision process steps are returned in the response. |
ad_hoc_recognizers | List of recognizers which will be used only for this specific request. |
context | List of context words to enhance the confidence score if matched with the recognized entity's recognizer context. |
allow_list | List of words that the user defines as being allowed to keep in the text. |
allow_list_match | How the allow_list should be interpreted: either as "exact" or as "regex". |
regex_flags | Regex flags to be used when allow_list_match is "regex". |
nlp_artifacts | Precomputed NlpArtifacts. |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | A list of the entities found in the text. |
Source code in presidio_analyzer/analyzer_engine.py
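How score_threshold and allow_list act on a set of detections can be illustrated with a small standalone sketch. It does not call Presidio: `Finding` and `filter_findings` are hypothetical names, and treating "regex" mode as one alternation built from the allow-list entries is an assumption, not a statement of the library's implementation.

```python
import re
from typing import List, NamedTuple, Optional


class Finding(NamedTuple):
    """Hypothetical stand-in for a detection result (not Presidio's class)."""
    entity_type: str
    start: int
    end: int
    score: float


def filter_findings(
    text: str,
    findings: List[Finding],
    score_threshold: float = 0.0,
    allow_list: Optional[List[str]] = None,
    allow_list_match: str = "exact",
    regex_flags: int = re.DOTALL | re.MULTILINE | re.IGNORECASE,
) -> List[Finding]:
    """Drop findings below the threshold or whose text is on the allow list."""
    pattern = None
    if allow_list and allow_list_match == "regex":
        # Assumption: the entries are combined into a single alternation.
        pattern = re.compile("|".join(allow_list), regex_flags)
    kept = []
    for finding in findings:
        if finding.score < score_threshold:
            continue
        word = text[finding.start:finding.end]
        if allow_list and allow_list_match == "exact" and word in allow_list:
            continue
        if pattern is not None and pattern.search(word):
            continue
        kept.append(finding)
    return kept


text = "Call 212-555-5555 or visit bing.com"
findings = [
    Finding("PHONE_NUMBER", 5, 17, 0.75),
    Finding("URL", 27, 35, 0.5),
]
print(filter_findings(text, findings, score_threshold=0.6))
print(filter_findings(text, findings, allow_list=["bing.com"]))
```

In both calls only the phone number survives: the first drops the URL because its score is below the threshold, the second because its text is on the allow list.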
presidio_analyzer.analyzer_engine_provider.AnalyzerEngineProvider
Utility class for loading Presidio Analyzer.
Use this class to load the Presidio Analyzer engine from a YAML file.
PARAMETER | DESCRIPTION |
---|---|
analyzer_engine_conf_file | The path to the analyzer configuration file. |
nlp_engine_conf_file | The path to the NLP engine configuration file. |
recognizer_registry_conf_file | The path to the recognizer registry configuration file. |
METHOD | DESCRIPTION |
---|---|
get_configuration | Retrieve the analyzer engine configuration from the provided file. |
create_engine | Load Presidio Analyzer from yaml configuration file. |
Source code in presidio_analyzer/analyzer_engine_provider.py
get_configuration
get_configuration(
conf_file: Optional[Union[Path, str]],
) -> Union[Dict[str, Any]]
Retrieve the analyzer engine configuration from the provided file.
Source code in presidio_analyzer/analyzer_engine_provider.py
create_engine
create_engine() -> AnalyzerEngine
Load Presidio Analyzer from yaml configuration file.
RETURNS | DESCRIPTION |
---|---|
AnalyzerEngine | Analyzer engine initialized with the YAML configuration. |
Source code in presidio_analyzer/analyzer_engine_provider.py
presidio_analyzer.analysis_explanation.AnalysisExplanation
Hold tracing information to explain why PII entities were identified as such.
PARAMETER | DESCRIPTION |
---|---|
recognizer | Name of the recognizer that made the decision. |
original_score | The recognizer's confidence in the result. |
pattern_name | Name of the pattern (if the decision was made by a PatternRecognizer). |
pattern | Regex pattern that was applied (if PatternRecognizer). |
validation_result | Result of a validation (e.g. checksum). |
textual_explanation | Free text describing a decision made by a logic or model. |
METHOD | DESCRIPTION |
---|---|
set_improved_score | Update the score and calculate the difference from the original score. |
set_supportive_context_word | Set the context word which helped increase the score. |
append_textual_explanation_line | Append a new line to textual_explanation field. |
to_dict | Serialize self to dictionary. |
Source code in presidio_analyzer/analysis_explanation.py
set_improved_score
set_improved_score(score: float) -> None
Update the score and calculate the difference from the original score.
Source code in presidio_analyzer/analysis_explanation.py
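The bookkeeping described for set_improved_score can be sketched as follows. This is a minimal illustration, not the library's class: `ExplanationSketch` and the `score_context_improvement` field name are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ExplanationSketch:
    """Hypothetical, trimmed-down stand-in for AnalysisExplanation."""
    original_score: float
    score: float = 0.0
    score_context_improvement: float = 0.0  # assumed field name

    def __post_init__(self) -> None:
        # Until a score is improved, the current score is the original one.
        self.score = self.original_score

    def set_improved_score(self, score: float) -> None:
        """Store the new score and how far it moved from the original."""
        self.score = score
        self.score_context_improvement = self.score - self.original_score


explanation = ExplanationSketch(original_score=0.5)
explanation.set_improved_score(0.75)
print(explanation.score, explanation.score_context_improvement)
```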
set_supportive_context_word
set_supportive_context_word(word: str) -> None
Set the context word which helped increase the score.
Source code in presidio_analyzer/analysis_explanation.py
append_textual_explanation_line
append_textual_explanation_line(text: str) -> None
Append a new line to textual_explanation field.
Source code in presidio_analyzer/analysis_explanation.py
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict | a dictionary |
Source code in presidio_analyzer/analysis_explanation.py
presidio_analyzer.recognizer_result.RecognizerResult
Recognizer Result represents the findings of the detected entity.
Result of a recognizer analyzing the text.
PARAMETER | DESCRIPTION |
---|---|
entity_type | The type of the entity. |
start | The start location of the detected entity. |
end | The end location of the detected entity. |
score | The score of the detection. |
analysis_explanation | Contains the explanation of why this entity was identified. |
recognition_metadata | A dictionary of metadata to be used in recognizer-specific cases, for example specific recognized context words and the recognizer name. |
METHOD | DESCRIPTION |
---|---|
append_analysis_explanation_text | Add text to the analysis explanation. |
to_dict | Serialize self to dictionary. |
from_json | Create RecognizerResult from json. |
intersects | Check if self intersects with a different RecognizerResult. |
contained_in | Check if self is contained in a different RecognizerResult. |
contains | Check if one result is contained or equal to another result. |
equal_indices | Check if the indices are equal between two results. |
has_conflict | Check if two recognizer results are conflicted or not. |
Source code in presidio_analyzer/recognizer_result.py
append_analysis_explanation_text
append_analysis_explanation_text(text: str) -> None
Add text to the analysis explanation.
Source code in presidio_analyzer/recognizer_result.py
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict | a dictionary |
Source code in presidio_analyzer/recognizer_result.py
from_json
classmethod
from_json(data: Dict) -> RecognizerResult
Create RecognizerResult from json.
PARAMETER | DESCRIPTION |
---|---|
data | e.g. { "start": 24, "end": 32, "score": 0.8, "entity_type": "NAME" } |
RETURNS | DESCRIPTION |
---|---|
RecognizerResult | RecognizerResult |
Source code in presidio_analyzer/recognizer_result.py
intersects
intersects(other: RecognizerResult) -> int
Check if self intersects with a different RecognizerResult.
RETURNS | DESCRIPTION |
---|---|
int | If intersecting, returns the number of intersecting characters. If not, returns 0 |
Source code in presidio_analyzer/recognizer_result.py
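The overlap count described for intersects can be sketched with plain index arithmetic. `Span` is a hypothetical stand-in that keeps only the indices, and the sketch assumes `end` is exclusive.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in holding only the indices of a result."""
    start: int
    end: int  # assumed to be exclusive

    def intersects(self, other: "Span") -> int:
        """Return the number of overlapping characters, or 0 if none."""
        overlap = min(self.end, other.end) - max(self.start, other.start)
        return max(0, overlap)


print(Span(0, 10).intersects(Span(5, 15)))   # 5
print(Span(0, 10).intersects(Span(10, 20)))  # 0
```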
contained_in
contained_in(other: RecognizerResult) -> bool
Check if self is contained in a different RecognizerResult.
RETURNS | DESCRIPTION |
---|---|
bool | true if contained |
Source code in presidio_analyzer/recognizer_result.py
contains
contains(other: RecognizerResult) -> bool
Check if one result is contained or equal to another result.
PARAMETER | DESCRIPTION |
---|---|
other | another RecognizerResult |
RETURNS | DESCRIPTION |
---|---|
bool | bool |
Source code in presidio_analyzer/recognizer_result.py
equal_indices
equal_indices(other: RecognizerResult) -> bool
Check if the indices are equal between two results.
PARAMETER | DESCRIPTION |
---|---|
other | another RecognizerResult |
RETURNS | DESCRIPTION |
---|---|
bool | |
Source code in presidio_analyzer/recognizer_result.py
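The three index checks (contained_in, contains, equal_indices) differ only in the direction of the comparison. A minimal sketch, using a hypothetical `Span` class rather than RecognizerResult itself:

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in holding only the indices of a result."""
    start: int
    end: int

    def contained_in(self, other: "Span") -> bool:
        """True if this span lies inside (or equals) the other span."""
        return self.start >= other.start and self.end <= other.end

    def contains(self, other: "Span") -> bool:
        """True if the other span lies inside (or equals) this span."""
        return self.start <= other.start and self.end >= other.end

    def equal_indices(self, other: "Span") -> bool:
        """True if both spans cover exactly the same range."""
        return self.start == other.start and self.end == other.end


inner, outer = Span(5, 8), Span(0, 10)
print(inner.contained_in(outer), outer.contains(inner), inner.equal_indices(outer))
```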
has_conflict
has_conflict(other: RecognizerResult) -> bool
Check if two recognizer results are conflicted or not.
A result has a conflict if: (1) its indices are the same as the other's and its score is lower, or (2) its indices are contained in the other's.
PARAMETER | DESCRIPTION |
---|---|
other | RecognizerResult |
RETURNS | DESCRIPTION |
---|---|
bool | |
Source code in presidio_analyzer/recognizer_result.py
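The two conflict conditions can be written out as a short sketch. `ScoredSpan` is a hypothetical class; how equal scores are handled is not stated in the description, so treating a tie as "no conflict" is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ScoredSpan:
    """Hypothetical stand-in with indices and a confidence score."""
    start: int
    end: int
    score: float

    def has_conflict(self, other: "ScoredSpan") -> bool:
        same_indices = self.start == other.start and self.end == other.end
        if same_indices:
            # Assumption: a tie in score is not treated as a conflict here.
            return self.score < other.score
        # Otherwise, conflict only if this span lies inside the other one.
        return other.start <= self.start and self.end <= other.end


weak = ScoredSpan(0, 10, 0.4)
strong = ScoredSpan(0, 10, 0.9)
nested = ScoredSpan(2, 6, 0.9)
print(weak.has_conflict(strong), strong.has_conflict(weak), nested.has_conflict(weak))
```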
Batch modules
presidio_analyzer.batch_analyzer_engine.BatchAnalyzerEngine
Batch analysis of documents (tables, lists, dicts).
Wrapper class to run Presidio Analyzer Engine on multiple values, either lists/iterators of strings, or dictionaries.
PARAMETER | DESCRIPTION |
---|---|
analyzer_engine | AnalyzerEngine instance to use for handling the values in those collections. |
METHOD | DESCRIPTION |
---|---|
analyze_iterator | Analyze an iterable of strings. |
analyze_dict | Analyze a dictionary of keys (strings) and values/iterable of values. |
Source code in presidio_analyzer/batch_analyzer_engine.py
analyze_iterator
analyze_iterator(
texts: Iterable[Union[str, bool, float, int]],
language: str,
batch_size: int = 1,
n_process: int = 1,
**kwargs
) -> List[List[RecognizerResult]]
Analyze an iterable of strings.
PARAMETER | DESCRIPTION |
---|---|
texts | A list containing strings to be analyzed. |
language | Input language. |
batch_size | Batch size to process in a single iteration. |
n_process | Number of processors to use. Defaults to 1. |
kwargs | Additional parameters for the AnalyzerEngine.analyze method. |
Source code in presidio_analyzer/batch_analyzer_engine.py
analyze_dict
analyze_dict(
input_dict: Dict[str, Union[Any, Iterable[Any]]],
language: str,
keys_to_skip: Optional[List[str]] = None,
batch_size: int = 1,
n_process: int = 1,
**kwargs
) -> Iterator[DictAnalyzerResult]
Analyze a dictionary of keys (strings) and values/iterable of values.
Non-string values are returned as is.
PARAMETER | DESCRIPTION |
---|---|
input_dict | The input dictionary for analysis. |
language | Input language. |
keys_to_skip | Keys to ignore during analysis. |
batch_size | Batch size to process in a single iteration. |
n_process | Number of processors to use. Defaults to 1. |
kwargs | Additional keyword arguments for the AnalyzerEngine.analyze method. |
Source code in presidio_analyzer/batch_analyzer_engine.py
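The dictionary traversal can be sketched without Presidio installed. Everything below is hypothetical: `find_digits` is a toy detector standing in for the analyzer, results are plain `(start, end)` tuples, and the handling of skipped keys and nested dictionaries is an assumption about the shape of the output, not the library's implementation.

```python
import re
from typing import Any, Dict, Iterator, List, Optional, Tuple

Match = Tuple[int, int]  # (start, end) of a finding; hypothetical result type


def find_digits(text: str) -> List[Match]:
    """Toy detector standing in for the analyzer: flags runs of digits."""
    return [m.span() for m in re.finditer(r"\d+", text)]


def analyze_dict_sketch(
    input_dict: Dict[str, Any],
    keys_to_skip: Optional[List[str]] = None,
) -> Iterator[Tuple[str, Any, Any]]:
    """Yield (key, value, results) triples for every key in the dictionary."""
    skip = set(keys_to_skip or [])
    for key, value in input_dict.items():
        if key in skip:
            results: Any = []
        elif isinstance(value, str):
            results = find_digits(value)  # one list of findings
        elif isinstance(value, dict):
            results = analyze_dict_sketch(value, keys_to_skip)  # nested iterator
        elif isinstance(value, (list, tuple)):
            results = [find_digits(str(item)) for item in value]  # list of lists
        else:
            results = []  # other values are returned as is, with no findings
        yield key, value, results


record = {"id": 7, "note": "call 555", "tags": ["a1", "b"], "owner": {"phone": "123"}}
for key, value, results in analyze_dict_sketch(record, keys_to_skip=["id"]):
    if not isinstance(results, list):
        results = list(results)  # materialize the nested iterator
    print(key, results)
```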
presidio_analyzer.dict_analyzer_result.DictAnalyzerResult
dataclass
Data class for holding the output of the Presidio Analyzer on dictionaries.
PARAMETER | DESCRIPTION |
---|---|
key | Key in the dictionary. |
value | Value to run analysis on (either a string or a list of strings). |
recognizer_results | Analyzer output for one value. Could be either: (1) a list of recognizer results, if the input is one string; (2) a list of lists of recognizer results, if the input is a list of strings; or (3) an iterator of DictAnalyzerResult, if the input is a dictionary, in which case recognizer_results is the iterator over the DictAnalyzerResults of the next level in the dictionary. |
Source code in presidio_analyzer/dict_analyzer_result.py
Recognizers and patterns
presidio_analyzer.entity_recognizer.EntityRecognizer
A class representing an abstract PII entity recognizer.
EntityRecognizer is an abstract class to be inherited by Recognizers which hold the logic for recognizing specific PII entities.
EntityRecognizer exposes a method called enhance_using_context which can be overridden in case a custom context aware enhancement is needed in derived class of a recognizer.
PARAMETER | DESCRIPTION |
---|---|
supported_entities | The entities supported by this recognizer (for example, phone number, address, etc.). |
supported_language | The language supported by this recognizer. The supported language code is iso6391Name. |
name | The name of this recognizer (optional). |
version | The recognizer's current version. |
context | A list of words which can help boost the confidence score when they appear in the context of the matched entity. |
METHOD | DESCRIPTION |
---|---|
load | Initialize the recognizer assets if needed. |
analyze | Analyze text to identify entities. |
enhance_using_context | Enhance confidence score using context of the entity. |
get_supported_entities | Return the list of entities this recognizer can identify. |
get_supported_language | Return the language this recognizer can support. |
get_version | Return the version of this recognizer. |
to_dict | Serialize self to dictionary. |
from_dict | Create EntityRecognizer from a dict input. |
remove_duplicates | Remove duplicate results. |
sanitize_value | Cleanse the input string of the replacement pairs specified as argument. |
Source code in presidio_analyzer/entity_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
load
abstractmethod
load() -> None
Initialize the recognizer assets if needed.
(e.g. machine learning models)
Source code in presidio_analyzer/entity_recognizer.py
analyze
abstractmethod
analyze(
text: str, entities: List[str], nlp_artifacts: NlpArtifacts
) -> List[RecognizerResult]
Analyze text to identify entities.
PARAMETER | DESCRIPTION |
---|---|
text | The text to be analyzed. |
entities | The list of entities this recognizer is able to detect. |
nlp_artifacts | A group of attributes which are the result of an NLP process over the input text. |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | List of results detected by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text | The actual text that was analyzed. |
raw_recognizer_results | This recognizer's results, to be updated based on recognizer-specific context. |
other_raw_recognizer_results | Other recognizer results matched in the given text, to allow related-entity context enhancement. |
nlp_artifacts | The NLP artifacts, containing elements such as lemmatized tokens for better accuracy of the context enhancement process. |
context | List of context words. |
Source code in presidio_analyzer/entity_recognizer.py
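A custom override of enhance_using_context might follow the pattern sketched below. This is a standalone illustration, not Presidio code: `ResultSketch`, the metadata key's string value, the `boost` amount, and the `window` of preceding words are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed string value; the real constant lives on RecognizerResult.
IS_SCORE_ENHANCED_BY_CONTEXT_KEY = "is_score_enhanced_by_context"


@dataclass
class ResultSketch:
    """Hypothetical stand-in for a recognizer result."""
    entity_type: str
    start: int
    end: int
    score: float
    recognition_metadata: Dict[str, object] = field(default_factory=dict)


def enhance_using_context_sketch(
    text: str,
    raw_results: List[ResultSketch],
    context: List[str],
    boost: float = 0.25,  # hypothetical boost amount
    window: int = 5,      # hypothetical number of preceding words to inspect
) -> List[ResultSketch]:
    """Raise the score of results preceded by one of the context words."""
    wanted = {word.lower() for word in context}
    for result in raw_results:
        preceding = text[: result.start].lower().split()[-window:]
        if wanted.intersection(preceding):
            result.score = min(result.score + boost, 1.0)
            # Record that the score was boosted, as the description requires.
            result.recognition_metadata[IS_SCORE_ENHANCED_BY_CONTEXT_KEY] = True
    return raw_results


text = "my phone number is 212-555-5555"
results = [ResultSketch("PHONE_NUMBER", 19, 31, 0.5)]
enhanced = enhance_using_context_sketch(text, results, context=["phone"])
print(enhanced[0].score, enhanced[0].recognition_metadata)
```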
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str] | A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str | The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str | The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict | a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict | Dict containing keys and values for instantiation |
Source code in presidio_analyzer/entity_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Remove duplicates when two results have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results | List[RecognizerResult] |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
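Deduplication by start, end, and entity type can be sketched as below. `ResultSketch` is a hypothetical tuple, and keeping the highest-scoring copy of each duplicate is an assumption of this sketch; the description above does not say which copy is kept.

```python
from typing import Dict, List, NamedTuple, Tuple


class ResultSketch(NamedTuple):
    """Hypothetical stand-in for a recognizer result."""
    entity_type: str
    start: int
    end: int
    score: float


def remove_duplicates_sketch(results: List[ResultSketch]) -> List[ResultSketch]:
    """Keep one result per (entity_type, start, end), preferring the highest score."""
    best: Dict[Tuple[str, int, int], ResultSketch] = {}
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in best or result.score > best[key].score:
            best[key] = result
    return list(best.values())


found = [
    ResultSketch("PHONE_NUMBER", 0, 12, 0.4),
    ResultSketch("PHONE_NUMBER", 0, 12, 0.8),
    ResultSketch("URL", 0, 12, 0.5),
]
print(remove_duplicates_sketch(found))
```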
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text | Input string. |
replacement_pairs | Pairs of what has to be replaced with which value. |
RETURNS | DESCRIPTION |
---|---|
str | Cleansed string. |
Source code in presidio_analyzer/entity_recognizer.py
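Applying replacement pairs in order is a plain string operation. The function below is a minimal standalone sketch with the same signature as sanitize_value; it is an illustration of the described behavior, not a copy of the library's code.

```python
from typing import List, Tuple


def sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)
    return text


print(sanitize_value("212-555 5555", [("-", ""), (" ", "")]))  # 2125555555
```

A typical use is stripping separators from a candidate value before running a checksum or other validation on it.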
presidio_analyzer.local_recognizer.LocalRecognizer
Bases: ABC, EntityRecognizer
PII entity recognizer which runs on the same process as the AnalyzerEngine.
METHOD | DESCRIPTION |
---|---|
load | Initialize the recognizer assets if needed. |
analyze | Analyze text to identify entities. |
enhance_using_context | Enhance confidence score using context of the entity. |
get_supported_entities | Return the list of entities this recognizer can identify. |
get_supported_language | Return the language this recognizer can support. |
get_version | Return the version of this recognizer. |
to_dict | Serialize self to dictionary. |
from_dict | Create EntityRecognizer from a dict input. |
remove_duplicates | Remove duplicate results. |
sanitize_value | Cleanse the input string of the replacement pairs specified as argument. |
Source code in presidio_analyzer/local_recognizer.py
presidio_analyzer.pattern.Pattern
A class that represents a regex pattern.
PARAMETER | DESCRIPTION |
---|---|
name | The name of the pattern. |
regex | The regex pattern to detect. |
score | The pattern's strength (a value between 0 and 1). |
Source code in presidio_analyzer/pattern.py
to_dict
to_dict() -> Dict
Turn this instance into a dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict | a dictionary |
Source code in presidio_analyzer/pattern.py
from_dict
classmethod
from_dict(pattern_dict: Dict) -> Pattern
Load an instance from a dictionary.
PARAMETER | DESCRIPTION |
---|---|
pattern_dict | A dictionary holding the pattern's parameters. |
RETURNS | DESCRIPTION |
---|---|
Pattern | A Pattern instance. |
Source code in presidio_analyzer/pattern.py
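The to_dict / from_dict pair amounts to a round trip through a plain dictionary. The sketch below uses a hypothetical `PatternSketch` dataclass with the three documented fields; the exact dictionary layout produced by the library is assumed to mirror the constructor parameters.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class PatternSketch:
    """Hypothetical stand-in for Pattern with the three documented fields."""
    name: str
    regex: str
    score: float

    def to_dict(self) -> Dict:
        """Turn this instance into a dictionary."""
        return asdict(self)

    @classmethod
    def from_dict(cls, pattern_dict: Dict) -> "PatternSketch":
        """Load an instance from a dictionary of constructor parameters."""
        return cls(**pattern_dict)


zip_pattern = PatternSketch(name="zip code (weak)", regex=r"\b\d{5}\b", score=0.1)
restored = PatternSketch.from_dict(zip_pattern.to_dict())
print(restored == zip_pattern)  # True
```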
presidio_analyzer.pattern_recognizer.PatternRecognizer
Bases: LocalRecognizer
PII entity recognizer using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
patterns | A list of patterns to detect. |
deny_list | A list of words to detect, in case the recognizer uses a predefined list of words (deny list). |
context | List of context words. |
deny_list_score | Confidence score for a term identified using a deny-list. |
global_regex_flags | Regex flags to be used in regex matching, including deny-lists. |
METHOD | DESCRIPTION |
---|---|
enhance_using_context | Enhance confidence score using context of the entity. |
get_supported_entities | Return the list of entities this recognizer can identify. |
get_supported_language | Return the language this recognizer can support. |
get_version | Return the version of this recognizer. |
remove_duplicates | Remove duplicate results. |
sanitize_value | Cleanse the input string of the replacement pairs specified as argument. |
analyze | Analyzes text to detect PII using regular expressions or deny-lists. |
validate_result | Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result | Logic to check for result invalidation by running pruning logic. |
build_regex_explanation | Construct an explanation for why this entity was detected. |
to_dict | Serialize instance into a dictionary. |
from_dict | Create instance from a serialized dict. |
Source code in presidio_analyzer/pattern_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_recognizer_results.
If a result score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, for better accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Remove duplicates when two results have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
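The duplicate rule described above (same start, end, and type) can be sketched with plain tuples. `ResultSketch` and `remove_duplicates_sketch` are hypothetical names, and keeping the highest-scoring copy is an assumption of this sketch rather than something the text states.

```python
from typing import Dict, List, NamedTuple, Tuple


class ResultSketch(NamedTuple):
    """Hypothetical stand-in for a recognizer result."""

    entity_type: str
    start: int
    end: int
    score: float


def remove_duplicates_sketch(results: List[ResultSketch]) -> List[ResultSketch]:
    """Keep one result per (type, start, end); assumption: the highest score wins."""
    best: Dict[Tuple[str, int, int], ResultSketch] = {}
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in best or result.score > best[key].score:
            best[key] = result
    return list(best.values())


results = [
    ResultSketch("PHONE_NUMBER", 0, 10, 0.4),
    ResultSketch("PHONE_NUMBER", 0, 10, 0.7),  # same span and type: duplicate
    ResultSketch("PERSON", 0, 10, 0.5),        # same span, different type: kept
]
deduplicated = remove_duplicates_sketch(results)
```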
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
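To make the regex and deny-list matching more concrete, here is a minimal standard-library sketch. It is not the recognizer's implementation: `deny_list_to_regex`, `find_matches`, the default flags, and the way a deny-list is folded into a single regex are all assumptions for illustration.

```python
import re
from typing import List, Optional, Tuple

# Illustrative default flags; an assumption of this sketch.
DEFAULT_FLAGS = re.DOTALL | re.MULTILINE | re.IGNORECASE


def deny_list_to_regex(deny_list: List[str]) -> str:
    """Build one regex that matches any deny-list term as a whole token."""
    escaped = "|".join(re.escape(term) for term in deny_list)
    return r"(?:^|(?<=\W))(" + escaped + r")(?:(?=\W)|$)"


def find_matches(
    text: str,
    patterns: List[Tuple[str, str, float]],
    regex_flags: Optional[int] = None,
) -> List[Tuple[str, int, int, float]]:
    """Return (pattern name, start, end, score) for every non-empty match."""
    flags = DEFAULT_FLAGS if regex_flags is None else regex_flags
    found = []
    for name, regex, score in patterns:
        for match in re.finditer(regex, text, flags=flags):
            if match.group() == "":
                continue  # ignore empty matches
            found.append((name, match.start(), match.end(), score))
    return found


text = "Dear Mr. Smith and mrs. Jones, zip 12345"
patterns = [
    ("titles deny-list", deny_list_to_regex(["Mr.", "Mrs."]), 1.0),
    ("zip code (weak)", r"\b\d{5}\b", 0.01),
]
matches = find_matches(text, patterns)
matched_text = [text[start:end] for _, start, end, _ in matches]
```

Because the flags include `re.IGNORECASE` in this sketch, the lowercase "mrs." is matched by the deny-list term "Mrs." as well.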
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to check: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
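The two hooks above can be pictured with the examples the text itself mentions: a checksum for validation and the SSN same-digit rule for invalidation. The sketch below is illustrative only; the function names are hypothetical, and the way the results are folded into a score (`adjust_score`, with assumed bounds 0.0 and 1.0) is an assumption about how such hooks could be consumed.

```python
from typing import Optional

MIN_SCORE, MAX_SCORE = 0.0, 1.0  # assumed score bounds for this sketch


def luhn_is_valid(pattern_text: str) -> Optional[bool]:
    """Validation hook sketch: Luhn checksum over the digits of the match."""
    digits = [int(ch) for ch in pattern_text if ch.isdigit()]
    if not digits:
        return None  # nothing to validate
    checksum = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def ssn_is_invalid(pattern_text: str) -> Optional[bool]:
    """Invalidation hook sketch: reject if any dash-separated group repeats one digit."""
    return any(len(set(group)) == 1 for group in pattern_text.split("-"))


def adjust_score(score: float, validation: Optional[bool], invalidation: Optional[bool]) -> float:
    """Assumed combination: validation pins the score, invalidation zeroes it."""
    if validation is not None:
        score = MAX_SCORE if validation else MIN_SCORE
    if invalidation:
        score = MIN_SCORE
    return score
```

Returning `None` from a hook is treated here as "no opinion", which is why both hooks are typed `Optional[bool]`.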
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
presidio_analyzer.remote_recognizer.RemoteRecognizer
Bases: ABC, EntityRecognizer
A configuration for a recognizer that runs on a different process / remote machine.
PARAMETER | DESCRIPTION |
---|---|
supported_entities
|
A list of entities this recognizer can identify
TYPE:
|
name
|
name of recognizer
TYPE:
|
supported_language
|
The language this recognizer can detect entities in
TYPE:
|
version
|
Version of this recognizer
TYPE:
|
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize self to dictionary. |
from_dict |
Create EntityRecognizer from a dict input. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
analyze |
Call an external service for PII detection. |
Source code in presidio_analyzer/remote_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_recognizer_results.
If a result score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, for better accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict
|
a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict
|
Dict containing keys and values for instantiation
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Remove duplicates when two results have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
analyze
abstractmethod
analyze(text: str, entities: List[str], nlp_artifacts: NlpArtifacts)
Call an external service for PII detection.
PARAMETER | DESCRIPTION |
---|---|
text
|
text to be analyzed
TYPE:
|
entities
|
Entities that should be looked for
TYPE:
|
nlp_artifacts
|
Additional metadata from the NLP engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List of identified PII entities |
Source code in presidio_analyzer/remote_recognizer.py
Recognizer registry modules
presidio_analyzer.recognizer_registry.RecognizerRegistry
Detect, register and hold all recognizers to be used by the analyzer.
PARAMETER | DESCRIPTION |
---|---|
recognizers
|
An optional list of recognizers, that will be available instead of the predefined recognizers
TYPE:
|
global_regex_flags
|
regex flags to be used in regex matching, including deny-lists
TYPE:
|
supported_languages
|
List of languages supported by this registry.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
add_nlp_recognizer |
Add an NLP recognizer in accordance with the NLP engine. |
load_predefined_recognizers |
Load the existing recognizers into memory. |
get_recognizers |
Return a list of recognizers which support the specified entities and language. |
add_recognizer |
Add a new recognizer to the list of recognizers. |
remove_recognizer |
Remove a recognizer based on its name. |
add_pattern_recognizer_from_dict |
Load a pattern recognizer from a Dict into the recognizer registry. |
add_recognizers_from_yaml |
Read YAML file and load recognizers into the recognizer registry. |
get_supported_entities |
Return the entities supported by the set of loaded recognizers. |
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
add_nlp_recognizer
add_nlp_recognizer(nlp_engine: NlpEngine) -> None
Add an NLP recognizer in accordance with the NLP engine.
PARAMETER | DESCRIPTION |
---|---|
nlp_engine
|
The NLP engine.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
None
|
None |
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
load_predefined_recognizers
load_predefined_recognizers(
languages: Optional[List[str]] = None, nlp_engine: NlpEngine = None
) -> None
Load the existing recognizers into memory.
PARAMETER | DESCRIPTION |
---|---|
languages
|
List of languages for which to load recognizers
TYPE:
|
nlp_engine
|
The NLP engine to use.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
None
|
None |
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
get_recognizers
get_recognizers(
language: str,
entities: Optional[List[str]] = None,
all_fields: bool = False,
ad_hoc_recognizers: Optional[List[EntityRecognizer]] = None,
) -> List[EntityRecognizer]
Return a list of recognizers which support the specified entities and language.
PARAMETER | DESCRIPTION |
---|---|
entities
|
the requested entities
TYPE:
|
language
|
the requested language
TYPE:
|
all_fields
|
a flag to return all fields of a requested language.
TYPE:
|
ad_hoc_recognizers
|
Additional recognizers provided by the user as part of the request
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[EntityRecognizer]
|
A list of the recognizers which support the supplied entities and language |
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
|
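The lookup described above amounts to filtering by language and then by requested entities. The following sketch uses a hypothetical `RecognizerSketch` stand-in and a free function; the error raised when neither `entities` nor `all_fields` is given is an assumption, and ad-hoc recognizers are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecognizerSketch:
    """Hypothetical stand-in holding only what the lookup needs."""

    name: str
    supported_language: str
    supported_entities: List[str]


def get_recognizers_sketch(
    recognizers: List[RecognizerSketch],
    language: str,
    entities: Optional[List[str]] = None,
    all_fields: bool = False,
) -> List[RecognizerSketch]:
    """Filter by language, then by requested entities unless all_fields is set."""
    if entities is None and not all_fields:
        raise ValueError("No entities provided")  # assumed behaviour
    in_language = [r for r in recognizers if r.supported_language == language]
    if all_fields:
        return in_language
    wanted = set(entities or [])
    return [r for r in in_language if wanted & set(r.supported_entities)]


registry = [
    RecognizerSketch("ZipRecognizer", "en", ["ZIP"]),
    RecognizerSketch("TitlesRecognizer", "de", ["TITLE"]),
    RecognizerSketch("PhoneRecognizer", "en", ["PHONE_NUMBER"]),
]
english_zip = get_recognizers_sketch(registry, "en", entities=["ZIP"])
all_english = get_recognizers_sketch(registry, "en", all_fields=True)
```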
add_recognizer
add_recognizer(recognizer: EntityRecognizer) -> None
Add a new recognizer to the list of recognizers.
PARAMETER | DESCRIPTION |
---|---|
recognizer
|
Recognizer to add
TYPE:
|
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
remove_recognizer
remove_recognizer(recognizer_name: str, language: Optional[str] = None) -> None
Remove a recognizer based on its name.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer to remove
TYPE:
|
language
|
The supported language of the recognizer to be removed, in case multiple recognizers with the same name are present, and only one should be removed.
TYPE:
|
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
add_pattern_recognizer_from_dict
add_pattern_recognizer_from_dict(recognizer_dict: Dict) -> None
Load a pattern recognizer from a Dict into the recognizer registry.
Example:
    registry = RecognizerRegistry()
    recognizer = {
        "name": "Titles Recognizer",
        "supported_language": "de",
        "supported_entity": "TITLE",
        "deny_list": ["Mr.", "Mrs."],
    }
    registry.add_pattern_recognizer_from_dict(recognizer)
PARAMETER | DESCRIPTION |
---|---|
recognizer_dict
|
Dict holding a serialization of a PatternRecognizer
TYPE:
|
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
add_recognizers_from_yaml
add_recognizers_from_yaml(yml_path: Union[str, Path]) -> None
Read YAML file and load recognizers into the recognizer registry.
See example yaml file here: https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/example_recognizers.yaml
Example:
    yaml_file = "recognizers.yaml"
    registry = RecognizerRegistry()
    registry.add_recognizers_from_yaml(yaml_file)
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
get_supported_entities
get_supported_entities(languages: Optional[List[str]] = None) -> List[str]
Return the entities supported by the set of loaded recognizers.
PARAMETER | DESCRIPTION |
---|---|
languages
|
The languages to get the supported entities for. If languages=None, returns all entities for all languages.
TYPE:
|
Source code in presidio_analyzer/recognizer_registry/recognizer_registry.py
presidio_analyzer.recognizer_registry.RecognizerRegistryProvider
Utility class for loading Recognizer Registry.
Use this class to load a recognizer registry from a YAML file.
Example:
    {
        "supported_languages": ["de", "es"],
        "recognizers": [
            {
                "name": "Zip code Recognizer",
                "supported_language": "en",
                "patterns": [
                    {
                        "name": "zip code (weak)",
                        "regex": "(\b\d{5}(?:\-\d{4})?\b)",
                        "score": 0.01,
                    }
                ],
                "context": ["zip", "code"],
                "supported_entity": "ZIP",
            }
        ]
    }
PARAMETER | DESCRIPTION |
---|---|
conf_file
|
Path to yaml file containing registry configuration
TYPE:
|
registry_configuration
|
Dict containing registry configuration
TYPE:
|
METHOD | DESCRIPTION |
---|---|
create_recognizer_registry |
Create a recognizer registry according to configuration loaded previously. |
Source code in presidio_analyzer/recognizer_registry/recognizer_registry_provider.py
create_recognizer_registry
create_recognizer_registry() -> RecognizerRegistry
Create a recognizer registry according to configuration loaded previously.
Source code in presidio_analyzer/recognizer_registry/recognizer_registry_provider.py
Context awareness modules
presidio_analyzer.context_aware_enhancers
Context awareness modules.
ContextAwareEnhancer
A class representing an abstract context aware enhancer.
Context words might enhance the confidence score of a recognized entity. ContextAwareEnhancer is an abstract class to be inherited by classes that implement context-aware enhancement logic.
PARAMETER | DESCRIPTION |
---|---|
context_similarity_factor
|
How much to enhance confidence of match entity
TYPE:
|
min_score_with_context_similarity
|
Minimum confidence score
TYPE:
|
context_prefix_count
|
how many words before the entity to match context
TYPE:
|
context_suffix_count
|
how many words after the entity to match context
TYPE:
|
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Update results in case surrounding words are relevant to the context words. |
Source code in presidio_analyzer/context_aware_enhancers/context_aware_enhancer.py
enhance_using_context
abstractmethod
enhance_using_context(
text: str,
raw_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
recognizers: List[EntityRecognizer],
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Update results in case surrounding words are relevant to the context words.
Using the words surrounding the actual match, look for specific strings that, if found, contribute to the score of the result, improving the confidence that the match is indeed of that PII entity type.
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_results
|
Recognizer results which didn't take context into consideration
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, for better accuracy of the context enhancement process
TYPE:
|
recognizers
|
the list of recognizers
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/context_aware_enhancers/context_aware_enhancer.py
LemmaContextAwareEnhancer
Bases: ContextAwareEnhancer
A class representing a lemma based context aware enhancer logic.
Context words might enhance the confidence score of a recognized entity. LemmaContextAwareEnhancer implements lemma-based context-aware logic: it compares the spaCy lemmas of each word surrounding the matched entity to the given context and to the recognizer's context words and, on a match, enhances the recognized entity's confidence score by a given factor.
PARAMETER | DESCRIPTION |
---|---|
context_similarity_factor
|
How much to enhance confidence of match entity
TYPE:
|
min_score_with_context_similarity
|
Minimum confidence score
TYPE:
|
context_prefix_count
|
how many words before the entity to match context
TYPE:
|
context_suffix_count
|
how many words after the entity to match context
TYPE:
|
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Update results in case the lemmas of surrounding words or input context words are identical to the context words. |
Source code in presidio_analyzer/context_aware_enhancers/lemma_context_aware_enhancer.py
enhance_using_context
enhance_using_context(
text: str,
raw_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
recognizers: List[EntityRecognizer],
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Update results in case the lemmas of surrounding words or input context words are identical to the context words.
Using the words surrounding the actual match, look for specific strings that, if found, contribute to the score of the result, improving the confidence that the match is indeed of that PII entity type.
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_results
|
Recognizer results which didn't take context into consideration
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, for better accuracy of the context enhancement process
TYPE:
|
recognizers
|
the list of recognizers
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/context_aware_enhancers/lemma_context_aware_enhancer.py
|
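The interplay of the four constructor parameters can be pictured with a simplified sketch. It is not the enhancer's implementation: it compares lowercased whitespace tokens instead of lemmas, the function name is hypothetical, and the default values shown are illustrative assumptions.

```python
from typing import List


def enhance_score_sketch(
    text: str,
    start: int,
    end: int,
    score: float,
    context_words: List[str],
    context_similarity_factor: float = 0.35,         # illustrative value
    min_score_with_context_similarity: float = 0.4,  # illustrative value
    context_prefix_count: int = 5,                   # illustrative value
    context_suffix_count: int = 0,                   # illustrative value
) -> float:
    """Boost the score if a context word appears in the window around the match."""
    before = text[:start].lower().split()
    after = text[end:].lower().split()
    prefix = before[-context_prefix_count:] if context_prefix_count > 0 else []
    window = [w.strip(".,:;!?") for w in prefix + after[:context_suffix_count]]
    if not any(word.lower() in window for word in context_words):
        return score
    boosted = max(score + context_similarity_factor, min_score_with_context_similarity)
    return min(boosted, 1.0)  # never exceed the maximum score


text = "My zip code is 90210"
start = text.index("90210")
boosted = enhance_score_sketch(text, start, start + 5, 0.01, ["zip", "code"])
```

In this sketch a weak match (0.01) with supporting context is lifted to the minimum score with context, while a match without any context word nearby keeps its original score.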
NLP Engine modules
presidio_analyzer.nlp_engine
NLP engine package. Performs text pre-processing.
NerModelConfiguration
dataclass
NER model configuration.
PARAMETER | DESCRIPTION |
---|---|
labels_to_ignore
|
List of labels to not return predictions for.
TYPE:
|
aggregation_strategy
|
See https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TokenClassificationPipeline.aggregation_strategy
TYPE:
|
stride
|
See https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TokenClassificationPipeline.stride
TYPE:
|
alignment_mode
|
See https://spacy.io/api/doc#char_span
TYPE:
|
default_score
|
Default confidence score if the model does not provide one.
TYPE:
|
model_to_presidio_entity_mapping
|
Mapping between the NER model entities and Presidio entities.
TYPE:
|
low_score_entity_names
|
Set of entity names that are likely to have low detection accuracy that should be adjusted.
TYPE:
|
low_confidence_score_multiplier
|
A multiplier applied to the score of entities listed in low_score_entity_names.
TYPE:
|
Source code in presidio_analyzer/nlp_engine/ner_model_configuration.py
|
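A rough sketch of how several of these fields could interact when post-processing a single model prediction is given below. `NerConfigSketch` and `postprocess` are hypothetical, the default values are illustrative, and the order of the steps (ignore, map, default score, multiplier) is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class NerConfigSketch:
    """Hypothetical stand-in with a subset of the documented fields."""

    labels_to_ignore: Set[str] = field(default_factory=set)
    default_score: float = 0.85                   # illustrative value
    model_to_presidio_entity_mapping: Dict[str, str] = field(default_factory=dict)
    low_score_entity_names: Set[str] = field(default_factory=set)
    low_confidence_score_multiplier: float = 0.4  # illustrative value


def postprocess(
    label: str, score: Optional[float], config: NerConfigSketch
) -> Optional[Tuple[str, float]]:
    """Map one model prediction to an (entity, score) pair, or drop it."""
    if label in config.labels_to_ignore:
        return None
    entity = config.model_to_presidio_entity_mapping.get(label, label)
    final_score = config.default_score if score is None else score
    if entity in config.low_score_entity_names:
        final_score *= config.low_confidence_score_multiplier
    return entity, final_score


config = NerConfigSketch(
    labels_to_ignore={"O"},
    model_to_presidio_entity_mapping={"PER": "PERSON", "ORG": "ORGANIZATION"},
    low_score_entity_names={"ORGANIZATION"},
)
```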
from_dict
classmethod
from_dict(nlp_engine_configuration: Dict) -> NerModelConfiguration
Load NLP engine configuration from dict.
PARAMETER | DESCRIPTION |
---|---|
nlp_engine_configuration
|
Dict with the configuration to load.
TYPE:
|
Source code in presidio_analyzer/nlp_engine/ner_model_configuration.py
to_dict
to_dict() -> Dict
Return the configuration as a dict.
Source code in presidio_analyzer/nlp_engine/ner_model_configuration.py
NlpArtifacts
NlpArtifacts is an abstraction layer over the results of an NLP pipeline processing over a given text.
It holds attributes such as entities, tokens and lemmas, which can be used by any recognizer.
PARAMETER | DESCRIPTION |
---|---|
entities
|
Identified entities
TYPE:
|
tokens
|
Tokenized text
TYPE:
|
tokens_indices
|
Indices of tokens
TYPE:
|
lemmas
|
List of lemmas in text
TYPE:
|
nlp_engine
|
NlpEngine object
TYPE:
|
language
|
Text language
TYPE:
|
scores
|
Entity confidence scores
TYPE:
|
METHOD | DESCRIPTION |
---|---|
set_keywords |
Return keywords for text. |
to_json |
Convert nlp artifacts to json. |
Source code in presidio_analyzer/nlp_engine/nlp_artifacts.py
set_keywords
staticmethod
set_keywords(nlp_engine, lemmas: List[str], language: str) -> List[str]
Return keywords for text.
Extracts lemmas with certain conditions as keywords.
Source code in presidio_analyzer/nlp_engine/nlp_artifacts.py
|
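The text does not spell out which conditions are applied to the lemmas. One plausible reading, sketched below purely as an assumption, is that stop words and punctuation are dropped and the remaining lemmas are lowercased. The word sets and the function name are hypothetical and stand in for calls to an NLP engine.

```python
from typing import List, Set

# Hypothetical stand-ins for stop-word and punctuation checks on an NLP engine.
STOPWORDS_SKETCH: Set[str] = {"the", "is", "a", "my", "of"}
PUNCTUATION_SKETCH: Set[str] = set(".,:;!?-")


def set_keywords_sketch(lemmas: List[str]) -> List[str]:
    """Keep lowercased lemmas that are neither stop words nor punctuation."""
    keywords = []
    for lemma in lemmas:
        lowered = lemma.lower()
        if lowered in STOPWORDS_SKETCH or lowered in PUNCTUATION_SKETCH:
            continue
        keywords.append(lowered)
    return keywords


keywords = set_keywords_sketch(["My", "zip", "code", "is", "90210", "."])
```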
to_json
to_json() -> str
Convert nlp artifacts to json.
Source code in presidio_analyzer/nlp_engine/nlp_artifacts.py
NlpEngine
Bases: ABC
NlpEngine is an abstraction layer over the nlp module.
It provides NLP preprocessing functionality as well as other queries on tokens.
METHOD | DESCRIPTION |
---|---|
load |
Load the NLP model. |
is_loaded |
Return True if the model is already loaded. |
process_text |
Execute the NLP pipeline on the given text and language. |
process_batch |
Execute the NLP pipeline on a batch of texts. |
is_stopword |
Return true if the given word is a stop word. |
is_punct |
Return true if the given word is a punctuation word. |
get_supported_entities |
Return the supported entities for this NLP engine. |
get_supported_languages |
Return the supported languages for this NLP engine. |
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
load
abstractmethod
load() -> None
Load the NLP model.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
is_loaded
abstractmethod
is_loaded() -> bool
Return True if the model is already loaded.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
process_text
abstractmethod
process_text(text: str, language: str) -> NlpArtifacts
Execute the NLP pipeline on the given text and language.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
process_batch
abstractmethod
process_batch(
texts: Iterable[str],
language: str,
batch_size: int = 1,
n_process: int = 1,
**kwargs
) -> Iterator[Tuple[str, NlpArtifacts]]
Execute the NLP pipeline on a batch of texts.
Yields (text, NlpArtifacts) tuples, one per input text.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
is_stopword
abstractmethod
is_stopword(word: str, language: str) -> bool
Return True if the given word is a stop word in the given language.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
is_punct
abstractmethod
is_punct(word: str, language: str) -> bool
Return True if the given word is a punctuation mark in the given language.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
get_supported_entities
abstractmethod
get_supported_entities() -> List[str]
Return the supported entities for this NLP engine.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
get_supported_languages
abstractmethod
get_supported_languages() -> List[str]
Return the supported languages for this NLP engine.
Source code in presidio_analyzer/nlp_engine/nlp_engine.py
|
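To illustrate what implementing this interface involves, here is a self-contained sketch. It does not import presidio_analyzer: `EngineInterface` is a local stand-in that mirrors only a subset of the abstract methods listed above, `Artifacts` is a plain dict standing in for NlpArtifacts, and `WhitespaceEngine` is a toy implementation. Unlike the real base class, the stand-in gives process_batch a default body.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Artifacts = Dict[str, List[str]]  # hypothetical stand-in for NlpArtifacts


class EngineInterface(ABC):
    """Local stand-in mirroring a subset of the NlpEngine methods."""

    @abstractmethod
    def load(self) -> None: ...

    @abstractmethod
    def is_loaded(self) -> bool: ...

    @abstractmethod
    def process_text(self, text: str, language: str) -> Artifacts: ...

    @abstractmethod
    def is_stopword(self, word: str, language: str) -> bool: ...

    def process_batch(
        self, texts: Iterable[str], language: str
    ) -> Iterator[Tuple[str, Artifacts]]:
        # Yield one (text, artifacts) pair per input text.
        for text in texts:
            yield text, self.process_text(text, language)


class WhitespaceEngine(EngineInterface):
    """Toy engine: splits on whitespace and lowercases tokens as 'lemmas'."""

    def __init__(self) -> None:
        self._stopwords: Dict[str, Set[str]] = {}

    def load(self) -> None:
        self._stopwords = {"en": {"the", "a", "is"}}

    def is_loaded(self) -> bool:
        return bool(self._stopwords)

    def process_text(self, text: str, language: str) -> Artifacts:
        tokens = text.split()
        return {"tokens": tokens, "lemmas": [t.lower() for t in tokens]}

    def is_stopword(self, word: str, language: str) -> bool:
        return word.lower() in self._stopwords.get(language, set())


engine = WhitespaceEngine()
engine.load()
batch = list(engine.process_batch(["The cat", "A dog"], "en"))
```

Because every method is abstract in the interface, a subclass that omits one cannot be instantiated, which surfaces incomplete engines early.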
SpacyNlpEngine
Bases: NlpEngine
SpacyNlpEngine is an abstraction layer over the nlp module.
It provides processing functionality as well as other queries on tokens. The SpacyNlpEngine uses spaCy as its NLP module.
METHOD | DESCRIPTION |
---|---|
load |
Load the spaCy NLP model. |
get_supported_entities |
Return the supported entities for this NLP engine. |
get_supported_languages |
Return the supported languages for this NLP engine. |
is_loaded |
Return True if the model is already loaded. |
process_text |
Execute the spaCy NLP pipeline on the given text and language. |
process_batch |
Execute the NLP pipeline on a batch of texts using spacy pipe. |
is_stopword |
Return true if the given word is a stop word. |
is_punct |
Return true if the given word is a punctuation word. |
get_nlp |
Return the language model loaded for a language. |
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
load
load() -> None
Load the spaCy NLP model.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the supported entities for this NLP engine.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_supported_languages
get_supported_languages() -> List[str]
Return the supported languages for this NLP engine.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
is_loaded
is_loaded() -> bool
Return True if the model is already loaded.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
process_text
process_text(text: str, language: str) -> NlpArtifacts
Execute the spaCy NLP pipeline on the given text and language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
process_batch
process_batch(
texts: Union[List[str], List[Tuple[str, object]]],
language: str,
batch_size: int = 1,
n_process: int = 1,
as_tuples: bool = False,
) -> Iterator[Optional[NlpArtifacts]]
Execute the NLP pipeline on a batch of texts using spacy pipe.
PARAMETER | DESCRIPTION |
---|---|
texts
|
A list of texts to process.
TYPE:
|
language
|
The language of the texts.
TYPE:
|
batch_size
|
Default batch size for pipe and evaluate.
TYPE:
|
n_process
|
Number of processes to use when processing the texts.
TYPE:
|
as_tuples
|
If set to True, inputs should be a sequence of (text, context) tuples. Output will then be a sequence of (doc, context) tuples. Defaults to False.
TYPE:
|
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
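The effect of batch_size and as_tuples can be sketched without spaCy. In the toy generator below, `toy_pipe` is a hypothetical function that "processes" a text by splitting it on whitespace; it only illustrates how (text, context) inputs become (result, context) outputs and is not how the engine is implemented.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Tuple, Union


def toy_pipe(
    texts: Iterable[Union[str, Tuple[str, Any]]],
    batch_size: int = 1,
    as_tuples: bool = False,
) -> Iterator[Union[List[str], Tuple[List[str], Any]]]:
    """Tokenize texts in batches; pass context through when as_tuples=True."""
    iterator = iter(texts)
    while True:
        # Pull at most batch_size items from the input at a time.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        for item in batch:
            if as_tuples:
                text, context = item
                yield text.split(), context
            else:
                yield item.split()


plain = list(toy_pipe(["a b", "c"], batch_size=2))
with_context = list(
    toy_pipe([("a b", {"id": 1}), ("c", {"id": 2})], as_tuples=True)
)
```

Carrying a context object alongside each text is useful when results must be matched back to their source records after batched processing.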
is_stopword
is_stopword(word: str, language: str) -> bool
Return True if the given word is a stop word in the given language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
is_punct
is_punct(word: str, language: str) -> bool
Return True if the given word is a punctuation mark in the given language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_nlp
get_nlp(language: str) -> Language
Return the language model loaded for a language.
PARAMETER | DESCRIPTION |
---|---|
language
|
Language
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Language
|
Model from spaCy |
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
StanzaNlpEngine
Bases: SpacyNlpEngine
StanzaNlpEngine is an abstraction layer over the nlp module.
It provides processing functionality as well as other queries on tokens. The StanzaNlpEngine uses spacy-stanza and stanza as its NLP module.
PARAMETER | DESCRIPTION |
---|---|
models
|
List of dictionaries, one per language, each naming the model to load. For example: models = [{"lang_code": "en", "model_name": "en"}]
TYPE:
|
ner_model_configuration
|
Parameters for the NER model. See conf/stanza.yaml for an example.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
is_loaded |
Return True if the model is already loaded. |
process_text |
Execute the spaCy NLP pipeline on the given text and language. |
process_batch |
Execute the NLP pipeline on a batch of texts using spacy pipe. |
is_stopword |
Return true if the given word is a stop word. |
is_punct |
Return true if the given word is a punctuation word. |
get_supported_entities |
Return the supported entities for this NLP engine. |
get_supported_languages |
Return the supported languages for this NLP engine. |
get_nlp |
Return the language model loaded for a language. |
load |
Load the NLP model. |
Source code in presidio_analyzer/nlp_engine/stanza_nlp_engine.py
|
is_loaded
is_loaded() -> bool
Return True if the model is already loaded.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
process_text
process_text(text: str, language: str) -> NlpArtifacts
Execute the spaCy NLP pipeline on the given text and language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
process_batch
process_batch(
texts: Union[List[str], List[Tuple[str, object]]],
language: str,
batch_size: int = 1,
n_process: int = 1,
as_tuples: bool = False,
) -> Iterator[Optional[NlpArtifacts]]
Execute the NLP pipeline on a batch of texts using spacy pipe.
PARAMETER | DESCRIPTION |
---|---|
texts
|
A list of texts to process.
TYPE:
|
language
|
The language of the texts.
TYPE:
|
batch_size
|
Default batch size for pipe and evaluate.
TYPE:
|
n_process
|
Number of processes to use when processing the texts.
TYPE:
|
as_tuples
|
If set to True, inputs should be a sequence of (text, context) tuples. Output will then be a sequence of (doc, context) tuples. Defaults to False.
TYPE:
|
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
is_stopword
is_stopword(word: str, language: str) -> bool
Return True if the given word is a stop word in the given language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
is_punct
is_punct(word: str, language: str) -> bool
Return True if the given word is a punctuation mark in the given language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the supported entities for this NLP engine.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_supported_languages
get_supported_languages() -> List[str]
Return the supported languages for this NLP engine.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_nlp
get_nlp(language: str) -> Language
Return the language model loaded for a language.
PARAMETER | DESCRIPTION |
---|---|
language
|
Language
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Language
|
Model from spaCy |
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
load
load() -> None
Load the NLP model.
Source code in presidio_analyzer/nlp_engine/stanza_nlp_engine.py
|
TransformersNlpEngine
Bases: SpacyNlpEngine
TransformersNlpEngine is a transformers based NlpEngine.
It comprises a spacy pipeline used for tokenization, lemmatization, pos, and a transformers component for NER.
Both the underlying spaCy pipeline and the transformers engine can be configured by the user. Example: [{"lang_code": "en", "model_name": {"spacy": "en_core_web_sm", "transformers": "dslim/bert-base-NER"}}]
PARAMETER | DESCRIPTION |
---|---|
models
|
A dict holding the model's configuration.
TYPE:
|
ner_model_configuration
|
Parameters for the NER model. See conf/transformers.yaml for an example. Note that since the spaCy model is not used for NER, we recommend using a simple model, such as en_core_web_sm for English. For potential Transformers models, see the list at https://huggingface.co/models?pipeline_tag=token-classification. It is further recommended to fine-tune these models for the specific scenario at hand.
TYPE:
|
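The shape of the models argument can be made concrete with a small check. The configuration below is copied from the example in the class description; `check_models` is a hypothetical helper written for this illustration and is not part of Presidio.

```python
from typing import Any, Dict, List

# Configuration copied from the example in the class description.
models: List[Dict[str, Any]] = [
    {
        "lang_code": "en",
        "model_name": {
            "spacy": "en_core_web_sm",
            "transformers": "dslim/bert-base-NER",
        },
    }
]


def check_models(config: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: return a list of problems found in the config."""
    problems: List[str] = []
    for index, entry in enumerate(config):
        if "lang_code" not in entry:
            problems.append(f"entry {index}: missing lang_code")
        names = entry.get("model_name")
        if not isinstance(names, dict):
            problems.append(f"entry {index}: model_name must be a dict")
            continue
        # Each language needs both a spaCy pipeline and a transformers model.
        for key in ("spacy", "transformers"):
            if key not in names:
                problems.append(f"entry {index}: model_name lacks '{key}'")
    return problems


problems = check_models(models)
broken = check_models([{"lang_code": "en", "model_name": "en_core_web_sm"}])
```

The second call shows the mistake this layout invites: passing a plain model name, as used by the spaCy-only engine, where a dict with two keys is expected.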
METHOD | DESCRIPTION |
---|---|
is_loaded |
Return True if the model is already loaded. |
process_text |
Execute the spaCy NLP pipeline on the given text and language. |
process_batch |
Execute the NLP pipeline on a batch of texts using spacy pipe. |
is_stopword |
Return true if the given word is a stop word. |
is_punct |
Return true if the given word is a punctuation word. |
get_supported_entities |
Return the supported entities for this NLP engine. |
get_supported_languages |
Return the supported languages for this NLP engine. |
get_nlp |
Return the language model loaded for a language. |
load |
Load the spaCy and transformers models. |
Source code in presidio_analyzer/nlp_engine/transformers_nlp_engine.py
|
is_loaded
is_loaded() -> bool
Return True if the model is already loaded.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
process_text
process_text(text: str, language: str) -> NlpArtifacts
Execute the spaCy NLP pipeline on the given text and language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
process_batch
process_batch(
texts: Union[List[str], List[Tuple[str, object]]],
language: str,
batch_size: int = 1,
n_process: int = 1,
as_tuples: bool = False,
) -> Iterator[Optional[NlpArtifacts]]
Execute the NLP pipeline on a batch of texts using spacy pipe.
PARAMETER | DESCRIPTION |
---|---|
texts
|
A list of texts to process.
TYPE:
|
language
|
The language of the texts.
TYPE:
|
batch_size
|
Default batch size for pipe and evaluate.
TYPE:
|
n_process
|
Number of processes to use when processing the texts.
TYPE:
|
as_tuples
|
If set to True, inputs should be a sequence of (text, context) tuples. Output will then be a sequence of (doc, context) tuples. Defaults to False.
TYPE:
|
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
is_stopword
is_stopword(word: str, language: str) -> bool
Return True if the given word is a stop word in the given language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
is_punct
is_punct(word: str, language: str) -> bool
Return True if the given word is a punctuation mark in the given language.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the supported entities for this NLP engine.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_supported_languages
get_supported_languages() -> List[str]
Return the supported languages for this NLP engine.
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
get_nlp
get_nlp(language: str) -> Language
Return the language model loaded for a language.
PARAMETER | DESCRIPTION |
---|---|
language
|
Language
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Language
|
Model from spaCy |
Source code in presidio_analyzer/nlp_engine/spacy_nlp_engine.py
|
load
load() -> None
Load the spaCy and transformers models.
Source code in presidio_analyzer/nlp_engine/transformers_nlp_engine.py
|
NlpEngineProvider
Create different NLP engines from configuration.
Example configuration: {"nlp_engine_name": "spacy", "models": [{"lang_code": "en", "model_name": "en_core_web_lg"}]}. NLP engine names available by default: spacy, stanza.
PARAMETER | DESCRIPTION |
---|---|
nlp_engines
|
List of available NLP engines. Default: (SpacyNlpEngine, StanzaNlpEngine)
TYPE:
|
nlp_configuration
|
Dict containing nlp configuration
TYPE:
|
conf_file
|
Path to yaml file containing nlp engine configuration.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
create_engine |
Create an NLP engine instance. |
Source code in presidio_analyzer/nlp_engine/nlp_engine_provider.py
|
create_engine
create_engine() -> NlpEngine
Create an NLP engine instance.
Source code in presidio_analyzer/nlp_engine/nlp_engine_provider.py
|
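Creating an engine from configuration amounts to looking up a class by name and instantiating it with the configured models. The sketch below shows that pattern only; `ToySpacyEngine`, `ToyStanzaEngine`, and `ToyProvider` are hypothetical placeholders, and the error handling is an assumption rather than the provider's documented behavior.

```python
from typing import Any, Dict, List, Type


class ToySpacyEngine:
    """Hypothetical placeholder engine."""

    engine_name = "spacy"

    def __init__(self, models: List[Dict[str, str]]) -> None:
        self.models = models


class ToyStanzaEngine(ToySpacyEngine):
    """Hypothetical placeholder engine registered under another name."""

    engine_name = "stanza"


class ToyProvider:
    """Pick an engine class by the name given in the configuration."""

    def __init__(self, nlp_configuration: Dict[str, Any]) -> None:
        self.nlp_configuration = nlp_configuration
        engines: List[Type[ToySpacyEngine]] = [ToySpacyEngine, ToyStanzaEngine]
        # Index the available engine classes by their declared name.
        self.nlp_engines = {cls.engine_name: cls for cls in engines}

    def create_engine(self) -> ToySpacyEngine:
        name = self.nlp_configuration.get("nlp_engine_name")
        if name not in self.nlp_engines:
            raise ValueError(f"NLP engine '{name}' is not available")
        return self.nlp_engines[name](models=self.nlp_configuration["models"])


configuration = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "en", "model_name": "en_core_web_lg"}],
}
engine = ToyProvider(configuration).create_engine()
```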
Predefined Recognizers
presidio_analyzer.predefined_recognizers
Predefined recognizers package. Holds all the default recognizers.
TransformersRecognizer
Bases: SpacyRecognizer
Recognize entities using the spacy-huggingface-pipelines package.
The recognizer doesn't run transformers models, but loads the output from the NlpArtifacts. See:
- https://huggingface.co/docs/transformers/main/en/index for transformer models
- https://github.com/explosion/spacy-huggingface-pipelines on the spaCy wrapper to transformers
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize self to dictionary. |
from_dict |
Create EntityRecognizer from a dict input. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
build_explanation |
Create explanation for why this result was detected. |
Source code in presidio_analyzer/predefined_recognizers/transformers_recognizer.py
|
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens for better accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
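A derived class that overrides this method would typically raise a score when a context word is found and record that in the result's metadata. The sketch below shows one coarse way to do so with local stand-ins: `ToyResult`, `ENHANCED_KEY`, the `enhance` function, and the 0.35 boost are all assumptions made for illustration, and the real enhancer may look only at words near each entity.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed key name, standing in for
# RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY.
ENHANCED_KEY = "is_score_enhanced_by_context"


@dataclass
class ToyResult:
    """Hypothetical, simplified stand-in for RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float
    recognition_metadata: Dict[str, bool] = field(default_factory=dict)


def enhance(
    results: List[ToyResult],
    lemmas: List[str],
    context: List[str],
    boost: float = 0.35,  # assumed value, chosen for the example
) -> List[ToyResult]:
    """Raise each score by `boost` (capped at 1.0) if a context word occurs."""
    lowered = {lemma.lower() for lemma in lemmas}
    found = any(word.lower() in lowered for word in context)
    for result in results:
        if found:
            result.score = min(1.0, result.score + boost)
            result.recognition_metadata[ENHANCED_KEY] = True
    return results


boosted = enhance(
    [ToyResult("PHONE_NUMBER", 12, 20, 0.4)],
    lemmas=["my", "phone", "be", "555-1234"],
    context=["phone", "call"],
)
untouched = enhance(
    [ToyResult("PHONE_NUMBER", 0, 8, 0.4)],
    lemmas=["555-1234"],
    context=["phone", "call"],
)
```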
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict
|
a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict
|
Dict containing keys and values for instantiation
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start positions, end positions, and entity types.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
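Deduplicating on identical start, end, and type can be sketched with a dictionary keyed on those three fields. `ToyResult` and `remove_exact_duplicates` are hypothetical names, and keeping the highest-scoring copy is an assumption of this sketch; the library's method may apply further rules.

```python
from typing import Dict, List, NamedTuple, Tuple


class ToyResult(NamedTuple):
    """Hypothetical, simplified stand-in for RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float


def remove_exact_duplicates(results: List[ToyResult]) -> List[ToyResult]:
    """Keep one result per (type, start, end), preferring the highest score."""
    best: Dict[Tuple[str, int, int], ToyResult] = {}
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in best or result.score > best[key].score:
            best[key] = result
    # Dict order follows first insertion, so the input order is preserved.
    return list(best.values())


deduplicated = remove_exact_duplicates(
    [
        ToyResult("PERSON", 0, 4, 0.6),
        ToyResult("PERSON", 0, 4, 0.85),
        ToyResult("LOCATION", 0, 4, 0.5),
    ]
)
```

Results with the same span but a different entity type are kept, since they are competing interpretations rather than duplicates.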
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
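The cleansing step can be illustrated with a standalone function that has the same signature as the one documented above. The body is a sketch written for this example, applying each pair in order with `str.replace`, and is not copied from the library.

```python
from typing import List, Tuple


def sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text in order."""
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)
    return text


# Strip dashes and spaces so that differently formatted inputs compare equal.
cleaned = sanitize_value("51 824-753 556", [("-", ""), (" ", "")])
```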
build_explanation
build_explanation(
original_score: float, explanation: str
) -> AnalysisExplanation
Create explanation for why this result was detected.
PARAMETER | DESCRIPTION |
---|---|
original_score
|
Score given by this recognizer
TYPE:
|
explanation
|
Explanation string
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
|
Source code in presidio_analyzer/predefined_recognizers/spacy_recognizer.py
|
AbaRoutingRecognizer
Bases: PatternRecognizer
Recognize American Bankers Association (ABA) routing number.
Also known as routing transit number (RTN) and used to identify financial institutions and process transactions.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/aba_routing_recognizer.py
|
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
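Detection by regular expression or deny-list can be sketched with the standard `re` module. Everything here is illustrative: `ToyResult`, the `analyze` function, the fixed score of 0.5, and the way the deny-list is turned into a word-bounded alternation are assumptions, not the recognizer's actual logic.

```python
import re
from typing import List, NamedTuple, Optional


class ToyResult(NamedTuple):
    """Hypothetical, simplified stand-in for RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float


def analyze(
    text: str,
    entity_type: str,
    regex: Optional[str] = None,
    deny_list: Optional[List[str]] = None,
    score: float = 0.5,
    regex_flags: int = re.IGNORECASE,
) -> List[ToyResult]:
    """Return one result per regex match or deny-list hit."""
    patterns: List[str] = []
    if regex:
        patterns.append(regex)
    if deny_list:
        # Turn the deny-list into a single word-bounded alternation.
        words = "|".join(re.escape(word) for word in deny_list)
        patterns.append(r"\b(?:" + words + r")\b")
    results: List[ToyResult] = []
    for pattern in patterns:
        for match in re.finditer(pattern, text, flags=regex_flags):
            results.append(
                ToyResult(entity_type, match.start(), match.end(), score)
            )
    return results


titles = analyze("Dr. Smith met Mr. Jones", "TITLE", deny_list=["Mr", "Mrs", "Dr"])
digits = analyze("Routing 123456789 today", "NINE_DIGITS", regex=r"\b\d{9}\b")
```

Escaping each deny-list word with `re.escape` keeps characters such as dots or plus signs from being read as regex syntax.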
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens for better accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start positions, end positions, and entity types.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate. Only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
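As an illustration of pruning logic, the sketch below takes the SSN sentence above literally and invalidates a match when any dash-separated group repeats a single digit. `invalidate_ssn_like` is a hypothetical function; a real recognizer may use different or additional rules.

```python
from typing import Optional


def invalidate_ssn_like(pattern_text: str) -> Optional[bool]:
    """Return True if any dash-separated group consists of one repeated digit.

    Returns None when there is nothing to judge (an empty match).
    """
    groups = [group for group in pattern_text.split("-") if group]
    if not groups:
        return None
    return any(len(group) > 1 and len(set(group)) == 1 for group in groups)


rejected = invalidate_ssn_like("111-45-6789")
accepted = invalidate_ssn_like("123-45-6789")
undecided = invalidate_ssn_like("")
```

The three-valued return mirrors the Optional[bool] in the signature: True prunes the match, False keeps it, and None leaves the decision to other logic.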
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
AuAbnRecognizer
Bases: PatternRecognizer
Recognizes Australian Business Number ("ABN").
The Australian Business Number (ABN) is a unique 11-digit identifier issued to all entities registered in the Australian Business Register (ABR). The 11-digit ABN is structured as a 9-digit identifier with two leading check digits. The leading check digits are derived using a modulus 89 calculation. This recognizer identifies ABNs using regex, context words, and a checksum. Reference: https://abr.business.gov.au/Help/AbnFormat
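The modulus 89 check can be worked through in a few lines. The sketch follows the algorithm published by the Australian Business Register (subtract 1 from the first digit, multiply each digit by its weight, sum, and test divisibility by 89); `is_valid_abn` is a name chosen for this example and is not the recognizer's method.

```python
from typing import List

# Weighting factors for the 11 digit positions, per the ABR algorithm.
ABN_WEIGHTS: List[int] = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]


def is_valid_abn(candidate: str) -> bool:
    """Check an ABN, ignoring spaces and other non-digit characters."""
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the algorithm subtracts 1 from the first digit
    total = sum(digit * weight for digit, weight in zip(digits, ABN_WEIGHTS))
    return total % 89 == 0


# 51 824 753 556: weighted sum is 534 = 6 * 89, so the checksum passes.
valid = is_valid_abn("51 824 753 556")
# Changing the last digit gives 553, which is not a multiple of 89.
invalid = is_valid_abn("51 824 753 557")
```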
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
Source code in presidio_analyzer/predefined_recognizers/au_abn_recognizer.py
|
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
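As a rough illustration of what an override might do, the sketch below boosts a match's score when a context word appears shortly before it. It does not use Presidio's classes: `Match`, `boost_with_context`, the 30-character window, and the 0.35 boost are all hypothetical choices made for this example, and the `enhanced_by_context` flag merely stands in for the metadata update mentioned above.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for a recognizer result."""
    start: int
    end: int
    score: float
    enhanced_by_context: bool = False


def boost_with_context(
    text: str,
    matches: List[Match],
    context: Optional[List[str]] = None,
    window: int = 30,      # assumed: characters inspected before the match
    boost: float = 0.35,   # assumed: amount added to the score
) -> List[Match]:
    """Return matches, boosting those preceded by a context word."""
    if not context:
        # No context words: results are returned unchanged.
        return list(matches)
    wanted = {word.lower() for word in context}
    enhanced = []
    for match in matches:
        preceding = text[max(0, match.start - window):match.start].lower()
        words = set(re.findall(r"[a-z]+", preceding))
        if wanted & words:
            # Cap the score at 1.0 and record that context was used.
            match = replace(
                match,
                score=min(1.0, match.score + boost),
                enhanced_by_context=True,
            )
        enhanced.append(match)
    return enhanced
```

Returning new objects rather than mutating the inputs keeps the raw results available for comparison, which can help when debugging why a score changed.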
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
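The rule described here can be sketched without Presidio as follows. `Result` and `drop_exact_duplicates` are hypothetical names used only for this illustration; the library's own method may apply further rules (for example around scores) that this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for a recognizer result."""
    entity_type: str
    start: int
    end: int
    score: float


def drop_exact_duplicates(results: List[Result]) -> List[Result]:
    """Keep the first result for each (entity_type, start, end) triple."""
    seen = set()
    unique = []
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

Keeping the first occurrence preserves the input order, so callers that sort by score beforehand retain the highest-scoring copy.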
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
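A minimal sketch of the idea, assuming each pair is applied in order as a plain string replacement. The function name `sanitize` is hypothetical and the body is written for illustration rather than copied from the library.

```python
from typing import List, Tuple


def sanitize(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search_string, replacement in replacement_pairs:
        text = text.replace(search_string, replacement)
    return text


# Strip dashes and spaces before running a checksum on the digits.
cleaned = sanitize("51 824-753 556", [("-", ""), (" ", "")])
```

Normalizing the matched text this way lets a single checksum routine handle inputs written with different separators.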
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Validate the pattern logic e.g., by running checksum on a detected pattern.
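For an Australian Business Number, the relevant checksum is the published modulus 89 test: subtract 1 from the first digit, multiply the eleven digits by fixed weights, and check that the sum is divisible by 89. The sketch below is a standalone illustration of that test; `is_valid_abn` is a hypothetical helper, not the recognizer's own code.

```python
def is_valid_abn(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the ABN modulus 89 test."""
    # Ignore spaces, dashes, and any other non-digit characters.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 11:
        return False
    # Step 1: subtract 1 from the first (leftmost) digit.
    digits[0] -= 1
    # Step 2: multiply each digit by its positional weight and sum the products.
    weights = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)
    total = sum(d * w for d, w in zip(digits, weights))
    # Step 3: the number is valid when the sum is divisible by 89.
    return total % 89 == 0
```

For example, `is_valid_abn("51 824 753 556")` evaluates the weighted sum 534, which is 6 × 89, so the number passes.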
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/au_abn_recognizer.py
AuAcnRecognizer
Bases: PatternRecognizer
Recognizes Australian Company Number ("ACN").
The Australian Company Number (ACN) is a nine-digit number whose last digit is a check digit calculated using a modified modulus 10 calculation. This recognizer identifies ACNs using regex, context words, and a checksum. Reference: https://asic.gov.au/
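The modified modulus 10 calculation can be sketched as follows, assuming the weights 8 down to 1 published by ASIC for the first eight digits. `is_valid_acn` is a hypothetical standalone helper written for illustration and is not taken from the recognizer's source.

```python
def is_valid_acn(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the ACN check-digit test."""
    # Ignore spaces, dashes, and any other non-digit characters.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 9:
        return False
    # Weights 8, 7, ..., 1 are applied to the first eight digits.
    weights = range(8, 0, -1)
    total = sum(d * w for d, w in zip(digits[:8], weights))
    # The check digit is 10 minus the remainder; a result of 10 maps to 0.
    check_digit = (10 - total % 10) % 10
    return check_digit == digits[8]
```

For `004 085 616` the weighted sum is 84, the remainder is 4, and the complement 6 matches the final digit.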
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
Source code in presidio_analyzer/predefined_recognizers/au_acn_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/au_acn_recognizer.py
AuMedicareRecognizer
Bases: PatternRecognizer
Recognizes Australian Medicare number using regex, context words, and checksum.
The Medicare number is a unique identifier issued by the Australian Government that enables the cardholder to receive rebates for medical expenses under Australia's Medicare system. It uses a modulus 10 checksum scheme to validate the number. Reference: https://en.wikipedia.org/wiki/Medicare_card_(Australia)
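A sketch of the modulus 10 scheme, assuming the commonly documented layout: ten digits, with the ninth digit acting as the check digit over the first eight using the weights 1, 3, 7, 9, 1, 3, 7, 9. `is_valid_medicare` is a hypothetical helper for illustration; whether the recognizer applies exactly these rules is an assumption.

```python
def is_valid_medicare(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Medicare checksum."""
    # Ignore spaces and any other non-digit characters.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 10:
        return False
    # Assumed weights for the first eight digits.
    weights = (1, 3, 7, 9, 1, 3, 7, 9)
    total = sum(d * w for d, w in zip(digits[:8], weights))
    # The ninth digit must equal the weighted sum modulo 10.
    return total % 10 == digits[8]
```

The tenth digit is not part of the checksum in this sketch, so it is only counted towards the length.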
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
Source code in presidio_analyzer/predefined_recognizers/au_medicare_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/au_medicare_recognizer.py
AuTfnRecognizer
Bases: PatternRecognizer
Recognizes Australian Tax File Numbers ("TFN").
The tax file number (TFN) is a unique identifier issued by the Australian Taxation Office to each taxpaying entity: an individual, company, superannuation fund, partnership, or trust. The TFN consists of a nine-digit number, usually presented in the format NNN NNN NNN. It includes a check digit, based on a simple modulo 11 scheme, for detecting erroneous numbers. This recognizer uses regex, context words, and a checksum to identify TFNs. Reference: https://www.ato.gov.au/individuals/tax-file-number/
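The modulo 11 check can be sketched as below, assuming the widely documented weights 1, 4, 3, 7, 5, 8, 6, 9, 10 for a nine-digit TFN. `is_valid_tfn` is a hypothetical standalone helper for illustration, not the recognizer's own implementation.

```python
def is_valid_tfn(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the TFN modulo 11 test."""
    # Ignore spaces, dashes, and any other non-digit characters.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 9:
        return False
    # Assumed positional weights for the nine digits.
    weights = (1, 4, 3, 7, 5, 8, 6, 9, 10)
    total = sum(d * w for d, w in zip(digits, weights))
    # The number is valid when the weighted sum is divisible by 11.
    return total % 11 == 0
```

For `123 456 782` the weighted sum is 253, which is 23 × 11, so the number passes the check.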
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
Source code in presidio_analyzer/predefined_recognizers/au_tfn_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/au_tfn_recognizer.py
AzureAILanguageRecognizer
Bases: RemoteRecognizer
Wrapper for PII detection using Azure AI Language.
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize self to dictionary. |
from_dict |
Create EntityRecognizer from a dict input. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
analyze |
Analyze text using Azure AI Language. |
Source code in presidio_analyzer/predefined_recognizers/azure_ai_language.py
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
NLP artifacts containing elements such as lemmatized tokens, used to improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict
|
a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict
|
Dict containing keys and values for instantiation
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
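The duplicate rule described for `remove_duplicates` can be sketched in plain Python. The `Span` class below is a hypothetical stand-in for `RecognizerResult`, and keeping the highest-scoring copy is an assumption of this sketch rather than a statement about the library's behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for RecognizerResult with only the fields used here."""

    entity_type: str
    start: int
    end: int
    score: float


def drop_exact_duplicates(results: List[Span]) -> List[Span]:
    """Keep one result per (entity_type, start, end), preferring the higher score."""
    best: Dict[Tuple[str, int, int], Span] = {}
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in best or result.score > best[key].score:
            best[key] = result
    # Return in text order so the output is deterministic.
    return sorted(best.values(), key=lambda r: (r.start, r.end, r.entity_type))


found = [
    Span("EMAIL_ADDRESS", 0, 10, 0.5),
    Span("EMAIL_ADDRESS", 0, 10, 1.0),
    Span("URL", 0, 10, 0.5),
]
print(drop_exact_duplicates(found))
```

Results of different entity types on the same span are both kept, since the type is part of the key.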
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
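The effect of `sanitize_value` can be pictured with a short standalone sketch. The function name `sanitize` is hypothetical and the loop is an assumption about the general approach: each pair is applied in order as a plain string replacement.

```python
from typing import List, Tuple


def sanitize(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replace) pair to the text, in the order given."""
    for search, replace in replacement_pairs:
        text = text.replace(search, replace)
    return text


# Strip dashes and spaces before running a checksum on the remaining digits.
print(sanitize("4012-8888 8888-1881", [("-", ""), (" ", "")]))
```

Removing separators this way lets one pattern cover several input spellings of the same value.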
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/predefined_recognizers/azure_ai_language.py
analyze
analyze(
text: str, entities: List[str] = None, nlp_artifacts: NlpArtifacts = None
) -> List[RecognizerResult]
Analyze text using Azure AI Language.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to analyze
TYPE:
|
entities
|
List of entities to return
TYPE:
|
nlp_artifacts
|
Object of type NlpArtifacts, not used in this recognizer.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
A list of RecognizerResult objects, one for each entity found in the text. |
Source code in presidio_analyzer/predefined_recognizers/azure_ai_language.py
CreditCardRecognizer
Bases: PatternRecognizer
Recognize common credit card numbers using regex + checksum.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/credit_card_recognizer.py
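The "regex + checksum" approach for card numbers is commonly implemented with the Luhn algorithm. The sketch below is a standalone illustration with a hypothetical function name, `luhn_ok`; it is not copied from the recognizer's source.

```python
def luhn_ok(number: str) -> bool:
    """Return True when the digits in `number` satisfy the Luhn checksum."""
    # Ignore separators such as spaces or dashes.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_ok("4111 1111 1111 1111"))
print(luhn_ok("4111 1111 1111 1112"))
```

A candidate that matches the pattern but fails the checksum is most likely a random digit string rather than a card number.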
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
NLP artifacts containing elements such as lemmatized tokens, used to improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the text that was detected by the regex engine.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
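The SSN example in the `invalidate_result` description can be sketched as a pruning rule. The function name `has_repeated_digit_group` and the dash-separated input format are assumptions made for illustration; a real recognizer may apply different or additional rules.

```python
def has_repeated_digit_group(pattern_text: str) -> bool:
    """Return True (invalidate) when any dash-separated group repeats one digit."""
    groups = [group for group in pattern_text.split("-") if group]
    return any(len(set(group)) == 1 for group in groups)


print(has_repeated_digit_group("078-05-1120"))
print(has_repeated_digit_group("111-05-1120"))
```

Note the inverted meaning compared with `validate_result`: here a True return value marks the match as one to drop.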
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
CryptoRecognizer
Bases: PatternRecognizer
Recognize common crypto account numbers using regex + checksum.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the Bitcoin address using checksum. |
bech32_polymod |
Compute the Bech32 checksum. |
bech32_hrp_expand |
Expand the HRP into values for checksum computation. |
bech32_verify_checksum |
Verify a checksum given HRP and converted data characters. |
bech32_decode |
Validate a Bech32/Bech32m string, and determine HRP and data. |
validate_bech32_address |
Validate a Bech32 or Bech32m address. |
Source code in presidio_analyzer/predefined_recognizers/crypto_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
NLP artifacts containing elements such as lemmatized tokens, used to improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the text that was detected by the regex engine.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Validate the Bitcoin address using checksum.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The cryptocurrency address to validate.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
True if the address is valid according to its respective format, False otherwise. |
Source code in presidio_analyzer/predefined_recognizers/crypto_recognizer.py
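For legacy Bitcoin addresses, a checksum validation of this kind is usually a Base58Check test: decode the address to 25 bytes and compare the last 4 bytes with a double SHA-256 of the rest. The sketch below is a standalone illustration; the name `base58check_ok` is hypothetical, and treating the input as a legacy address is an assumption of this example.

```python
from hashlib import sha256

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_ok(address: str) -> bool:
    """Return True when a legacy (Base58Check) address carries a valid checksum."""
    number = 0
    for char in address:
        index = BASE58_ALPHABET.find(char)
        if index == -1:
            return False  # character outside the Base58 alphabet
        number = number * 58 + index
    try:
        # 1 version byte + 20 payload bytes + 4 checksum bytes
        raw = number.to_bytes(25, "big")
    except OverflowError:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    return sha256(sha256(payload).digest()).digest()[:4] == checksum


print(base58check_ok("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
```

Addresses that start with `bc1` use a different encoding and need the Bech32 helpers documented for this class.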
bech32_polymod
staticmethod
bech32_polymod(values)
Compute the Bech32 checksum.
Source code in presidio_analyzer/predefined_recognizers/crypto_recognizer.py
bech32_hrp_expand
staticmethod
bech32_hrp_expand(hrp)
Expand the HRP into values for checksum computation.
Source code in presidio_analyzer/predefined_recognizers/crypto_recognizer.py
bech32_verify_checksum
staticmethod
bech32_verify_checksum(hrp, data)
Verify a checksum given HRP and converted data characters.
Source code in presidio_analyzer/predefined_recognizers/crypto_recognizer.py
bech32_decode
staticmethod
bech32_decode(bech)
Validate a Bech32/Bech32m string, and determine HRP and data.
Source code in presidio_analyzer/predefined_recognizers/crypto_recognizer.py
validate_bech32_address
staticmethod
validate_bech32_address(address)
Validate a Bech32 or Bech32m address.
Source code in presidio_analyzer/predefined_recognizers/crypto_recognizer.py
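The Bech32 helpers listed for this class follow the structure of the reference algorithm in BIP-173 and BIP-350. The sketch below is a standalone version based on that reference, not a copy of the recognizer's source. The typed signatures, the boolean return of `bech32_verify_checksum`, and the wrapper `is_valid_bech32` are assumptions of this example.

```python
from typing import List, Optional, Tuple

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST = 1
BECH32M_CONST = 0x2BC830A3


def bech32_polymod(values: List[int]) -> int:
    """Compute the Bech32 checksum polynomial over 5-bit values."""
    generator = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= generator[i]
    return chk


def bech32_hrp_expand(hrp: str) -> List[int]:
    """Expand the human-readable part into values for checksum computation."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_verify_checksum(hrp: str, data: List[int]) -> bool:
    """Accept either the Bech32 or the Bech32m checksum constant."""
    result = bech32_polymod(bech32_hrp_expand(hrp) + data)
    return result in (BECH32_CONST, BECH32M_CONST)


def bech32_decode(bech: str) -> Tuple[Optional[str], Optional[List[int]]]:
    """Return (hrp, data) for a valid string, or (None, None) otherwise."""
    if any(ord(c) < 33 or ord(c) > 126 for c in bech):
        return None, None
    if bech.lower() != bech and bech.upper() != bech:
        return None, None  # mixed case is not allowed
    bech = bech.lower()
    pos = bech.rfind("1")  # the last "1" separates HRP and data
    if pos < 1 or pos + 7 > len(bech) or len(bech) > 90:
        return None, None
    if any(c not in CHARSET for c in bech[pos + 1:]):
        return None, None
    hrp = bech[:pos]
    data = [CHARSET.find(c) for c in bech[pos + 1:]]
    if not bech32_verify_checksum(hrp, data):
        return None, None
    return hrp, data[:-6]  # drop the 6 checksum symbols


def is_valid_bech32(address: str) -> bool:
    """Hypothetical wrapper: True when the string decodes successfully."""
    return bech32_decode(address)[0] is not None


print(is_valid_bech32("BC1QW508D6QEJXTDG4Y5R3ZARVARY0C5XW7KV8F3T4"))
```

This sketch checks only the encoding and checksum; it does not inspect the witness version or program length of a SegWit address.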
DateRecognizer
Bases: PatternRecognizer
Recognize dates using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/date_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
NLP artifacts containing elements such as lemmatized tokens, used to improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the text that was detected by the regex engine.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the text that was detected by the regex engine.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
EmailRecognizer
Bases: PatternRecognizer
Recognize email addresses using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/email_recognizer.py
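To make the regex-based approach concrete, here is a minimal standalone sketch that finds email-like spans and reports their offsets. The pattern `EMAIL_PATTERN` and the function `find_emails` are hypothetical and deliberately simplified; they are not the pattern the recognizer ships with.

```python
import re
from typing import List, Tuple

# Deliberately simplified pattern; an assumption of this sketch.
EMAIL_PATTERN = re.compile(
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\b"
)


def find_emails(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for each email-like span in the text."""
    return [(m.start(), m.end(), m.group()) for m in EMAIL_PATTERN.finditer(text)]


print(find_emails("Contact jane.doe@example.com today"))
```

The start and end offsets play the same role as the span positions carried by each `RecognizerResult`.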
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise, the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
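As a rough illustration of the idea behind context enhancement, the sketch below boosts a result's score when a context word appears near the match and records that the boost happened. It is not Presidio's implementation: the Hit dataclass, the boost_with_context helper, the 30-character window, and the 0.35 boost are all assumptions chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Hit:
    """Hypothetical result type; Presidio uses RecognizerResult instead."""
    start: int
    end: int
    score: float
    enhanced_by_context: bool = False


def boost_with_context(
    text: str,
    hits: List[Hit],
    context: List[str],
    window: int = 30,     # assumed: characters inspected on each side
    boost: float = 0.35,  # assumed boost amount
) -> List[Hit]:
    """Raise the score of hits that have a context word nearby."""
    lowered = text.lower()
    boosted = []
    for hit in hits:
        nearby = lowered[max(0, hit.start - window): hit.end + window]
        if any(word.lower() in nearby for word in context):
            hit = replace(
                hit,
                score=min(1.0, hit.score + boost),  # never exceed 1.0
                enhanced_by_context=True,           # record the boost
            )
        boosted.append(hit)
    return boosted


text = "My NIE is X1234567L"
hits = boost_with_context(text, [Hit(10, 19, 0.5)], context=["nie"])
```

The enhanced_by_context flag plays the role that the recognition metadata key plays in the description above: downstream code can tell a boosted score from a raw one.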
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
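The to_dict / from_dict pair describes a serialization round trip. The following sketch shows that pattern with a hypothetical SimpleRecognizer dataclass; the field names mirror the parameter names listed in this reference, but the exact dictionary layout Presidio produces is not shown here and is an assumption.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SimpleRecognizer:
    """Hypothetical recognizer config; the dictionary layout is assumed."""
    supported_entity: str
    supported_language: str = "en"
    patterns: List[Dict] = field(default_factory=list)
    context: List[str] = field(default_factory=list)

    def to_dict(self) -> Dict:
        """Serialize the instance into a plain dictionary."""
        return asdict(self)

    @classmethod
    def from_dict(cls, entity_recognizer_dict: Dict) -> "SimpleRecognizer":
        """Rebuild an instance from a dictionary produced by to_dict."""
        return cls(**entity_recognizer_dict)


original = SimpleRecognizer(
    supported_entity="ES_NIF",
    supported_language="es",
    patterns=[{"name": "nif", "regex": r"\b\d{8}[A-Z]\b", "score": 0.5}],
    context=["nif", "dni"],
)
restored = SimpleRecognizer.from_dict(original.to_dict())
```

A round trip like this is useful for storing recognizer definitions in configuration files and rebuilding them at start-up.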
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
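The deduplication rule described above can be sketched with a dictionary keyed on entity type, start, and end. This is a simplified illustration with a hypothetical Span type and helper name; the library method may apply additional rules that are not covered here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical result type (Presidio's is RecognizerResult)."""
    entity_type: str
    start: int
    end: int
    score: float


def drop_exact_duplicates(results: List[Span]) -> List[Span]:
    """Keep the highest-scoring result per (type, start, end) key."""
    best: Dict[Tuple[str, int, int], Span] = {}
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in best or result.score > best[key].score:
            best[key] = result
    # Return in text order so the output is deterministic.
    return sorted(best.values(), key=lambda r: (r.start, r.end, r.entity_type))


deduped = drop_exact_duplicates([
    Span("EMAIL_ADDRESS", 8, 28, 0.5),
    Span("EMAIL_ADDRESS", 8, 28, 1.0),  # same span and type: duplicate
    Span("URL", 17, 28, 0.5),           # different type and span: kept
])
```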
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
Pairs of (string to replace, replacement value)
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
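A minimal sketch of how replacement pairs might be applied, assuming each pair is a plain (search, replacement) string tuple processed in order. The sanitize function below is a stand-alone illustration, not the library's code.

```python
from typing import List, Tuple


def sanitize(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)
    return text


# Strip dashes and spaces so differently formatted inputs compare equal.
cleaned = sanitize("X-1234567 L", [("-", ""), (" ", "")])
```

Normalizing the matched text this way lets a checksum routine work on a canonical form regardless of how the identifier was typed.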
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
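The SSN example above can be read as a pruning rule applied after the regex has matched. The sketch below takes the sentence literally (any digit group made of one repeated digit invalidates the match); it is a toy rule with a hypothetical function name, and real recognizers may apply different or additional checks.

```python
import re
from typing import Optional


def invalidate_ssn_like(pattern_text: str) -> Optional[bool]:
    """Return True when any digit group repeats a single digit.

    Illustrative only: this is a literal reading of the example rule.
    """
    groups = re.findall(r"\d+", pattern_text)
    if not groups:
        return None  # nothing to judge, so express no opinion
    return any(len(set(group)) == 1 for group in groups)
```

Returning None rather than False when there is nothing to check matches the Optional[bool] return type: it leaves room for "no verdict" as a third outcome.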
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
EsNieRecognizer
Bases: PatternRecognizer
Recognize NIE number using regex and checksum.
Reference(s): https://es.wikipedia.org/wiki/N%C3%BAmero_de_identidad_de_extranjero https://www.interior.gob.es/opencms/ca/servicios-al-ciudadano/tramites-y-gestiones/dni/calculo-del-digito-de-control-del-nif-nie/
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern by using the control character. |
Source code in presidio_analyzer/predefined_recognizers/es_nie_recognizer.py
|
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
Pairs of (string to replace, replacement value)
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
validate_result
validate_result(pattern_text: str) -> bool
Validate the pattern by using the control character.
Source code in presidio_analyzer/predefined_recognizers/es_nie_recognizer.py
|
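For readers unfamiliar with the NIE control character, the sketch below implements the publicly documented scheme: the leading X, Y, or Z is mapped to 0, 1, or 2, the resulting 8-digit number is taken modulo 23, and the remainder indexes a fixed letter table. The is_valid_nie function is a stand-alone illustration and is not taken from es_nie_recognizer.py; its handling of spaces and dashes is an assumption.

```python
import re

# Control-letter table shared by Spanish NIF and NIE numbers.
NIE_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"
NIE_PREFIX_DIGIT = {"X": "0", "Y": "1", "Z": "2"}


def is_valid_nie(pattern_text: str) -> bool:
    """Check a NIE's control letter (leading X/Y/Z, 7 digits, 1 letter)."""
    # Assumed normalization: ignore spaces and dashes, compare in upper case.
    candidate = re.sub(r"[\s-]", "", pattern_text).upper()
    if not re.fullmatch(r"[XYZ]\d{7}[A-Z]", candidate):
        return False
    # Replace the leading letter with its digit, then take the number mod 23.
    number = int(NIE_PREFIX_DIGIT[candidate[0]] + candidate[1:8])
    return NIE_LETTERS[number % 23] == candidate[-1]
```

For example, X1234567L passes because 01234567 mod 23 is 19 and the letter at index 19 of the table is L.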
EsNifRecognizer
Bases: PatternRecognizer
Recognize NIF number using regex and checksum.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/es_nif_recognizer.py
|
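The NIF checksum mentioned in the class description follows the publicly documented mod-23 scheme: the numeric part modulo 23 selects a control letter from a fixed table. The sketch below is a stand-alone illustration with a hypothetical is_valid_nif function; it assumes exactly eight digits followed by one letter, which may be stricter than the patterns the recognizer itself accepts.

```python
import re

# Control-letter table used for Spanish NIF numbers.
NIF_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def is_valid_nif(pattern_text: str) -> bool:
    """Check that the trailing letter matches the 8-digit number mod 23."""
    # Assumed normalization: ignore spaces and dashes, compare in upper case.
    candidate = re.sub(r"[\s-]", "", pattern_text).upper()
    if not re.fullmatch(r"\d{8}[A-Z]", candidate):
        return False
    return NIF_LETTERS[int(candidate[:8]) % 23] == candidate[-1]
```

For example, 12345678Z passes because 12345678 mod 23 is 14 and the letter at index 14 of the table is Z.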
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
Pairs of (string to replace, replacement value)
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
FiPersonalIdentityCodeRecognizer
Bases: PatternRecognizer
Recognizes and validates Finnish Personal Identity Codes (Henkilötunnus).
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern by using the control character. |
Source code in presidio_analyzer/predefined_recognizers/fi_personal_identity_code_recognizer.py
|
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
Pairs of (string to replace, replacement value)
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern by using the control character.
Source code in presidio_analyzer/predefined_recognizers/fi_personal_identity_code_recognizer.py
|
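The Finnish control character is computed from the nine digits of the code (the six date digits plus the three-digit individual number) taken modulo 31, with the remainder indexing a fixed character table. The sketch below illustrates that publicly documented scheme; the is_valid_hetu function is hypothetical, it is not the code in fi_personal_identity_code_recognizer.py, and it only handles the classic century signs "+", "-", and "A".

```python
import re
from datetime import date

# Remainder (0-30) to control character.
HETU_CONTROL = "0123456789ABCDEFHJKLMNPRSTUVWXY"
# Assumed subset: only the classic century signs are supported here.
HETU_CENTURY = {"+": 1800, "-": 1900, "A": 2000}


def is_valid_hetu(pattern_text: str) -> bool:
    """Validate the date part and the control character of a code."""
    m = re.fullmatch(
        r"(\d{2})(\d{2})(\d{2})([-+A])(\d{3})([0-9A-Y])", pattern_text
    )
    if m is None:
        return False
    day, month, year, century, individual, control = m.groups()
    try:
        # Reject impossible birth dates such as month 13.
        date(HETU_CENTURY[century] + int(year), int(month), int(day))
    except ValueError:
        return False
    # The century sign itself does not take part in the checksum.
    remainder = int(day + month + year + individual) % 31
    return HETU_CONTROL[remainder] == control
```

For example, 131052-308T passes because 131052308 mod 31 is 25 and the character at index 25 of the table is T.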
GLiNERRecognizer
Bases: LocalRecognizer
GLiNER model based entity recognizer.
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize self to dictionary. |
from_dict |
Create EntityRecognizer from a dict input. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
load |
Load the GLiNER model. |
analyze |
Analyze text to identify entities using a GLiNER model. |
Source code in presidio_analyzer/predefined_recognizers/gliner_recognizer.py
|
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value is equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict
|
a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict
|
Dict containing keys and values for instantiation
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
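As a rough illustration of what replacement pairs do, the sketch below applies each pair in order with plain string replacement. The function name `sanitize` is hypothetical, and the library's own implementation may differ in detail.

```python
from typing import List, Tuple


def sanitize(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)
    return text


# Strip dashes and spaces before running a checksum or pattern match.
cleaned = sanitize("GB82 WEST-1234", [("-", ""), (" ", "")])
```

Pairs are applied sequentially, so a later pair sees the output of the earlier ones.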
Source code in presidio_analyzer/entity_recognizer.py
load
load() -> None
Load the GLiNER model.
Source code in presidio_analyzer/predefined_recognizers/gliner_recognizer.py
analyze
analyze(
text: str, entities: List[str], nlp_artifacts: Optional[NlpArtifacts] = None
) -> List[RecognizerResult]
Analyze text to identify entities using a GLiNER model.
PARAMETER | DESCRIPTION |
---|---|
text
|
The text to be analyzed
TYPE:
|
entities
|
The list of entities this recognizer is requested to return
TYPE:
|
nlp_artifacts
|
N/A for this recognizer
TYPE:
|
Source code in presidio_analyzer/predefined_recognizers/gliner_recognizer.py
IbanRecognizer
Bases: PatternRecognizer
Recognize IBAN code using regex and checksum.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
exact_match
|
Whether patterns should be exactly matched or not
TYPE:
|
bos_eos
|
Tuple of strings for beginning of string (BOS) and end of string (EOS)
TYPE:
|
regex_flags
|
Regex flags options
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
analyze |
Analyze IBAN. |
Source code in presidio_analyzer/predefined_recognizers/iban_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
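To make the override contract concrete, here is a library-free sketch. `Result` is a hypothetical stand-in for RecognizerResult, the key string is a placeholder for RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY, and the `boost` and `window` parameters are illustrative assumptions rather than library defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Placeholder; in the library the key is defined on RecognizerResult.
IS_SCORE_ENHANCED_BY_CONTEXT_KEY = "is_score_enhanced_by_context"


@dataclass
class Result:
    """Hypothetical stand-in for RecognizerResult."""

    start: int
    end: int
    score: float
    recognition_metadata: Dict[str, object] = field(default_factory=dict)


def enhance(
    text: str,
    raw_results: List[Result],
    context: Optional[List[str]] = None,
    boost: float = 0.35,
    window: int = 30,
) -> List[Result]:
    """Boost results that have a context word shortly before them."""
    for result in raw_results:
        preceding = text[max(0, result.start - window):result.start].lower()
        if any(word.lower() in preceding for word in context or []):
            result.score = min(1.0, result.score + boost)
            # Record that context changed the score, as the contract requires.
            result.recognition_metadata[IS_SCORE_ENHANCED_BY_CONTEXT_KEY] = True
    return raw_results


text = "My IBAN is GB82WEST12345698765432"
hits = enhance(text, [Result(11, 33, 0.5)], context=["iban"])
```

Without context words the function returns the results unchanged, which mirrors the default behavior described above.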
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
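The to_dict / from_dict pair is meant to round-trip a recognizer's configuration. The sketch below shows that idea with a hypothetical `MiniRecognizer`; its field names are illustrative and are not the keys the library serializes.

```python
from typing import Dict, List, Optional


class MiniRecognizer:
    """Hypothetical stand-in; field names are illustrative only."""

    def __init__(
        self,
        supported_entity: str,
        supported_language: str = "en",
        context: Optional[List[str]] = None,
    ) -> None:
        self.supported_entity = supported_entity
        self.supported_language = supported_language
        self.context = list(context or [])

    def to_dict(self) -> Dict:
        """Serialize the constructor arguments into a plain dictionary."""
        return {
            "supported_entity": self.supported_entity,
            "supported_language": self.supported_language,
            "context": list(self.context),
        }

    @classmethod
    def from_dict(cls, data: Dict) -> "MiniRecognizer":
        """Rebuild an instance from the dictionary produced by to_dict."""
        return cls(**data)


original = MiniRecognizer("IBAN_CODE", context=["iban", "bank"])
restored = MiniRecognizer.from_dict(original.to_dict())
```

Because the dictionary keys match the constructor parameters, `from_dict` can pass them straight through with `**data`.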
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated. Only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
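One literal reading of the SSN example can be sketched as follows. The function name `invalidate_ssn` and the exact rule (any multi-digit group made of a single repeated digit) are assumptions for illustration, not the library's pruning logic.

```python
import re


def invalidate_ssn(pattern_text: str) -> bool:
    """Return True when any digit group is one digit repeated, e.g. '000'."""
    groups = re.findall(r"\d+", pattern_text)
    return any(len(group) > 1 and len(set(group)) == 1 for group in groups)


rejected = invalidate_ssn("000-12-3456")
accepted = invalidate_ssn("123-45-6789")
```

A return value of True means the regex match should be discarded; False leaves the match in place.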
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: NlpArtifacts = None,
regex_flags: int = None,
) -> List[RecognizerResult]
Analyze IBAN.
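The checksum half of "regex and checksum" is the standard mod-97 test for IBANs. The sketch below implements that test on its own, without the library; the function name is hypothetical, and it does not check country-specific lengths.

```python
import string

_ALLOWED = set(string.digits + string.ascii_uppercase)


def iban_checksum_ok(iban: str) -> bool:
    """Return True if the IBAN passes the mod-97 check."""
    compact = iban.replace(" ", "").replace("-", "").upper()
    if len(compact) < 5 or not set(compact) <= _ALLOWED:
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Letters map to two-digit numbers: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


valid = iban_checksum_ok("GB82 WEST 1234 5698 7654 32")
invalid = iban_checksum_ok("GB82 WEST 1234 5698 7654 33")
```

A regex match narrows the text to IBAN-shaped candidates, and a checksum like this one then rejects candidates that only look like an IBAN.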
Source code in presidio_analyzer/predefined_recognizers/iban_recognizer.py
InAadhaarRecognizer
Bases: PatternRecognizer
Recognizes Indian UIDAI Person Identification Number ("AADHAAR").
A 12-digit unique number issued to each individual by the Government of India.
Reference: https://en.wikipedia.org/wiki/Aadhaar
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Determine absolute value based on calculation. |
Source code in presidio_analyzer/predefined_recognizers/in_aadhaar_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
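The effect of regex_flags can be shown with the standard `re` module alone. The helper below is a hypothetical sketch that returns (start, end, score) tuples instead of RecognizerResult objects, and the pattern is made up for the example.

```python
import re
from typing import List, Tuple


def find_matches(
    text: str, pattern: str, score: float, regex_flags: int = 0
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for every match of pattern in text."""
    return [
        (match.start(), match.end(), score)
        for match in re.finditer(pattern, text, flags=regex_flags)
    ]


text = "ref: ab1234"
pattern = r"\b[A-Z]{2}\d{4}\b"
strict = find_matches(text, pattern, 0.5)
relaxed = find_matches(text, pattern, 0.5, regex_flags=re.IGNORECASE)
```

With no flags the uppercase-only pattern misses the lowercase identifier; with `re.IGNORECASE` it matches the span from index 5 to 11.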
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated. Only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Determine absolute value based on calculation.
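The reference does not spell out the calculation. Aadhaar numbers carry a Verhoeff check digit, so a plausible validation step, sketched here as an assumption rather than as the recognizer's actual code, is a Verhoeff check on the 12 sanitized digits. The function names are hypothetical.

```python
# Verhoeff tables: D is the dihedral-group multiplication table,
# P is the position-dependent permutation table.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """Return True if the digit string ends in a correct Verhoeff check digit."""
    checksum = 0
    # Process digits right to left; the permutation depends on the position.
    for position, char in enumerate(reversed(number)):
        checksum = D[checksum][P[position % 8][int(char)]]
    return checksum == 0


def looks_like_aadhaar(text: str) -> bool:
    """Hypothetical check: 12 ASCII digits that pass the Verhoeff test."""
    digits = text.replace(" ", "").replace("-", "")
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False
    return verhoeff_valid(digits)
```

The Verhoeff scheme detects every single-digit error, which is why changing one digit of a valid number always makes `verhoeff_valid` return False.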
Source code in presidio_analyzer/predefined_recognizers/in_aadhaar_recognizer.py
InPanRecognizer
Bases: PatternRecognizer
Recognizes Indian Permanent Account Number ("PAN").
The Permanent Account Number (PAN) is a ten-digit alphanumeric code, with the last digit being a check digit calculated using a modified modulus 10 calculation. This recognizer identifies PAN using regex and context words.
Reference: https://en.wikipedia.org/wiki/Permanent_account_number, https://incometaxindia.gov.in/Forms/tps/1.Permanent%20Account%20Number%20(PAN).pdf
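As a rough sketch of the regex side, the pattern below assumes the commonly documented PAN layout of five letters, four digits, and a final letter. It is an illustrative pattern, not necessarily one of the patterns this recognizer ships with, and it does not verify the check character.

```python
import re
from typing import List

# Assumed layout: AAAAA9999A (five letters, four digits, one letter).
PAN_SHAPE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def find_pan_candidates(text: str) -> List[str]:
    """Return every substring that has the assumed PAN shape."""
    return [match.group() for match in PAN_SHAPE.finditer(text)]


candidates = find_pan_candidates("PAN: ABCDE1234F on file")
```

Shape-only matches are weak evidence on their own, which is where context words such as "pan" would raise the confidence score.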
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/in_pan_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated. Only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated. Only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
InPassportRecognizer
Bases: PatternRecognizer
Recognizes Indian Passport Number.
An Indian Passport Number is an eight-digit alphanumeric number.
Reference: https://www.bajajallianz.com/blog/travel-insurance-articles/where-is-passport-number-in-indian-passport.html
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/in_passport_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated. Only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
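The SSN example above could be expressed as a pruning rule like the one below. This is one possible reading of that sentence, sketched as a standalone function. The splitting on dashes and spaces and the `None` return for empty input are assumptions.

```python
from typing import Optional


def invalidate_result(pattern_text: str) -> Optional[bool]:
    """Return True when any dash- or space-separated group repeats one digit."""
    groups = pattern_text.replace("-", " ").split()
    if not groups:
        return None  # assumption: nothing to inspect, so no verdict
    # A group such as "000" or "1111" has exactly one distinct character.
    return any(len(set(group)) == 1 for group in groups)
```

Unlike `validate_result`, a `True` return here means the match should be discarded.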
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
InVehicleRegistrationRecognizer
Bases: PatternRecognizer
Recognizes Indian Vehicle Registration Number issued by RTO.
Reference(s): https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_India https://en.wikipedia.org/wiki/Regional_Transport_Office https://en.wikipedia.org/wiki/List_of_Regional_Transport_Office_districts_in_India
The registration scheme has changed over time, with multiple formats in play over the years. India has multiple active patterns for registration plates issued to different vehicle categories.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input e.g. by removing dashes or spaces
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Determine absolute value based on calculation. |
Source code in presidio_analyzer/predefined_recognizers/in_vehicle_registration_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY]
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
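A derived class that overrides this method might boost scores along the lines sketched below. Everything here is a simplified assumption: `Result` is a hypothetical stand-in for `RecognizerResult`, the signature omits `other_raw_recognizer_results` and `nlp_artifacts`, `ENHANCED_KEY` stands in for `RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY`, and the `window` and `boost` values are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Result:
    """Hypothetical stand-in for RecognizerResult (not the library class)."""

    start: int
    end: int
    score: float
    recognition_metadata: Dict[str, bool] = field(default_factory=dict)


# Hypothetical key name used only in this sketch.
ENHANCED_KEY = "is_score_enhanced_by_context"


def enhance_using_context(
    text: str,
    results: List[Result],
    context: List[str],
    window: int = 30,
    boost: float = 0.35,
) -> List[Result]:
    """Boost results preceded by a context word within `window` characters."""
    for result in results:
        prefix = text[max(0, result.start - window) : result.start].lower()
        if any(word.lower() in prefix for word in context):
            result.score = min(1.0, result.score + boost)
            # Record the boost, as the description above asks derived classes to do.
            result.recognition_metadata[ENHANCED_KEY] = True
    return results


sample = "voter id: ABC1234567"
boosted = enhance_using_context(sample, [Result(10, 20, 0.4)], ["voter"])
```

The sketch matches context words by plain substring search; the `nlp_artifacts` parameter exists so that real implementations can compare lemmatized tokens instead.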
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Determine absolute value based on calculation.
Source code in presidio_analyzer/predefined_recognizers/in_vehicle_registration_recognizer.py
InVoterRecognizer
Bases: PatternRecognizer
Recognize Indian Voter/Election ID (EPIC).
The Elector's Photo Identity Card, or Voter ID, is a ten-digit alphanumeric code issued by the Election Commission of India to adult domiciles who have reached the age of 18. Ref: https://en.wikipedia.org/wiki/Voter_ID_(India)
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/in_voter_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
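The regex branch of this method can be pictured roughly as in the sketch below. It is a simplified illustration built on the standard `re` module, not the library code. `Pattern` and `Match` are hypothetical stand-ins, the default flags are an assumption, and the sample regex (three letters followed by seven digits) is illustrative only and is not the pattern shipped with this recognizer.

```python
import re
from typing import List, NamedTuple, Optional


class Pattern(NamedTuple):
    """Hypothetical stand-in for a named regex with a base confidence score."""

    name: str
    regex: str
    score: float


class Match(NamedTuple):
    """Hypothetical stand-in for RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float


def analyze(
    text: str,
    entity_type: str,
    patterns: List[Pattern],
    regex_flags: Optional[int] = None,
) -> List[Match]:
    """Run every pattern over the text and collect non-empty matches."""
    # Assumption: fall back to case-insensitive, multi-line matching.
    flags = regex_flags if regex_flags is not None else re.IGNORECASE | re.MULTILINE
    matches = []
    for pattern in patterns:
        for found in re.finditer(pattern.regex, text, flags=flags):
            if found.group() == "":
                continue  # skip zero-length matches
            matches.append(Match(entity_type, found.start(), found.end(), pattern.score))
    return matches


# Illustrative pattern only: three letters followed by seven digits.
SAMPLE_PATTERN = Pattern("sample_id", r"\b[A-Z]{3}[0-9]{7}\b", 0.4)
found = analyze("EPIC no. ABC1234567 on file", "IN_VOTER", [SAMPLE_PATTERN])
```

Each match starts from the pattern's base score, which the validation, invalidation, and context steps documented in this section can then adjust.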
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY]
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
IpRecognizer
Bases: PatternRecognizer
Recognize IP addresses using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
invalidate_result |
Check if the pattern text cannot be validated as an IP address. |
Source code in presidio_analyzer/predefined_recognizers/ip_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY]
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> bool
Check if the pattern text cannot be validated as an IP address.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
Text detected as pattern by regex
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
True if invalidated |
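One way to implement a check of this kind is to hand the matched text to Python's standard `ipaddress` module and treat a parse failure as invalidation. The sketch below is a standalone function written under that assumption; it is not claimed to mirror the code in ip_recognizer.py.

```python
import ipaddress


def invalidate_result(pattern_text: str) -> bool:
    """Return True when the text is not a parseable IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(pattern_text)
    except ValueError:
        # The regex matched something IP-shaped that is not a real address.
        return True
    return False
```

A parser-based check catches cases that a regex can let through, such as an octet above 255.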
Source code in presidio_analyzer/predefined_recognizers/ip_recognizer.py
ItDriverLicenseRecognizer
Bases: PatternRecognizer
Recognizes Italian (IT) driver license numbers using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/it_driver_license_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY]
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
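The effect of replacement pairs is easiest to see on a small input. The following is a minimal stand-alone sketch of the behaviour described above (each pair is applied in order as a plain string replacement); it is written for illustration and is not copied from the library source.

```python
from typing import List, Tuple


def sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)
    return text


# Strip dashes and spaces before running a checksum on the remaining digits.
cleaned = sanitize_value("123-45 6789", [("-", ""), (" ", "")])
```

Pairs are applied sequentially, so a later pair sees the output of the earlier ones.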
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
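Because the return type is `Optional[bool]`, a caller has three cases to handle. The sketch below shows one plausible interpretation: `None` leaves the score untouched, `True` raises it to the maximum, and `False` drops it to the minimum. The constants and the helper name `apply_validation` are assumptions made for this example.

```python
from typing import Optional

MAX_SCORE = 1.0  # assumed upper bound of the confidence scale
MIN_SCORE = 0.0  # assumed score for a rejected match


def apply_validation(score: float, validation_result: Optional[bool]) -> float:
    """Map a tri-state validation outcome onto a confidence score."""
    if validation_result is None:
        # No validation logic was run: keep the pattern's own score.
        return score
    return MAX_SCORE if validation_result else MIN_SCORE
```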
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
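To make the SSN example concrete, here is a small sketch of a pruning rule that rejects a match whose digits are all identical. This is one reading of the example above, and the function name `invalidate_repeated_digits` is hypothetical.

```python
def invalidate_repeated_digits(pattern_text: str) -> bool:
    """Return True (invalidate) when every digit in the match is the same."""
    digits = [ch for ch in pattern_text if ch.isdigit()]
    return bool(digits) and len(set(digits)) == 1
```

With this rule, `"111-11-1111"` is invalidated while `"123-45-6789"` is kept.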
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
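The parameters listed above can be bundled into a single record that travels with a detection result. The sketch below uses a hypothetical `Explanation` dataclass in place of `AnalysisExplanation`; its field names and the wording of the textual explanation are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Explanation:
    """Hypothetical stand-in for AnalysisExplanation (illustration only)."""

    recognizer: str
    pattern_name: str
    pattern: str
    original_score: float
    validation_result: Optional[bool]
    regex_flags: int
    textual_explanation: str


def build_regex_explanation(
    recognizer_name: str,
    pattern_name: str,
    pattern: str,
    original_score: float,
    validation_result: Optional[bool],
    regex_flags: int,
) -> Explanation:
    """Collect the facts that explain why a regex match was reported."""
    text = f"Detected by `{recognizer_name}` using pattern `{pattern_name}`"
    return Explanation(
        recognizer=recognizer_name,
        pattern_name=pattern_name,
        pattern=pattern,
        original_score=original_score,
        validation_result=validation_result,
        regex_flags=regex_flags,
        textual_explanation=text,
    )


explanation = build_regex_explanation(
    "ExampleRecognizer", "example pattern", r"\b\d{11}\b", 0.1, True, int(re.IGNORECASE)
)
```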
ItFiscalCodeRecognizer
Bases: PatternRecognizer
Recognizes IT Fiscal Code using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
Source code in presidio_analyzer/predefined_recognizers/it_fiscal_code_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
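The regex path of `analyze` can be pictured as a loop over `re.finditer`. The sketch below is a simplified, stand-alone version: `Match` is a hypothetical stand-in for `RecognizerResult`, the pattern is purely illustrative, and the default flag combination is an assumption.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed defaults for the example; pass regex_flags to override them.
DEFAULT_FLAGS = re.DOTALL | re.MULTILINE | re.IGNORECASE


class Match(NamedTuple):
    """Hypothetical stand-in for RecognizerResult (illustration only)."""

    entity_type: str
    start: int
    end: int
    score: float


def analyze(
    text: str,
    pattern: str,
    entity_type: str,
    score: float,
    regex_flags: Optional[int] = None,
) -> List[Match]:
    """Return one Match per non-empty regex hit in the text."""
    flags = DEFAULT_FLAGS if regex_flags is None else regex_flags
    return [
        Match(entity_type, m.start(), m.end(), score)
        for m in re.finditer(pattern, text, flags=flags)
        if m.group() != ""
    ]


text = "Passport: ya1234567"
matches = analyze(text, r"\b[A-Z]{2}\d{7}\b", "EXAMPLE_ID", 0.1)
strict = analyze(text, r"\b[A-Z]{2}\d{7}\b", "EXAMPLE_ID", 0.1, regex_flags=0)
```

With the case-insensitive default, the lowercase identifier is matched; passing `regex_flags=0` makes the same pattern case-sensitive, so `strict` is empty.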
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
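A derived class that overrides this method might look roughly like the sketch below: it boosts a result when a context word appears shortly before the match and records that fact in the result's metadata. `Result`, the boost amount, the character window, and the metadata key string are all assumptions made for illustration; the real method also receives NLP artifacts and other recognizers' results, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed value of RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY.
IS_SCORE_ENHANCED_BY_CONTEXT_KEY = "is_score_enhanced_by_context"


@dataclass
class Result:
    """Hypothetical stand-in for RecognizerResult (illustration only)."""

    entity_type: str
    start: int
    end: int
    score: float
    recognition_metadata: Dict[str, bool] = field(default_factory=dict)


def enhance_using_context(
    text: str,
    raw_recognizer_results: List[Result],
    context: List[str],
    boost: float = 0.25,  # hypothetical boost amount
    window: int = 40,  # hypothetical look-behind window, in characters
) -> List[Result]:
    """Raise the score of results preceded by one of the context words."""
    lowered = text.lower()
    for result in raw_recognizer_results:
        preceding = lowered[max(0, result.start - window) : result.start]
        if any(word.lower() in preceding for word in context):
            result.score = min(1.0, result.score + boost)
            # Flag the boost so downstream logic does not apply it twice.
            result.recognition_metadata[IS_SCORE_ENHANCED_BY_CONTEXT_KEY] = True
    return raw_recognizer_results


sample = "codice fiscale: RSSMRA85T10A562S"
enhanced = enhance_using_context(
    sample, [Result("IT_FISCAL_CODE", 16, 32, 0.5)], context=["fiscale"]
)
```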
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
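`to_dict` and `from_dict` are meant to be used as a pair. The sketch below shows the round trip on a reduced, hypothetical class; the dictionary keys are assumptions chosen for the example and may differ from what `PatternRecognizer` actually serializes.

```python
from typing import Dict, List, Optional


class SimplePatternRecognizer:
    """Hypothetical, reduced stand-in for PatternRecognizer."""

    def __init__(
        self,
        supported_entity: str,
        patterns: List[Dict],
        context: Optional[List[str]] = None,
        supported_language: str = "en",
    ) -> None:
        self.supported_entity = supported_entity
        self.patterns = patterns
        self.context = context or []
        self.supported_language = supported_language

    def to_dict(self) -> Dict:
        """Serialize the constructor arguments into plain Python types."""
        return {
            "supported_entity": self.supported_entity,
            "patterns": [dict(p) for p in self.patterns],
            "context": list(self.context),
            "supported_language": self.supported_language,
        }

    @classmethod
    def from_dict(cls, entity_recognizer_dict: Dict) -> "SimplePatternRecognizer":
        """Rebuild an instance from the output of to_dict."""
        return cls(**entity_recognizer_dict)


original = SimplePatternRecognizer(
    "EXAMPLE_ID",
    [{"name": "example", "regex": r"\d{11}", "score": 0.1}],
    context=["id"],
    supported_language="it",
)
restored = SimplePatternRecognizer.from_dict(original.to_dict())
```

Keeping the dictionary keys identical to the constructor arguments is what makes `cls(**entity_recognizer_dict)` sufficient in this sketch.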
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/it_fiscal_code_recognizer.py
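For a fiscal code, a natural checksum is the check character defined for the Italian codice fiscale: the first 15 characters are mapped to numbers (with a different table for odd and even positions), summed, and reduced modulo 26 to a letter. The sketch below implements that public rule on its own; whether this recognizer's `validate_result` follows exactly these steps is an assumption, and the function names are hypothetical.

```python
import string

# Values for characters in odd positions (1st, 3rd, ...), per the public rule.
ODD_VALUES = dict(
    zip(
        string.digits + string.ascii_uppercase,
        [1, 0, 5, 7, 9, 13, 15, 17, 19, 21]  # digits 0-9
        + [1, 0, 5, 7, 9, 13, 15, 17, 19, 21]  # letters A-J
        + [2, 4, 18, 20, 11, 3, 6, 8, 12, 14]  # letters K-T
        + [16, 10, 22, 25, 24, 23],  # letters U-Z
    )
)


def even_value(ch: str) -> int:
    """Characters in even positions count as their digit or alphabet index."""
    return int(ch) if ch.isdigit() else ord(ch) - ord("A")


def fiscal_code_check_char(first_15: str) -> str:
    """Compute the 16th character from the first 15."""
    total = 0
    for index, ch in enumerate(first_15):
        # index 0 is position 1, which is an odd position.
        total += ODD_VALUES[ch] if index % 2 == 0 else even_value(ch)
    return string.ascii_uppercase[total % 26]


def is_valid_fiscal_code(code: str) -> bool:
    """Check length, alphabet, and the final check character."""
    code = code.upper()
    if len(code) != 16 or not all(ch in ODD_VALUES for ch in code):
        return False
    return fiscal_code_check_char(code[:15]) == code[15]
```

For the commonly quoted sample code `RSSMRA85T10A562S`, the weighted sum is 122, and 122 mod 26 = 18 corresponds to the letter `S`, so the check passes.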
ItIdentityCardRecognizer
Bases: PatternRecognizer
Recognizes Italian Identity Card number using case-insensitive regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/it_identity_card_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
ItPassportRecognizer
Bases: PatternRecognizer
Recognizes IT Passport number using case-insensitive regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/it_passport_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to validate: only the part of the input text that was matched by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
ItVatCodeRecognizer
Bases: PatternRecognizer
Recognizes Italian VAT code using regex and checksum.
For more information about the Italian VAT code: https://en.wikipedia.org/wiki/VAT_identification_number#:~:text=%5B2%5D)-,Italy,-Partita%20IVA
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
Source code in presidio_analyzer/predefined_recognizers/it_vat_code.py
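The class description mentions a checksum, and `replacement_pairs` describes stripping dashes or spaces first. The sketch below combines the two for the 11-digit Italian VAT number (Partita IVA), using the Luhn-style rule commonly documented for it. The function name, the default replacement pairs, and the explicit rejection of the all-zero value are assumptions made for the example rather than a copy of the recognizer's code.

```python
from typing import Sequence, Tuple


def is_valid_it_vat(
    pattern_text: str,
    replacement_pairs: Sequence[Tuple[str, str]] = (("-", ""), (" ", ""), ("_", "")),
) -> bool:
    """Sanitize the match, then verify the Luhn-style check digit."""
    text = pattern_text
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)

    if len(text) != 11 or not (text.isascii() and text.isdigit()):
        return False
    # All zeros satisfies the arithmetic but is treated here as invalid.
    if text == "00000000000":
        return False

    digits = [int(ch) for ch in text]
    x = sum(digits[0:10:2])  # 1st, 3rd, 5th, 7th and 9th digits as they are
    y = 0
    for digit in digits[1:10:2]:  # 2nd, 4th, 6th, 8th and 10th digits
        doubled = digit * 2
        y += doubled - 9 if doubled > 9 else doubled
    check_digit = (10 - (x + y) % 10) % 10
    return check_digit == digits[10]
```

For `12345678903`, the odd-position digits sum to 25 and the doubled even-position digits contribute 22, so the check digit is (10 - 47 mod 10) mod 10 = 3, which matches the last digit.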
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 |
|
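The de-duplication rule described for remove_duplicates can be sketched as follows. This is a minimal illustration, not the library's implementation: `Span` is a hypothetical stand-in for RecognizerResult, and `drop_exact_duplicates` is a made-up name. The real method may apply further rules beyond exact matches.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for RecognizerResult (illustration only)."""

    entity_type: str
    start: int
    end: int
    score: float


def drop_exact_duplicates(results: List[Span]) -> List[Span]:
    """Keep the first result seen for each (entity_type, start, end) triple."""
    seen = set()
    unique = []
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


results = [
    Span("PHONE_NUMBER", 0, 10, 0.4),
    Span("PHONE_NUMBER", 0, 10, 0.4),  # same span and type: dropped
    Span("US_SSN", 0, 10, 0.5),        # same span, different type: kept
]
deduplicated = drop_exact_duplicates(results)
```

Results that share a span but differ in entity type are both kept, which matches the "identical start, end, and type" wording above.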
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
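As an illustration of what replacement pairs do, here is a minimal sketch under the assumption that each pair is applied in order as a plain substring replacement. The function name `apply_replacement_pairs` is hypothetical and is not part of the package.

```python
from typing import List, Tuple


def apply_replacement_pairs(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)
    return text


# Strip dashes and spaces before running a checksum on the remaining digits.
cleaned = apply_replacement_pairs("078-05 1120", [("-", ""), (" ", "")])
print(cleaned)  # 078051120
```

Normalizing the matched text this way lets one pattern or checksum handle several input formats.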
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
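The SSN example in the description can be sketched like this. It is a hedged illustration of the stated rule only (no group may consist of a single repeated digit). The function name `invalidate_ssn_like` and the choice to return None for undelimited input are assumptions, and a real recognizer may apply different or additional pruning rules.

```python
from typing import Optional


def invalidate_ssn_like(pattern_text: str) -> Optional[bool]:
    """Return True when the match should be discarded, False when it may stay.

    Returns None when this sketch cannot judge the input, that is, when
    the text does not split into three dash- or space-delimited groups.
    """
    groups = pattern_text.replace(" ", "-").split("-")
    if len(groups) != 3:
        return None
    for group in groups:
        # A group such as "000" or "11" consists of one repeated digit.
        if group and len(set(group)) == 1:
            return True
    return False
```

A True return value means "invalidate": the candidate match is pruned even though the regex matched it.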
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
validate_result
validate_result(pattern_text: str) -> bool
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/it_vat_code.py
|
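To make "running checksum on a detected pattern" concrete, the sketch below applies the Luhn-style check digit commonly documented for 11-digit Italian VAT numbers. It is an illustration under that assumption, not a copy of the code in it_vat_code.py. The function name `luhn_style_vat_check` is hypothetical, and the input is assumed to be already sanitized to digits only.

```python
def luhn_style_vat_check(digits: str) -> bool:
    """Check the last digit of an 11-digit code with a Luhn-style sum."""
    if len(digits) != 11 or not (digits.isascii() and digits.isdigit()):
        return False
    # An all-zero code satisfies the arithmetic but is not a real number.
    if digits == "00000000000":
        return False
    # Digits in positions 1, 3, 5, 7, 9 (1-based) are summed as they are.
    plain_sum = sum(int(digits[i]) for i in range(0, 10, 2))
    # Digits in positions 2, 4, 6, 8, 10 are doubled; 9 is subtracted
    # from any doubled value above 9.
    doubled_sum = 0
    for i in range(1, 10, 2):
        doubled = int(digits[i]) * 2
        doubled_sum += doubled - 9 if doubled > 9 else doubled
    check_digit = (10 - (plain_sum + doubled_sum) % 10) % 10
    return check_digit == int(digits[10])
```

The checksum lets a recognizer raise its confidence for matches that pass and discard matches that fail, which a regex alone cannot do.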
MedicalLicenseRecognizer
Bases: PatternRecognizer
Recognize common medical license numbers using regex and checksum.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/medical_license_recognizer.py
|
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
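The "regular expressions or deny-lists" wording can be illustrated with a small standard-library sketch: a deny-list is turned into one alternation pattern and then matched like any other regex. The helper names `deny_list_to_regex` and `find_matches`, the word-boundary pattern, and the default flag are assumptions for illustration; the package may build its pattern and choose its flags differently.

```python
import re
from typing import List, Tuple


def deny_list_to_regex(deny_list: List[str]) -> str:
    """Build one alternation pattern that matches any deny-list term as a whole word."""
    escaped = "|".join(re.escape(term) for term in deny_list)
    return rf"\b(?:{escaped})\b"


def find_matches(text: str, pattern: str, flags: int = re.IGNORECASE) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for every match of the pattern."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text, flags)]


titles_pattern = deny_list_to_regex(["Mr", "Mrs", "Dr"])
matches = find_matches("Dr Smith met Mrs Jones", titles_pattern)
```

Each tuple carries the start and end offsets that a RecognizerResult would report for the detected span.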
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
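For readers who override enhance_using_context, the following sketch shows one way a context-based boost might work. Everything here is an assumption for illustration: the function name, the plain-token comparison (the description above mentions lemmatized tokens), the five-word window, and the `boost` and `floor` values.

```python
from typing import List


def boost_score_with_context(
    text: str,
    start: int,
    end: int,
    score: float,
    context_words: List[str],
    window: int = 5,      # assumed number of words inspected on each side
    boost: float = 0.35,  # assumed increment, for illustration only
    floor: float = 0.4,   # assumed minimum score once context is found
) -> float:
    """Raise the score when a context word appears near the matched span."""
    before = text[:start].lower().split()[-window:]
    after = text[end:].lower().split()[:window]
    surrounding = {word.strip(".,:;!?()") for word in before + after}
    if any(word.lower() in surrounding for word in context_words):
        return min(max(score + boost, floor), 1.0)
    return score


sample = "My phone number is 555-0100"
span_start = sample.index("555")
boosted = boost_score_with_context(sample, span_start, len(sample), 0.3, ["phone"])
```

An override that boosts a score this way would also set the metadata flag described above so that callers can tell the score was enhanced by context.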
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
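The to_dict and from_dict pair forms a serialization round trip. The sketch below shows the general pattern with a hypothetical `TinyRecognizer` class; its fields and dictionary keys are assumptions and do not reflect the exact keys PatternRecognizer emits.

```python
from typing import Dict, List, Optional


class TinyRecognizer:
    """Hypothetical class used only to illustrate the round trip."""

    def __init__(
        self,
        supported_entity: str,
        supported_language: str = "en",
        context: Optional[List[str]] = None,
    ) -> None:
        self.supported_entity = supported_entity
        self.supported_language = supported_language
        self.context = context or []

    def to_dict(self) -> Dict:
        """Serialize the constructor arguments into a plain dictionary."""
        return {
            "supported_entity": self.supported_entity,
            "supported_language": self.supported_language,
            "context": list(self.context),
        }

    @classmethod
    def from_dict(cls, data: Dict) -> "TinyRecognizer":
        """Rebuild an instance from a dictionary produced by to_dict."""
        return cls(**data)


original = TinyRecognizer("ZIP_CODE", context=["zip", "postal"])
restored = TinyRecognizer.from_dict(original.to_dict())
```

Keeping the dictionary keys identical to the constructor parameters is what makes `cls(**data)` sufficient for the reverse direction.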
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Removes a result when another result has the same start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
PhoneRecognizer
Bases: LocalRecognizer
Recognize multi-regional phone numbers.
Using python-phonenumbers, along with fixed and regional context words.
PARAMETER | DESCRIPTION |
---|---|
context
|
Base context words for enhancing the assurance scores.
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_regions
|
The regions for phone number matching and validation
DEFAULT:
|
leniency
|
The strictness level of phone number formats. Accepts values from 0 to 3, where 0 is the most lenient and 3 is the strictest.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize self to dictionary. |
from_dict |
Create EntityRecognizer from a dict input. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
analyze |
Analyzes text to detect phone numbers using python-phonenumbers. |
Source code in presidio_analyzer/predefined_recognizers/phone_recognizer.py
|
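The leniency parameter takes an integer from 0 to 3. As a reading aid, the sketch below mirrors the names that python-phonenumbers uses for its leniency levels in a local enum. `LeniencyLevel` and `parse_leniency` are hypothetical helpers written for this example; they are not part of presidio_analyzer or python-phonenumbers.

```python
from enum import IntEnum


class LeniencyLevel(IntEnum):
    """Local mirror of the leniency levels, from most lenient to strictest."""

    POSSIBLE = 0
    VALID = 1
    STRICT_GROUPING = 2
    EXACT_GROUPING = 3


def parse_leniency(value: int) -> LeniencyLevel:
    """Convert an integer setting into a named level, rejecting values outside 0 to 3."""
    try:
        return LeniencyLevel(value)
    except ValueError:
        raise ValueError(f"leniency must be between 0 and 3, got {value}") from None


level = parse_leniency(1)
print(level.name)  # VALID
```

Lower levels accept more candidate numbers at the cost of more false positives; higher levels require the digit grouping in the text to look like a correctly formatted number.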
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in derived class in case a custom logic is needed, otherwise return value will be equal to raw_results.
in case a result score is boosted, derived class need to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY]
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict
|
a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict
|
Dict containing keys and values for instantiation
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Removes a result when another result has the same start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
analyze
analyze(
text: str, entities: List[str], nlp_artifacts: NlpArtifacts = None
) -> List[RecognizerResult]
Analyzes text to detect phone numbers using python-phonenumbers.
Iterates over the entities, fetches the regions, then matches regional phone number patterns against the text.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Additional metadata from the NLP engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List of RecognizerResult objects for the detected phone numbers |
Source code in presidio_analyzer/predefined_recognizers/phone_recognizer.py
|
PlPeselRecognizer
Bases: PatternRecognizer
Recognize PESEL number using regex and checksum.
For more information about PESEL: https://en.wikipedia.org/wiki/PESEL
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/pl_pesel_recognizer.py
|
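The "regex and checksum" combination for PESEL can be sketched with the publicly documented check-digit rule: the first ten digits are multiplied by the weights 1, 3, 7, 9, 1, 3, 7, 9, 1, 3, and the eleventh digit must equal (10 - sum mod 10) mod 10. The function name `pesel_checksum_ok` is hypothetical, and this is an illustration rather than the code in pl_pesel_recognizer.py.

```python
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def pesel_checksum_ok(pesel: str) -> bool:
    """Verify the check digit (11th digit) of an 11-digit PESEL candidate."""
    if len(pesel) != 11 or not (pesel.isascii() and pesel.isdigit()):
        return False
    # zip stops after the ten weights, so only the first ten digits are summed.
    weighted_sum = sum(int(digit) * weight for digit, weight in zip(pesel, PESEL_WEIGHTS))
    expected_check_digit = (10 - weighted_sum % 10) % 10
    return expected_check_digit == int(pesel[10])
```

The regex narrows the text down to 11-digit candidates; the checksum then filters out digit runs that only look like a PESEL.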
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Removes a result when another result has the same start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
SgFinRecognizer
Bases: PatternRecognizer
Recognize SG FIN/NRIC number using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/sg_fin_recognizer.py
|
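A regex-only recognizer of this kind can be pictured with the sketch below. The pattern is an assumption based on the commonly described NRIC/FIN layout (one prefix letter from S, T, F, G, or M, seven digits, one trailing letter); it is not copied from sg_fin_recognizer.py, and `find_nric_fin_candidates` is a hypothetical helper. No checksum is applied, so the matches are candidates only.

```python
import re
from typing import List

# Assumed layout: prefix letter, seven digits, trailing letter.
NRIC_FIN_PATTERN = re.compile(r"\b[STFGM]\d{7}[A-Z]\b", re.IGNORECASE)


def find_nric_fin_candidates(text: str) -> List[str]:
    """Return every substring that has the NRIC/FIN shape."""
    return [match.group() for match in NRIC_FIN_PATTERN.finditer(text)]


candidates = find_nric_fin_candidates("ID: S1234567D, ref X1234567D")
```

Because only the shape is checked, context words and score thresholds carry more weight for this recognizer than for checksum-backed ones.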
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
|
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the returned value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
|
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
|
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
|
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Removes a result when another result has the same start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
|
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
|
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
|
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
|
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
|
SgUenRecognizer
Bases: PatternRecognizer
Recognize Singapore UEN (Unique Entity Number) using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
validate_uen_format_a |
Validate the UEN format A using checksum. |
validate_uen_format_b |
Validate the UEN format B using checksum. |
validate_uen_format_c |
Validate the UEN format C using checksum. |
Source code in presidio_analyzer/predefined_recognizers/sg_uen_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
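To illustrate how a regex pass with optional flags can yield match spans, here is a minimal sketch using only Python's `re` module. The helper name `find_matches`, the sample pattern, and the fallback flag combination are assumptions made for illustration; this is not the library's implementation.

```python
import re
from typing import List, Optional, Tuple


def find_matches(
    text: str, pattern: str, regex_flags: Optional[int] = None
) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for every match of pattern in text."""
    # Assumed fallback flags for this sketch when the caller passes None.
    if regex_flags is None:
        regex_flags = re.DOTALL | re.MULTILINE | re.IGNORECASE
    return [
        (match.start(), match.end(), match.group())
        for match in re.finditer(pattern, text, flags=regex_flags)
    ]


sample = "ID ab12 and AB34"
print(find_matches(sample, r"\bab\d{2}\b"))                 # case-insensitive fallback
print(find_matches(sample, r"\bab\d{2}\b", regex_flags=0))  # case-sensitive
```

Passing `regex_flags=0` in this sketch disables the case-insensitive fallback, so only the lowercase occurrence is reported.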
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class when custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
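The replacement-pair behaviour described above can be pictured with a short standalone sketch. It assumes each pair is applied in order as a plain substring replacement; the function here is an illustrative re-creation, not a copy of the library's source, and the sample input is made up.

```python
from typing import List, Tuple


def sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search_string, replacement_string in replacement_pairs:
        text = text.replace(search_string, replacement_string)
    return text


# Strip dashes and spaces before running a checksum on the remaining digits.
cleaned = sanitize_value("943-476 5919", [("-", ""), (" ", "")])
print(cleaned)
```

Normalizing separators this way lets a single validation routine handle several input spellings of the same identifier.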
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
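As a rough illustration of the SSN example above, the following sketch flags a match when any of its digit groups consists of one repeated digit. The function name `invalidate_ssn_like` and this reading of the rule are assumptions; a real recognizer may apply different or additional pruning logic.

```python
import re
from typing import Optional


def invalidate_ssn_like(pattern_text: str) -> Optional[bool]:
    """Return True when the match should be discarded, False when it may stay.

    Returns None when there is nothing to judge (no digit groups found).
    """
    groups = re.findall(r"\d+", pattern_text)
    if not groups:
        return None
    # A group such as "000" or "1111" contains a single distinct digit.
    return any(len(set(group)) == 1 for group in groups)


print(invalidate_ssn_like("078-05-1120"))
print(invalidate_ssn_like("000-12-3456"))
```

Returning `Optional[bool]` mirrors the documented signature: `None` signals that no invalidation logic applied.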
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/sg_uen_recognizer.py
validate_uen_format_a
staticmethod
validate_uen_format_a(uen: str) -> bool
Validate the UEN format A using checksum.
PARAMETER | DESCRIPTION |
---|---|
uen
|
The UEN to validate.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
True if the UEN is valid according to its respective format, False otherwise. |
Source code in presidio_analyzer/predefined_recognizers/sg_uen_recognizer.py
validate_uen_format_b
staticmethod
validate_uen_format_b(uen: str) -> bool
Validate the UEN format B using checksum.
PARAMETER | DESCRIPTION |
---|---|
uen
|
The UEN to validate.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
True if the UEN is valid according to its respective format, False otherwise. |
Source code in presidio_analyzer/predefined_recognizers/sg_uen_recognizer.py
validate_uen_format_c
staticmethod
validate_uen_format_c(uen: str) -> bool
Validate the UEN format C using checksum.
PARAMETER | DESCRIPTION |
---|---|
uen
|
The UEN to validate.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
True if the UEN is valid according to its respective format, False otherwise. |
Source code in presidio_analyzer/predefined_recognizers/sg_uen_recognizer.py
SpacyRecognizer
Bases: LocalRecognizer
Recognize PII entities using a spaCy NLP model.
Since the spaCy pipeline is run by the AnalyzerEngine/SpacyNlpEngine,
this recognizer only extracts the entities from the NlpArtifacts
and returns them.
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize self to dictionary. |
from_dict |
Create EntityRecognizer from a dict input. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
build_explanation |
Create explanation for why this result was detected. |
Source code in presidio_analyzer/predefined_recognizers/spacy_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class when custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict
|
a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict
|
Dict containing keys and values for instantiation
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
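The deduplication rule stated above (identical start, end, and type) can be sketched without the library. The `Result` dataclass below is a hypothetical stand-in for `RecognizerResult`, and keeping the highest-scoring copy is an assumption of this sketch; the library's own implementation may apply further rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """Hypothetical stand-in for RecognizerResult with only the fields used here."""
    entity_type: str
    start: int
    end: int
    score: float


def remove_exact_duplicates(results: List[Result]) -> List[Result]:
    """Keep one result per (entity_type, start, end), preferring the highest score."""
    seen = set()
    unique: List[Result] = []
    for result in sorted(results, key=lambda r: -r.score):
        key = (result.entity_type, result.start, result.end)
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


results = [
    Result("PERSON", 0, 4, 0.5),
    Result("PERSON", 0, 4, 0.85),
    Result("LOCATION", 0, 4, 0.5),
]
for result in remove_exact_duplicates(results):
    print(result)
```

Results with the same span but a different entity type are kept, since only the full triple identifies a duplicate in this sketch.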
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
build_explanation
build_explanation(
original_score: float, explanation: str
) -> AnalysisExplanation
Create explanation for why this result was detected.
PARAMETER | DESCRIPTION |
---|---|
original_score
|
Score given by this recognizer
TYPE:
|
explanation
|
Explanation string
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
|
Source code in presidio_analyzer/predefined_recognizers/spacy_recognizer.py
StanzaRecognizer
Bases: SpacyRecognizer
Recognize entities using the Stanza NLP package.
See https://stanfordnlp.github.io/stanza/. Uses the spaCy-Stanza package (https://github.com/explosion/spacy-stanza) to align Stanza's interface with spaCy's.
METHOD | DESCRIPTION |
---|---|
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize self to dictionary. |
from_dict |
Create EntityRecognizer from a dict input. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
build_explanation |
Create explanation for why this result was detected. |
Source code in presidio_analyzer/predefined_recognizers/stanza_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class when custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize self to dictionary.
RETURNS | DESCRIPTION |
---|---|
Dict
|
a dictionary |
Source code in presidio_analyzer/entity_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> EntityRecognizer
Create EntityRecognizer from a dict input.
PARAMETER | DESCRIPTION |
---|---|
entity_recognizer_dict
|
Dict containing keys and values for instantiation
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
build_explanation
build_explanation(
original_score: float, explanation: str
) -> AnalysisExplanation
Create explanation for why this result was detected.
PARAMETER | DESCRIPTION |
---|---|
original_score
|
Score given by this recognizer
TYPE:
|
explanation
|
Explanation string
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
|
Source code in presidio_analyzer/predefined_recognizers/spacy_recognizer.py
NhsRecognizer
Bases: PatternRecognizer
Recognizes NHS numbers using regex and checksum.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
replacement_pairs
|
List of tuples with potential replacement values for different strings to be used during pattern matching. This can allow a greater variety in input, for example by removing dashes or spaces.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
Source code in presidio_analyzer/predefined_recognizers/uk_nhs_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class when custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start, end, and entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
validate_result
validate_result(pattern_text: str) -> bool
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/predefined_recognizers/uk_nhs_recognizer.py
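For readers unfamiliar with the checksum involved, here is a minimal standalone sketch of the standard NHS number Modulus 11 check: the ten digits are weighted 10 down to 1 and the weighted sum must be divisible by 11. The function name `is_valid_nhs_number` and the separator handling are assumptions for illustration, and the code is not taken from the library's source.

```python
def is_valid_nhs_number(pattern_text: str) -> bool:
    """Check a candidate NHS number with the Modulus 11 algorithm."""
    # Drop the separators commonly used when writing NHS numbers.
    digits = pattern_text.replace(" ", "").replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights 10..1; the last digit (weight 1) is the check digit.
    total = sum(int(digit) * weight for digit, weight in zip(digits, range(10, 0, -1)))
    return total % 11 == 0


print(is_valid_nhs_number("943 476 5919"))
print(is_valid_nhs_number("943 476 5918"))
```

Including the check digit with weight 1 folds the usual "11 minus remainder" comparison into a single divisibility test, and it rejects numbers whose computed check value would be 10.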
UkNinoRecognizer
Bases: PatternRecognizer
Recognizes UK National Insurance Number using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/uk_nino_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class when custom logic is needed; otherwise the return value equals raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
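One way to picture a context-based override is the standalone sketch below: when a context word appears near a match, the score is raised and the result's metadata is marked. Everything here is hypothetical (the `Result` stand-in, the key string, the `boost` and `window` values, and plain substring matching instead of lemmas); it only illustrates the contract described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical key value; the real constant lives on RecognizerResult.
IS_SCORE_ENHANCED_BY_CONTEXT_KEY = "is_score_enhanced_by_context"


@dataclass
class Result:
    """Hypothetical stand-in for RecognizerResult."""
    entity_type: str
    start: int
    end: int
    score: float
    recognition_metadata: Dict[str, bool] = field(default_factory=dict)


def enhance_using_context(
    text: str,
    raw_results: List[Result],
    context: List[str],
    boost: float = 0.35,  # illustrative value
    window: int = 30,     # characters inspected on each side of the match
) -> List[Result]:
    """Raise the score of results that have a context word nearby."""
    lowered = text.lower()
    for result in raw_results:
        surrounding = lowered[max(0, result.start - window): result.end + window]
        if any(word.lower() in surrounding for word in context):
            result.score = min(1.0, result.score + boost)
            result.recognition_metadata[IS_SCORE_ENHANCED_BY_CONTEXT_KEY] = True
    return raw_results


text = "My national insurance number is AB123456C"
start = text.index("AB123456C")
results = [Result("UK_NINO", start, start + len("AB123456C"), 0.5)]
print(enhance_using_context(text, results, ["national insurance"]))
```

Results without a nearby context word are returned unchanged, which matches the documented default of handing back the raw results.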
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the supported entities by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results | The results to deduplicate. TYPE: List[RecognizerResult] |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The results with duplicates removed. |
Source code in presidio_analyzer/entity_recognizer.py
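The duplicate rule described for remove_duplicates can be illustrated with a small stand-alone sketch. It does not import presidio_analyzer: the Span class and remove_exact_duplicates function are hypothetical stand-ins, and the library's own implementation may apply further rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for RecognizerResult (not the library class)."""

    entity_type: str
    start: int
    end: int
    score: float


def remove_exact_duplicates(results: List[Span]) -> List[Span]:
    """Keep the first result seen for each (entity_type, start, end) key."""
    seen = set()
    unique = []
    for result in results:
        key = (result.entity_type, result.start, result.end)
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


results = [
    Span("URL", 0, 18, 0.5),
    Span("URL", 0, 18, 0.5),   # same span and type as the first: dropped
    Span("URL", 25, 40, 0.5),  # different span: kept
]
deduplicated = remove_exact_duplicates(results)
print(len(deduplicated))
```

Keying on the (type, start, end) triple keeps the original order of the surviving results.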
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text | Input string. TYPE: str |
replacement_pairs | Pairs of what has to be replaced and with which value. TYPE: List[Tuple[str, str]] |
RETURNS | DESCRIPTION |
---|---|
str | The cleansed string. |
Source code in presidio_analyzer/entity_recognizer.py
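As a rough illustration of what replacement pairs do, the sketch below applies each pair in order with str.replace. The function name sanitize is hypothetical, and the code only assumes the behavior described above rather than reproducing the library source.

```python
from typing import List, Tuple


def sanitize(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    """Apply each (search, replacement) pair to the text, in order."""
    for search, replacement in replacement_pairs:
        text = text.replace(search, replacement)
    return text


# Strip dashes and spaces before running a checksum on the digits.
cleaned = sanitize("4012-8888 8888-1881", [("-", ""), (" ", "")])
print(cleaned)
```

A typical use is normalizing a matched value inside validate_result before running a checksum on it.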
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to validate: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to check: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the result is invalidated. |
Source code in presidio_analyzer/pattern_recognizer.py
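A pruning rule of this kind might look like the sketch below. The function invalidate_ssn_like and its two rules (one digit repeated throughout, or a group made only of zeros) are assumptions chosen for illustration; they are not taken from any Presidio recognizer.

```python
from typing import Optional


def invalidate_ssn_like(pattern_text: str) -> Optional[bool]:
    """Return True to prune the match, False to keep it, None for no opinion."""
    groups = pattern_text.split("-")
    if not all(group.isdigit() for group in groups):
        return None  # nothing to judge; leave the score untouched
    digits = "".join(groups)
    if len(set(digits)) == 1:
        return True  # one digit repeated throughout, e.g. 111-11-1111
    if any(set(group) == {"0"} for group in groups):
        return True  # a group made only of zeros
    return False


print(invalidate_ssn_like("111-11-1111"), invalidate_ssn_like("078-05-1120"))
```

Returning None rather than False lets a caller tell "not checked" apart from "checked and accepted", which matches the Optional[bool] return type.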
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name | Name of the recognizer that detected the entity. TYPE: str |
pattern_name | Name of the regex pattern that detected the entity. TYPE: str |
pattern | Regex pattern logic. TYPE: str |
original_score | Score given by the recognizer. TYPE: float |
validation_result | Whether validation was used, and its result. TYPE: bool |
regex_flags | Regex flags used in the regex matching. TYPE: int |
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation | Analysis explanation. |
Source code in presidio_analyzer/pattern_recognizer.py
UrlRecognizer
Bases: PatternRecognizer
Recognize URLs using regex.
This application uses Open Source components:
Project: CommonRegex https://github.com/madisonmay/CommonRegex
Copyright (c) 2014 Madison May
License (MIT) https://github.com/madisonmay/CommonRegex/blob/master/LICENSE
PARAMETER | DESCRIPTION |
---|---|
patterns | List of patterns to be used by this recognizer. |
context | List of context words to increase confidence in detection. |
supported_language | Language this recognizer supports. |
supported_entity | The entity this recognizer can detect. |
METHOD | DESCRIPTION |
---|---|
analyze | Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context | Enhance confidence score using context of the entity. |
get_supported_entities | Return the list of entities this recognizer can identify. |
get_supported_language | Return the language this recognizer can support. |
get_version | Return the version of this recognizer. |
to_dict | Serialize instance into a dictionary. |
from_dict | Create instance from a serialized dict. |
remove_duplicates | Remove duplicate results. |
sanitize_value | Cleanse the input string of the replacement pairs specified as argument. |
validate_result | Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result | Logic to check for result invalidation by running pruning logic. |
build_regex_explanation | Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/url_recognizer.py
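To give a feel for regex-based URL detection, here is a minimal stand-alone sketch. The SIMPLE_URL pattern, the find_urls helper, and the 0.5 score are all assumptions made for illustration; the recognizer's real patterns are more elaborate and are not reproduced here.

```python
import re
from typing import List, Tuple

# Deliberately simplified pattern, assumed for illustration only.
SIMPLE_URL = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def find_urls(text: str, score: float = 0.5) -> List[Tuple[str, int, int, float]]:
    """Return (entity_type, start, end, score) for every URL-like match."""
    return [
        ("URL", match.start(), match.end(), score)
        for match in SIMPLE_URL.finditer(text)
    ]


text = "Docs live at https://example.com/docs and nowhere else."
matches = find_urls(text)
entity_type, start, end, score = matches[0]
print(entity_type, text[start:end], score)
```

Reporting character offsets rather than the matched string mirrors how a result can later be used to redact the original text.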
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text | Text to be analyzed. TYPE: str |
entities | Entities this recognizer can detect. TYPE: List[str] |
nlp_artifacts | Output values from the NLP engine. TYPE: Optional[NlpArtifacts] DEFAULT: None |
regex_flags | Regex flags to be used in regex matching. TYPE: Optional[int] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The PII entities detected in the text. |
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text | The actual text that was analyzed. TYPE: str |
raw_recognizer_results | This recognizer's results, to be updated based on recognizer-specific context. TYPE: List[RecognizerResult] |
other_raw_recognizer_results | Other recognizer results matched in the given text, to allow related-entity context enhancement. TYPE: List[RecognizerResult] |
nlp_artifacts | The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of context enhancement. TYPE: NlpArtifacts |
context | List of context words. TYPE: Optional[List[str]] DEFAULT: None |
Source code in presidio_analyzer/entity_recognizer.py
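A custom context enhancement could follow the shape sketched below. Everything here is a stand-in: the Result class replaces RecognizerResult, ENHANCED_KEY is a placeholder for the constant named above, and the boost and window values are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Placeholder key; with Presidio, use
# RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY instead.
ENHANCED_KEY = "is_score_enhanced_by_context"


@dataclass
class Result:
    """Hypothetical stand-in for RecognizerResult."""

    start: int
    end: int
    score: float
    recognition_metadata: Dict[str, object] = field(default_factory=dict)


def enhance_using_context(
    text: str,
    raw_results: List[Result],
    context: Optional[List[str]] = None,
    boost: float = 0.5,   # assumed boost size
    window: int = 30,     # assumed number of characters to inspect
) -> List[Result]:
    """Boost results that have a context word nearby and flag them."""
    if not context:
        return raw_results
    for result in raw_results:
        lo = max(0, result.start - window)
        nearby = text[lo:result.end + window].lower()
        if any(word.lower() in nearby for word in context):
            result.score = min(1.0, result.score + boost)
            result.recognition_metadata[ENHANCED_KEY] = True
    return raw_results


text = "My bank account is 12345678"
enhanced = enhance_using_context(text, [Result(19, 27, 0.25)], ["account"])
print(enhanced[0].score, enhanced[0].recognition_metadata)
```

Setting the metadata flag on every boosted result lets later stages tell that the score has already been enhanced.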
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str] | A list of the entities supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str | The language supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str | The current version of this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results | The results to deduplicate. TYPE: List[RecognizerResult] |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The results with duplicates removed. |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text | Input string. TYPE: str |
replacement_pairs | Pairs of what has to be replaced and with which value. TYPE: List[Tuple[str, str]] |
RETURNS | DESCRIPTION |
---|---|
str | The cleansed string. |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to validate: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
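As an example of "running checksum on a detected pattern", the sketch below implements the Luhn checksum in the shape of a validate_result method. It is a generic illustration, not this recognizer's logic; the function name luhn_validate and the choice to return None when no digits are present are assumptions.

```python
from typing import Optional


def luhn_validate(pattern_text: str) -> Optional[bool]:
    """Return True/False for the Luhn check, or None if there are no digits."""
    digits = [int(ch) for ch in pattern_text if ch.isdigit()]
    if not digits:
        return None
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_validate("4012 8888 8888 1881"), luhn_validate("4012 8888 8888 1882"))
```

A checksum of this kind lets a recognizer raise its confidence for matches that pass and discard those that fail, while None signals that no check applies.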
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to check: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the result is invalidated. |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name | Name of the recognizer that detected the entity. TYPE: str |
pattern_name | Name of the regex pattern that detected the entity. TYPE: str |
pattern | Regex pattern logic. TYPE: str |
original_score | Score given by the recognizer. TYPE: float |
validation_result | Whether validation was used, and its result. TYPE: bool |
regex_flags | Regex flags used in the regex matching. TYPE: int |
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation | Analysis explanation. |
Source code in presidio_analyzer/pattern_recognizer.py
UsBankRecognizer
Bases: PatternRecognizer
Recognizes US bank numbers using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns | List of patterns to be used by this recognizer. |
context | List of context words to increase confidence in detection. |
supported_language | Language this recognizer supports. |
supported_entity | The entity this recognizer can detect. |
METHOD | DESCRIPTION |
---|---|
analyze | Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context | Enhance confidence score using context of the entity. |
get_supported_entities | Return the list of entities this recognizer can identify. |
get_supported_language | Return the language this recognizer can support. |
get_version | Return the version of this recognizer. |
to_dict | Serialize instance into a dictionary. |
from_dict | Create instance from a serialized dict. |
remove_duplicates | Remove duplicate results. |
sanitize_value | Cleanse the input string of the replacement pairs specified as argument. |
validate_result | Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result | Logic to check for result invalidation by running pruning logic. |
build_regex_explanation | Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/us_bank_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text | Text to be analyzed. TYPE: str |
entities | Entities this recognizer can detect. TYPE: List[str] |
nlp_artifacts | Output values from the NLP engine. TYPE: Optional[NlpArtifacts] DEFAULT: None |
regex_flags | Regex flags to be used in regex matching. TYPE: Optional[int] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The PII entities detected in the text. |
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text | The actual text that was analyzed. TYPE: str |
raw_recognizer_results | This recognizer's results, to be updated based on recognizer-specific context. TYPE: List[RecognizerResult] |
other_raw_recognizer_results | Other recognizer results matched in the given text, to allow related-entity context enhancement. TYPE: List[RecognizerResult] |
nlp_artifacts | The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of context enhancement. TYPE: NlpArtifacts |
context | List of context words. TYPE: Optional[List[str]] DEFAULT: None |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str] | A list of the entities supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str | The language supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str | The current version of this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
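The to_dict and from_dict pair is meant to round-trip a recognizer's configuration. The sketch below shows that idea with a hypothetical TinyRecognizer class; the field names and the example pattern are assumptions and do not describe the dictionary layout Presidio actually produces.

```python
from typing import Dict, List


class TinyRecognizer:
    """Hypothetical recognizer used only to show a serialization round trip."""

    def __init__(
        self, supported_entity: str, supported_language: str, patterns: List[str]
    ) -> None:
        self.supported_entity = supported_entity
        self.supported_language = supported_language
        self.patterns = patterns

    def to_dict(self) -> Dict:
        """Serialize the instance into a plain dictionary."""
        return {
            "supported_entity": self.supported_entity,
            "supported_language": self.supported_language,
            "patterns": list(self.patterns),
        }

    @classmethod
    def from_dict(cls, recognizer_dict: Dict) -> "TinyRecognizer":
        """Create an instance from a dictionary produced by to_dict."""
        return cls(**recognizer_dict)


original = TinyRecognizer("US_BANK_NUMBER", "en", [r"\b\d{8,17}\b"])
restored = TinyRecognizer.from_dict(original.to_dict())
print(restored.to_dict() == original.to_dict())
```

Because the dictionary holds only plain values, it can be stored as JSON or YAML and loaded again later.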
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results | The results to deduplicate. TYPE: List[RecognizerResult] |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The results with duplicates removed. |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text | Input string. TYPE: str |
replacement_pairs | Pairs of what has to be replaced and with which value. TYPE: List[Tuple[str, str]] |
RETURNS | DESCRIPTION |
---|---|
str | The cleansed string. |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to validate: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to check: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the result is invalidated. |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name | Name of the recognizer that detected the entity. TYPE: str |
pattern_name | Name of the regex pattern that detected the entity. TYPE: str |
pattern | Regex pattern logic. TYPE: str |
original_score | Score given by the recognizer. TYPE: float |
validation_result | Whether validation was used, and its result. TYPE: bool |
regex_flags | Regex flags used in the regex matching. TYPE: int |
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation | Analysis explanation. |
Source code in presidio_analyzer/pattern_recognizer.py
UsLicenseRecognizer
Bases: PatternRecognizer
Recognizes US driver's license numbers using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns | List of patterns to be used by this recognizer. |
context | List of context words to increase confidence in detection. |
supported_language | Language this recognizer supports. |
supported_entity | The entity this recognizer can detect. |
METHOD | DESCRIPTION |
---|---|
analyze | Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context | Enhance confidence score using context of the entity. |
get_supported_entities | Return the list of entities this recognizer can identify. |
get_supported_language | Return the language this recognizer can support. |
get_version | Return the version of this recognizer. |
to_dict | Serialize instance into a dictionary. |
from_dict | Create instance from a serialized dict. |
remove_duplicates | Remove duplicate results. |
sanitize_value | Cleanse the input string of the replacement pairs specified as argument. |
validate_result | Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result | Logic to check for result invalidation by running pruning logic. |
build_regex_explanation | Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/us_driver_license_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text | Text to be analyzed. TYPE: str |
entities | Entities this recognizer can detect. TYPE: List[str] |
nlp_artifacts | Output values from the NLP engine. TYPE: Optional[NlpArtifacts] DEFAULT: None |
regex_flags | Regex flags to be used in regex matching. TYPE: Optional[int] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The PII entities detected in the text. |
Source code in presidio_analyzer/pattern_recognizer.py
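One plausible way for a pattern-based analyze step to combine a regex match with the Optional[bool] results of validate_result and invalidate_result is sketched below. The scoring rules, the score bounds, and the license-like pattern are all assumptions for illustration, not a description of the library's code.

```python
import re
from typing import Callable, List, Optional, Tuple

MIN_SCORE, MAX_SCORE = 0.0, 1.0  # assumed score bounds


def analyze_with_checks(
    text: str,
    pattern: str,
    base_score: float,
    validate: Callable[[str], Optional[bool]],
    invalidate: Callable[[str], Optional[bool]],
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for each match that survives the checks."""
    results = []
    for match in re.finditer(pattern, text):
        score = base_score
        verdict = validate(match.group())
        if verdict is not None:  # None means "no validation available"
            score = MAX_SCORE if verdict else MIN_SCORE
        if invalidate(match.group()):  # True prunes the match
            score = MIN_SCORE
        if score > MIN_SCORE:
            results.append((match.start(), match.end(), score))
    return results


hits = analyze_with_checks(
    "License A1234567 and A0000000",
    pattern=r"\b[A-Z]\d{7}\b",  # hypothetical format, not a real state's rule
    base_score=0.3,
    validate=lambda value: None,
    invalidate=lambda value: len(set(value[1:])) == 1,
)
print(hits)
```

Treating None as "no opinion" keeps the pattern's base score for recognizers that define no checksum.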
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text | The actual text that was analyzed. TYPE: str |
raw_recognizer_results | This recognizer's results, to be updated based on recognizer-specific context. TYPE: List[RecognizerResult] |
other_raw_recognizer_results | Other recognizer results matched in the given text, to allow related-entity context enhancement. TYPE: List[RecognizerResult] |
nlp_artifacts | The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of context enhancement. TYPE: NlpArtifacts |
context | List of context words. TYPE: Optional[List[str]] DEFAULT: None |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str] | A list of the entities supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str | The language supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str | The current version of this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results | The results to deduplicate. TYPE: List[RecognizerResult] |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The results with duplicates removed. |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text | Input string. TYPE: str |
replacement_pairs | Pairs of what has to be replaced and with which value. TYPE: List[Tuple[str, str]] |
RETURNS | DESCRIPTION |
---|---|
str | The cleansed string. |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to validate: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text | The text to check: only the part of the text that was detected by the regex engine. TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Optional[bool] | A bool indicating whether the result is invalidated. |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name | Name of the recognizer that detected the entity. TYPE: str |
pattern_name | Name of the regex pattern that detected the entity. TYPE: str |
pattern | Regex pattern logic. TYPE: str |
original_score | Score given by the recognizer. TYPE: float |
validation_result | Whether validation was used, and its result. TYPE: bool |
regex_flags | Regex flags used in the regex matching. TYPE: int |
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation | Analysis explanation. |
Source code in presidio_analyzer/pattern_recognizer.py
UsItinRecognizer
Bases: PatternRecognizer
Recognizes US ITIN (Individual Taxpayer Identification Number) using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns | List of patterns to be used by this recognizer. |
context | List of context words to increase confidence in detection. |
supported_language | Language this recognizer supports. |
supported_entity | The entity this recognizer can detect. |
METHOD | DESCRIPTION |
---|---|
analyze | Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context | Enhance confidence score using context of the entity. |
get_supported_entities | Return the list of entities this recognizer can identify. |
get_supported_language | Return the language this recognizer can support. |
get_version | Return the version of this recognizer. |
to_dict | Serialize instance into a dictionary. |
from_dict | Create instance from a serialized dict. |
remove_duplicates | Remove duplicate results. |
sanitize_value | Cleanse the input string of the replacement pairs specified as argument. |
validate_result | Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result | Logic to check for result invalidation by running pruning logic. |
build_regex_explanation | Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/us_itin_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text | Text to be analyzed. TYPE: str |
entities | Entities this recognizer can detect. TYPE: List[str] |
nlp_artifacts | Output values from the NLP engine. TYPE: Optional[NlpArtifacts] DEFAULT: None |
regex_flags | Regex flags to be used in regex matching. TYPE: Optional[int] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The PII entities detected in the text. |
Source code in presidio_analyzer/pattern_recognizer.py
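The regex_flags parameter is typed as an int, which matches how Python's re module represents flags: individual flags are combined with bitwise OR. The sketch below uses only the standard library; the pattern and sample text are made up for illustration.

```python
import re

# Flags combine with bitwise OR and convert cleanly to int.
flags = re.IGNORECASE | re.MULTILINE
pattern = r"^ref: \d+"
sample = "note\nREF: 912"

default_hits = re.findall(pattern, sample)       # case-sensitive, ^ only at string start
flagged_hits = re.findall(pattern, sample, flags)  # ^ matches per line, case ignored
print(int(flags), default_hits, flagged_hits)
```

Passing a combined value like this changes how every pattern in the call is matched, so it is worth checking that each pattern still behaves as intended under the chosen flags.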
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the raw results are returned unchanged.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text | The actual text that was analyzed. TYPE: str |
raw_recognizer_results | This recognizer's results, to be updated based on recognizer-specific context. TYPE: List[RecognizerResult] |
other_raw_recognizer_results | Other recognizer results matched in the given text, to allow related-entity context enhancement. TYPE: List[RecognizerResult] |
nlp_artifacts | The NLP artifacts, which contain elements such as lemmatized tokens that improve the accuracy of context enhancement. TYPE: NlpArtifacts |
context | List of context words. TYPE: Optional[List[str]] DEFAULT: None |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str] | A list of the entities supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str | The language supported by this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str | The current version of this recognizer. |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Results are treated as duplicates when they have identical start, end, and type.
PARAMETER | DESCRIPTION |
---|---|
results | The results to deduplicate. TYPE: List[RecognizerResult] |
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult] | The results with duplicates removed. |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
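A minimal standalone sketch of this behavior, assuming each pair is a (search string, replacement string) tuple applied in order as a plain substring replacement:

```python
from typing import List, Tuple


def sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str:
    # Apply each (search, replacement) pair in the order given.
    for search_string, replacement_string in replacement_pairs:
        text = text.replace(search_string, replacement_string)
    return text


# Strip dashes and spaces before running further checks on the digits.
cleaned = sanitize_value("078-05 1120", [("-", ""), (" ", "")])
```

A typical use is to normalize a matched string before passing it to a checksum or other validation logic.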
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
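To make the checksum idea concrete, here is a hedged sketch of what an override could look like. It uses the Luhn algorithm purely as an example of a checksum; the function is written standalone rather than as a method, and returning None to mean "no validation was performed" is an assumption based on the Optional[bool] return type.

```python
from typing import Optional


def validate_result(pattern_text: str) -> Optional[bool]:
    """Hypothetical override: Luhn checksum over the digits of the match."""
    digits = [int(ch) for ch in pattern_text if ch.isdigit()]
    if not digits:
        # Nothing to check, so report that no validation was performed.
        return None
    checksum = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


valid = validate_result("4111 1111 1111 1111")
invalid = validate_result("4111 1111 1111 1112")
```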
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
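The sketch below shows one way the six arguments could be bundled into an explanation object. Explanation is a hypothetical stand-in for AnalysisExplanation; its field names and the wording of the summary string are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Explanation:
    """Hypothetical stand-in for AnalysisExplanation; field names are assumed."""

    recognizer: str
    pattern_name: str
    pattern: str
    original_score: float
    validation_result: Optional[bool]
    regex_flags: int
    textual_explanation: str


def build_regex_explanation(
    recognizer_name: str,
    pattern_name: str,
    pattern: str,
    original_score: float,
    validation_result: Optional[bool],
    regex_flags: int,
) -> Explanation:
    # Summarize which recognizer and pattern produced the match.
    summary = f"Detected by {recognizer_name} using pattern {pattern_name}"
    return Explanation(
        recognizer=recognizer_name,
        pattern_name=pattern_name,
        pattern=pattern,
        original_score=original_score,
        validation_result=validation_result,
        regex_flags=regex_flags,
        textual_explanation=summary,
    )


explanation = build_regex_explanation(
    recognizer_name="UsPassportRecognizer",
    pattern_name="nine_digits",  # hypothetical pattern name
    pattern=r"\b[0-9]{9}\b",  # hypothetical regex
    original_score=0.05,
    validation_result=None,
    regex_flags=0,
)
```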
Source code in presidio_analyzer/pattern_recognizer.py
UsPassportRecognizer
Bases: PatternRecognizer
Recognize US passport numbers using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
invalidate_result |
Logic to check for result invalidation by running pruning logic. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
Source code in presidio_analyzer/predefined_recognizers/us_passport_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
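The regex-based flow can be sketched without the library as follows. Everything here is illustrative: Match is a hypothetical stand-in for RecognizerResult, the nine-digit pattern and its low score are assumed values rather than the recognizer's actual configuration, and the default flags are an assumption. The nlp_artifacts parameter is omitted because plain regex matching does not need it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Match:
    """Hypothetical stand-in for RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float


# (name, regex, score): a hypothetical nine-digit pattern with a low score.
PATTERNS: List[Tuple[str, str, float]] = [("nine_digits", r"\b[0-9]{9}\b", 0.05)]


def analyze(
    text: str, entities: List[str], regex_flags: Optional[int] = None
) -> List[Match]:
    entity = "US_PASSPORT"
    if entity not in entities:
        # The caller did not ask for this entity, so skip matching.
        return []
    # Assumed default flags; callers may pass their own.
    flags = regex_flags if regex_flags is not None else re.IGNORECASE | re.MULTILINE
    results = []
    for _name, regex, score in PATTERNS:
        for match in re.finditer(regex, text, flags=flags):
            results.append(Match(entity, match.start(), match.end(), score))
    return results


text = "My passport number is 912803456."
found = analyze(text, ["US_PASSPORT"])
```

A bare nine-digit match is weak evidence on its own, which is why the sketch assigns it a low score and leaves any boost to context words.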
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> Optional[bool]
Logic to check for result invalidation by running pruning logic.
For example, each SSN number group should not consist of all the same digits.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the result is invalidated |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
UsSsnRecognizer
Bases: PatternRecognizer
Recognize US Social Security Number (SSN) using regex.
PARAMETER | DESCRIPTION |
---|---|
patterns
|
List of patterns to be used by this recognizer
TYPE:
|
context
|
List of context words to increase confidence in detection
TYPE:
|
supported_language
|
Language this recognizer supports
TYPE:
|
supported_entity
|
The entity this recognizer can detect
TYPE:
|
METHOD | DESCRIPTION |
---|---|
analyze |
Analyzes text to detect PII using regular expressions or deny-lists. |
enhance_using_context |
Enhance confidence score using context of the entity. |
get_supported_entities |
Return the list of entities this recognizer can identify. |
get_supported_language |
Return the language this recognizer can support. |
get_version |
Return the version of this recognizer. |
to_dict |
Serialize instance into a dictionary. |
from_dict |
Create instance from a serialized dict. |
remove_duplicates |
Remove duplicate results. |
sanitize_value |
Cleanse the input string of the replacement pairs specified as argument. |
validate_result |
Validate the pattern logic e.g., by running checksum on a detected pattern. |
build_regex_explanation |
Construct an explanation for why this entity was detected. |
invalidate_result |
Check if the pattern text cannot be validated as a US_SSN entity. |
Source code in presidio_analyzer/predefined_recognizers/us_ssn_recognizer.py
id
property
id
Return a unique identifier of this recognizer.
analyze
analyze(
text: str,
entities: List[str],
nlp_artifacts: Optional[NlpArtifacts] = None,
regex_flags: Optional[int] = None,
) -> List[RecognizerResult]
Analyzes text to detect PII using regular expressions or deny-lists.
PARAMETER | DESCRIPTION |
---|---|
text
|
Text to be analyzed
TYPE:
|
entities
|
Entities this recognizer can detect
TYPE:
|
nlp_artifacts
|
Output values from the NLP engine
TYPE:
|
regex_flags
|
regex flags to be used in regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
|
Source code in presidio_analyzer/pattern_recognizer.py
enhance_using_context
enhance_using_context(
text: str,
raw_recognizer_results: List[RecognizerResult],
other_raw_recognizer_results: List[RecognizerResult],
nlp_artifacts: NlpArtifacts,
context: Optional[List[str]] = None,
) -> List[RecognizerResult]
Enhance confidence score using context of the entity.
Override this method in a derived class if custom logic is needed; otherwise the return value will be equal to raw_recognizer_results.
If a result's score is boosted, the derived class needs to update result.recognition_metadata[RecognizerResult.IS_SCORE_ENHANCED_BY_CONTEXT_KEY].
PARAMETER | DESCRIPTION |
---|---|
text
|
The actual text that was analyzed
TYPE:
|
raw_recognizer_results
|
This recognizer's results, to be updated based on recognizer specific context.
TYPE:
|
other_raw_recognizer_results
|
Other recognizer results matched in the given text to allow related entity context enhancement
TYPE:
|
nlp_artifacts
|
The NLP artifacts contain elements such as lemmatized tokens, which improve the accuracy of the context enhancement process
TYPE:
|
context
|
list of context words
TYPE:
|
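A custom override might boost a score when a context word appears near the match. The sketch below is a simplified, standalone illustration of that idea: Hit is a hypothetical stand-in for RecognizerResult, the enhanced_by_context field stands in for the recognition_metadata flag mentioned above, and the boost and window values are assumed. It looks only at raw characters before the match and ignores nlp_artifacts and the other recognizers' results.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float
    enhanced_by_context: bool = False


def enhance_using_context(
    text: str,
    results: List[Hit],
    context: List[str],
    boost: float = 0.35,  # assumed boost value
    window: int = 30,  # assumed number of characters to inspect
) -> List[Hit]:
    enhanced = []
    for hit in results:
        # Look at a fixed number of characters before the match.
        prefix = text[max(0, hit.start - window):hit.start].lower()
        if any(word.lower() in prefix for word in context):
            # Raise the score, cap it at 1.0, and record the boost.
            hit = replace(
                hit,
                score=min(1.0, hit.score + boost),
                enhanced_by_context=True,
            )
        enhanced.append(hit)
    return enhanced


text = "SSN: 078-05-1120"
hits = [Hit("US_SSN", 5, 16, 0.5)]
boosted = enhance_using_context(text, hits, context=["ssn", "social"])
unchanged = enhance_using_context(text, hits, context=["passport"])
```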
Source code in presidio_analyzer/entity_recognizer.py
get_supported_entities
get_supported_entities() -> List[str]
Return the list of entities this recognizer can identify.
RETURNS | DESCRIPTION |
---|---|
List[str]
|
A list of the entities supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_supported_language
get_supported_language() -> str
Return the language this recognizer can support.
RETURNS | DESCRIPTION |
---|---|
str
|
The language supported by this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
get_version
get_version() -> str
Return the version of this recognizer.
RETURNS | DESCRIPTION |
---|---|
str
|
The current version of this recognizer |
Source code in presidio_analyzer/entity_recognizer.py
to_dict
to_dict() -> Dict
Serialize instance into a dictionary.
Source code in presidio_analyzer/pattern_recognizer.py
from_dict
classmethod
from_dict(entity_recognizer_dict: Dict) -> PatternRecognizer
Create instance from a serialized dict.
Source code in presidio_analyzer/pattern_recognizer.py
remove_duplicates
staticmethod
remove_duplicates(results: List[RecognizerResult]) -> List[RecognizerResult]
Remove duplicate results.
Two results are treated as duplicates when they have identical start and end positions and the same entity type.
PARAMETER | DESCRIPTION |
---|---|
results
|
List[RecognizerResult]
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[RecognizerResult]
|
List[RecognizerResult] |
Source code in presidio_analyzer/entity_recognizer.py
sanitize_value
staticmethod
sanitize_value(text: str, replacement_pairs: List[Tuple[str, str]]) -> str
Cleanse the input string of the replacement pairs specified as argument.
PARAMETER | DESCRIPTION |
---|---|
text
|
input string
TYPE:
|
replacement_pairs
|
pairs of what has to be replaced with which value
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
cleansed string |
Source code in presidio_analyzer/entity_recognizer.py
validate_result
validate_result(pattern_text: str) -> Optional[bool]
Validate the pattern logic e.g., by running checksum on a detected pattern.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
The text to be validated: only the part of the text that was detected by the regex engine
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Optional[bool]
|
A bool indicating whether the validation was successful. |
Source code in presidio_analyzer/pattern_recognizer.py
build_regex_explanation
staticmethod
build_regex_explanation(
recognizer_name: str,
pattern_name: str,
pattern: str,
original_score: float,
validation_result: bool,
regex_flags: int,
) -> AnalysisExplanation
Construct an explanation for why this entity was detected.
PARAMETER | DESCRIPTION |
---|---|
recognizer_name
|
Name of recognizer detecting the entity
TYPE:
|
pattern_name
|
Regex pattern name which detected the entity
TYPE:
|
pattern
|
Regex pattern logic
TYPE:
|
original_score
|
Score given by the recognizer
TYPE:
|
validation_result
|
Whether validation was used and its result
TYPE:
|
regex_flags
|
Regex flags used in the regex matching
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
AnalysisExplanation
|
Analysis explanation |
Source code in presidio_analyzer/pattern_recognizer.py
invalidate_result
invalidate_result(pattern_text: str) -> bool
Check if the pattern text cannot be validated as a US_SSN entity.
PARAMETER | DESCRIPTION |
---|---|
pattern_text
|
Text detected as pattern by regex
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
bool
|
True if invalidated |
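The kind of pruning this method performs can be sketched as follows. The two rules shown (a number made of one repeated digit, and a part consisting only of zeros) are illustrative assumptions and are not claimed to be the recognizer's full or exact rule set.

```python
def invalidate_result(pattern_text: str) -> bool:
    """Return True when the candidate should be discarded (illustrative rules)."""
    digits = "".join(ch for ch in pattern_text if ch.isdigit())
    if len(digits) != 9:
        # An SSN has nine digits; anything else cannot be one.
        return True
    # A number made of one repeated digit (e.g. 111-11-1111) is implausible.
    if len(set(digits)) == 1:
        return True
    # Split into the three conventional parts: 3, 2, and 4 digits.
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    # No part may consist solely of zeros.
    return area == "000" or group == "00" or serial == "0000"


repeated = invalidate_result("111-11-1111")
zero_group = invalidate_result("536-00-4399")
plausible = invalidate_result("536-90-4399")
```

Returning True means the match is dropped, so rules of this kind trade a small risk of missing a real number for fewer false positives.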
Source code in presidio_analyzer/predefined_recognizers/us_ssn_recognizer.py
Misc
presidio_analyzer.analyzer_request.AnalyzerRequest
Analyzer request data.
PARAMETER | DESCRIPTION |
---|---|
req_data
|
A request dictionary with the following fields:
text: the text to analyze
language: the language of the text
entities: list of PII entities to look for in the text; if entities=None, all entities are looked for
correlation_id: cross-call ID for this request
score_threshold: the minimum score for an identified entity to be returned
log_decision_process: whether the decision points within the analysis should be logged
return_decision_process: whether the decision points within the analysis should be returned as part of the response
TYPE:
|
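For illustration, a request dictionary with the fields listed above might look like the following. All values are made up for the example; such a dictionary would be passed as the req_data argument when constructing an AnalyzerRequest.

```python
# Example request dictionary; every value below is illustrative.
req_data = {
    "text": "My SSN is 078-05-1120",
    "language": "en",
    "entities": None,  # None means: look for all entities
    "correlation_id": "example-request-1",  # hypothetical ID
    "score_threshold": 0.5,
    "log_decision_process": False,
    "return_decision_process": True,
}
```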
Source code in presidio_analyzer/analyzer_request.py