Presidio Anonymizer API Reference
Anonymizer root module.
AnonymizerEngine
Bases: EngineBase
AnonymizerEngine class.
Handles the entire logic of Presidio Anonymizer: it takes the original text and replaces the PII entities with the desired anonymizers (operators).
Source code in presidio_anonymizer/anonymizer_engine.py
add_anonymizer(anonymizer_cls)
Add a new anonymizer to the engine.
anonymizer_cls: The anonymizer class to add to the engine.
Source code in presidio_anonymizer/anonymizer_engine.py
anonymize(text, analyzer_results, operators=None, conflict_resolution=ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED)
Anonymize the given text.
Example:
```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

# Initialize the engine.
engine = AnonymizerEngine()

# Invoke anonymize with the text, the analyzer results and the
# operators that define the anonymization type for each entity.
result = engine.anonymize(
    text="My name is Bond, James Bond",
    analyzer_results=[
        RecognizerResult(entity_type="PERSON", start=11, end=15, score=0.8),
        RecognizerResult(entity_type="PERSON", start=17, end=27, score=0.8),
    ],
    operators={"PERSON": OperatorConfig("replace", {"new_value": "BIP"})},
)

print(result)
# text: My name is BIP, BIP
# items: [
#   {'start': 16, 'end': 19, 'entity_type': 'PERSON', 'text': 'BIP', 'operator': 'replace'},
#   {'start': 11, 'end': 14, 'entity_type': 'PERSON', 'text': 'BIP', 'operator': 'replace'}
# ]
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to anonymize. | required |
analyzer_results | List[RecognizerResult] | A list of RecognizerResult objects: the results received from the analyzer. | required |
operators | Optional[Dict[str, OperatorConfig]] | The operator (anonymizer) configuration to use for each entity type, e.g. {"PHONE_NUMBER": OperatorConfig("redact", {})}. | None |
conflict_resolution | ConflictResolutionStrategy | The strategy used to resolve conflicts between entities. | MERGE_SIMILAR_OR_CONTAINED |
Returns:
Type | Description |
---|---|
EngineResult | The anonymized text and a list of information about the anonymized entities. |
Source code in presidio_anonymizer/anonymizer_engine.py
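The item indices in the example output can be reproduced with the standalone sketch below. It does not call Presidio, and `replace_spans` is a hypothetical helper; it assumes the spans do not overlap (conflicts already resolved) and share one replacement value. Spans are replaced from the end of the text backwards, which keeps the indices of the remaining spans valid and matches the last-span-first order of the items in the example.

```python
from typing import Dict, List, Tuple


def replace_spans(
    text: str, spans: List[Tuple[int, int, str]], new_value: str
) -> Tuple[str, List[Dict]]:
    """Replace every (start, end, entity_type) span in text with new_value.

    Hypothetical helper; assumes the spans do not overlap.
    """
    recorded = []
    # Process spans right to left so the indices of spans further left stay valid.
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + new_value + text[end:]
        # Characters after this replacement; later (leftward) replacements
        # do not change this count.
        chars_after = len(text) - (start + len(new_value))
        recorded.append((chars_after, entity_type))

    items = []
    for chars_after, entity_type in recorded:
        # Convert the end-relative position into indices in the final text.
        end = len(text) - chars_after
        items.append(
            {"start": end - len(new_value), "end": end,
             "entity_type": entity_type, "text": new_value}
        )
    return text, items


anonymized, items = replace_spans(
    "My name is Bond, James Bond",
    [(11, 15, "PERSON"), (17, 27, "PERSON")],
    "BIP",
)
print(anonymized)  # My name is BIP, BIP
print(items)
```

Recording each position relative to the end of the text is what lets the sketch recover final indices after replacements whose length differs from the original span.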
get_anonymizers()
Return a list of supported anonymizers.
Source code in presidio_anonymizer/anonymizer_engine.py
remove_anonymizer(anonymizer_cls)
Remove an anonymizer from the engine.
anonymizer_cls: The anonymizer class to remove from the engine.
Source code in presidio_anonymizer/anonymizer_engine.py
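The add/remove/get methods above can be pictured as a simple registry keyed by operator name. The sketch below is hypothetical: `Operator`, `OperatorRegistry` and the `operator_name` class attribute are illustrative names, not Presidio's internals, and it only shows the general pattern.

```python
from typing import Dict, List, Type


class Operator:
    """Hypothetical base class; each subclass sets a unique operator_name."""
    operator_name = "base"


class Replace(Operator):
    operator_name = "replace"


class Redact(Operator):
    operator_name = "redact"


class OperatorRegistry:
    """Keeps operator classes keyed by their name (illustrative only)."""

    def __init__(self) -> None:
        self._operators: Dict[str, Type[Operator]] = {}

    def add(self, operator_cls: Type[Operator]) -> None:
        # Adding a class with an existing name overwrites the old entry.
        self._operators[operator_cls.operator_name] = operator_cls

    def remove(self, operator_cls: Type[Operator]) -> None:
        # Removing an unknown class is a no-op in this sketch.
        self._operators.pop(operator_cls.operator_name, None)

    def names(self) -> List[str]:
        return sorted(self._operators)


registry = OperatorRegistry()
registry.add(Replace)
registry.add(Redact)
registry.remove(Redact)
print(registry.names())  # ['replace']
```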
BatchAnonymizerEngine
BatchAnonymizerEngine class.
A class that provides functionality to anonymize in batches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
anonymizer_engine | Optional[AnonymizerEngine] | An instance of the AnonymizerEngine class. | None |
Source code in presidio_anonymizer/batch_anonymizer_engine.py
anonymize_dict(analyzer_results, **kwargs)
Anonymize values in a dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
analyzer_results | Iterable[DictRecognizerResult] | Iterator of DictRecognizerResult elements. | required |
kwargs | | Additional kwargs for the AnonymizerEngine.anonymize method. | {} |
Source code in presidio_anonymizer/batch_anonymizer_engine.py
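How `anonymize_dict` walks nested analyzer output can be sketched without Presidio, using a stand-in dataclass that mirrors the documented `DictRecognizerResult` fields (`key`, `value`, `recognizer_results`). Everything here is hypothetical: `DictResult` and `anonymize_text` (which replaces spans with `<PII>`) are illustrative, and the branch order (nested dict, string, list, anything else copied as-is) is an assumption based on the field descriptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple

Span = Tuple[int, int]  # (start, end) of a detected entity


@dataclass
class DictResult:
    """Hypothetical stand-in mirroring DictRecognizerResult's fields."""
    key: str
    value: Any
    recognizer_results: Any


def anonymize_text(text: str, spans: List[Span]) -> str:
    """Hypothetical stand-in for the engine: replace each span with '<PII>'."""
    for start, end in sorted(spans, reverse=True):  # right to left
        text = text[:start] + "<PII>" + text[end:]
    return text


def anonymize_dict(results: Iterable[DictResult]) -> Dict[str, Any]:
    output: Dict[str, Any] = {}
    for result in results:
        if isinstance(result.value, dict):
            # recognizer_results holds the next level's DictResult items.
            output[result.key] = anonymize_dict(result.recognizer_results)
        elif isinstance(result.value, str):
            output[result.key] = anonymize_text(result.value, result.recognizer_results)
        elif isinstance(result.value, list):
            # One list of spans per string in the list.
            output[result.key] = [
                anonymize_text(text, spans)
                for text, spans in zip(result.value, result.recognizer_results)
            ]
        else:
            # Values that were not analyzed are copied unchanged.
            output[result.key] = result.value
    return output


analyzer_output = [
    DictResult("name", "Ann Lee", [(0, 7)]),
    DictResult("notes", ["hi Bob", "ok"], [[(3, 6)], []]),
    DictResult("age", 30, []),
    DictResult(
        "contact",
        {"email": "a@b.io"},
        iter([DictResult("email", "a@b.io", [(0, 6)])]),
    ),
]
anonymized = anonymize_dict(analyzer_output)
print(anonymized)
# {'name': '<PII>', 'notes': ['hi <PII>', 'ok'], 'age': 30, 'contact': {'email': '<PII>'}}
```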
anonymize_list(texts, recognizer_results_list, **kwargs)
Anonymize a list of strings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | List[Optional[Union[str, bool, int, float]]] | List containing the texts to be anonymized (original texts). Items whose type is not one of str, bool, int or float are not anonymized. | required |
recognizer_results_list | List[List[RecognizerResult]] | A list of lists of RecognizerResult: the output of the AnalyzerEngine for each text in the list. | required |
kwargs | | Additional kwargs for the AnonymizerEngine.anonymize method. | {} |
Source code in presidio_anonymizer/batch_anonymizer_engine.py
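The documented behavior of `anonymize_list` (anonymize items of type `str`, `bool`, `int` or `float`, pass anything else through) can be sketched without Presidio. `anonymize_text` is a hypothetical stand-in for the engine that replaces each span with `<PII>`, and converting non-string items with `str()` before anonymizing is an assumption of this sketch.

```python
from typing import Any, List, Tuple

Span = Tuple[int, int]  # (start, end) of a detected entity


def anonymize_text(text: str, spans: List[Span]) -> str:
    """Hypothetical stand-in for the engine: replace each span with '<PII>'."""
    # Right to left, so earlier indices stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + "<PII>" + text[end:]
    return text


def anonymize_list(texts: List[Any], results_list: List[List[Span]]) -> List[Any]:
    output = []
    for text, spans in zip(texts, results_list):
        if isinstance(text, (str, bool, int, float)):
            # Assumption: scalars are converted to str before anonymizing.
            output.append(anonymize_text(str(text), spans))
        else:
            # e.g. None: returned unchanged, not anonymized.
            output.append(text)
    return output


result = anonymize_list(["Call 555-0100", 42, None], [[(5, 13)], [], []])
print(result)  # ['Call <PII>', '42', None]
```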
ConflictResolutionStrategy
Bases: Enum
Conflict resolution strategy.
The strategy to use when there is a conflict between two entities:
- MERGE_SIMILAR_OR_CONTAINED: the default strategy; resolves conflicts between similar or contained entities.
- REMOVE_INTERSECTIONS: resolves intersection conflicts between entities in addition to the conflicts handled by the default strategy.
- NONE: no conflict resolution is performed.
Source code in presidio_anonymizer/entities/conflict_resolution_strategy.py
DeanonymizeEngine
Bases: EngineBase
Deanonymize text that was previously anonymized.
Source code in presidio_anonymizer/deanonymize_engine.py
add_deanonymizer(deanonymizer_cls)
Add a new deanonymizer to the engine.
deanonymizer_cls: The deanonymizer class to add to the engine.
Source code in presidio_anonymizer/deanonymize_engine.py
deanonymize(text, entities, operators)
Deanonymize the text, using the given entities and operators.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
operators | Dict[str, OperatorConfig] | The operators to apply to the anonymizer result entities. | required |
text | str | The full text containing the encrypted entities. | required |
entities | List[OperatorResult] | The list of encrypted entities. | required |
Returns:
Type | Description |
---|---|
EngineResult | The new text and data about the deanonymized entities. |
Source code in presidio_anonymizer/deanonymize_engine.py
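Deanonymization reverses a reversible operator on each entity, where each entity's start and end index the anonymized text (as in an OperatorResult). The standalone sketch below uses base64 purely as a reversible stand-in for encrypt/decrypt; base64 offers no protection at all, and `deanonymize` here is a hypothetical function, not Presidio's engine. Entities are processed right to left so the indices of unprocessed entities stay valid even when decoded values are shorter than the encoded ones.

```python
import base64
from typing import Dict, List


def deanonymize(text: str, entities: List[Dict]) -> str:
    """Reverse a reversible operator (base64 here) on each entity.

    Each entity's start/end index the anonymized text, like OperatorResult.
    """
    # Right to left, so the indices of entities not yet processed stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        encoded = text[entity["start"]:entity["end"]]
        decoded = base64.b64decode(encoded).decode("utf-8")
        text = text[:entity["start"]] + decoded + text[entity["end"]:]
    return text


# Build an "anonymized" text with base64 (NOT encryption; illustration only).
first = base64.b64encode(b"Bond").decode("ascii")
second = base64.b64encode(b"James Bond").decode("ascii")
anonymized = f"My name is {first}, {second}"
first_start = len("My name is ")
second_start = first_start + len(first) + len(", ")
entities = [
    {"start": first_start, "end": first_start + len(first), "entity_type": "PERSON"},
    {"start": second_start, "end": second_start + len(second), "entity_type": "PERSON"},
]
restored = deanonymize(anonymized, entities)
print(restored)  # My name is Bond, James Bond
```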
get_deanonymizers()
Return a list of supported deanonymizers.
Source code in presidio_anonymizer/deanonymize_engine.py
remove_deanonymizer(deanonymizer_cls)
Remove a deanonymizer from the engine.
deanonymizer_cls: The deanonymizer class to remove from the engine.
Source code in presidio_anonymizer/deanonymize_engine.py
DictRecognizerResult
dataclass
Data class for holding the output of the Presidio Analyzer on dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str | The key in the dictionary. | required |
value | Union[str, List[str], dict] | The value to run the analysis on: a string, a list of strings, or a nested dictionary. | required |
recognizer_results | Union[List[RecognizerResult], List[List[RecognizerResult]], Iterator[DictRecognizerResult]] | The analyzer output for one value: a list of recognizer results if the input is a single string; a list of lists of recognizer results if the input is a list of strings; or, if the input is a dictionary, an iterator of DictRecognizerResult for the next level of the dictionary. | required |
Source code in presidio_anonymizer/entities/engine/dict_recognizer_result.py
EngineResult
Engine result.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
__eq__(other)
Verify two instances are equal.
Returns True if the two instances are equal, False otherwise.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
__init__(text=None, items=None)
Create EngineResult entity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The anonymized text. | None |
items | List[OperatorResult] | List of PII entities and the indices of their replacements in the anonymized text. | None |
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
__repr__()
Return a string representation of the object.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
add_item(item)
Add an item.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
item | OperatorResult | An item to add to the list. | required |
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
normalize_item_indexes()
Normalize the item indexes so that they are counted from the start of the text.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
set_text(text)
Set a text.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
to_json()
Return a json string serializing this instance.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
InvalidParamError
Bases: Exception
Exception raised when user input is not valid.
msg: The message to add to the exception.
Source code in presidio_anonymizer/entities/invalid_exception.py
OperatorConfig
Holds the data (operator name and parameters) of the required operator.
Source code in presidio_anonymizer/entities/engine/operator_config.py
__eq__(other)
Verify two OperatorConfigs are equal.
Source code in presidio_anonymizer/entities/engine/operator_config.py
__init__(operator_name, params=None)
Create an operator config instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
operator_name | str | The name of the operator to use. | required |
params | Dict | The parameters the operator needs in order to work. | None |
Source code in presidio_anonymizer/entities/engine/operator_config.py
__repr__()
Return a string representation of the object.
Source code in presidio_anonymizer/entities/engine/operator_config.py
from_json(params)
classmethod
Create an OperatorConfig from JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
params | Dict | The JSON, e.g. { "type": "mask", "masking_char": "*", "chars_to_mask": 4, "from_end": true } | required |
Returns:
Type | Description |
---|---|
OperatorConfig | The OperatorConfig created from the JSON. |
Source code in presidio_anonymizer/entities/engine/operator_config.py
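Judging from the example JSON, the `type` key names the operator and the remaining keys (`masking_char`, `chars_to_mask`, `from_end`) are its parameters. The standalone sketch below shows that split; `OperatorConfigSketch` is a hypothetical stand-in, not Presidio's class, and the mapping of `type` to `operator_name` is an assumption drawn from the example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class OperatorConfigSketch:
    """Hypothetical stand-in for OperatorConfig: an operator name plus params."""
    operator_name: str
    params: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "OperatorConfigSketch":
        # Assumption: "type" names the operator; every other key is a parameter.
        params = {k: v for k, v in data.items() if k != "type"}
        return cls(data["type"], params)


raw = '{"type": "mask", "masking_char": "*", "chars_to_mask": 4, "from_end": true}'
config = OperatorConfigSketch.from_json(json.loads(raw))
print(config.operator_name)  # mask
print(config.params)  # {'masking_char': '*', 'chars_to_mask': 4, 'from_end': True}
```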
OperatorResult
Bases: PIIEntity
Holds the data of a single engine result, from either anonymization or deanonymization.
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
__eq__(other)
Verify two OperatorResults are equal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | OperatorResult | The OperatorResult to compare with. | required |
Returns:
Type | Description |
---|---|
bool | True if the two OperatorResults are equal. |
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
__repr__()
Return a string representation of the object.
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
__str__()
Return a string representation of the object.
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
from_json(json)
classmethod
Create an OperatorResult from user JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
json | Dict | JSON representation of this operator result, for example: { "start": 0, "end": 10, "key": "1111111111111111", "entity_type": "PERSON", "text": "resulted_text", "operator": "encrypt" } | required |
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
to_dict()
Return object as Dict.
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
PIIEntity
Bases: ABC
Abstract class holding the metadata (start index, end index and entity type) of the text span to operate on.
Source code in presidio_anonymizer/entities/engine/pii_entity.py
__eq__(other)
Check two text metadata entities are equal.
Source code in presidio_anonymizer/entities/engine/pii_entity.py
__gt__(other)
Check whether one entity is greater than the other, by the end index in the text.
Source code in presidio_anonymizer/entities/engine/pii_entity.py
__repr__()
Return a string representation of the object.
Source code in presidio_anonymizer/entities/engine/pii_entity.py
RecognizerResult
Bases: PIIEntity
RecognizerResult represents the findings of a detected entity: the result of a recognizer analyzing the text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity_type | str | The type of the entity. | required |
start | int | The start index of the detected entity. | required |
end | int | The end index of the detected entity. | required |
score | float | The score of the detection. | required |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
__eq__(other)
Check two results are equal by using all class fields.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | | Another RecognizerResult. | required |
Returns:
Type | Description |
---|---|
bool | True if all class fields are equal. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
__gt__(other)
Check if one result is greater by using the results indices in the text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | | Another RecognizerResult. | required |
Returns:
Type | Description |
---|---|
bool | True if this result is greater than the other, by their indices in the text. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
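Defining only `__gt__` is enough for `sorted()` to work: when `a < b` is not implemented, Python falls back to the reflected `b > a`. The standalone sketch below assumes results are ordered by start index, with ties broken by end index; `IndexedSpan` is a hypothetical stand-in, not Presidio's class.

```python
from typing import List


class IndexedSpan:
    """Hypothetical stand-in holding only the indices of a result."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __gt__(self, other: "IndexedSpan") -> bool:
        # Assumed ordering: by start index, ties broken by end index.
        if self.start == other.start:
            return self.end > other.end
        return self.start > other.start

    def __repr__(self) -> str:
        return f"IndexedSpan({self.start}, {self.end})"


spans: List[IndexedSpan] = [IndexedSpan(17, 27), IndexedSpan(11, 15), IndexedSpan(11, 13)]
# sorted() uses "<", which resolves to the reflected __gt__ defined above.
ordered = sorted(spans)
print(ordered)  # [IndexedSpan(11, 13), IndexedSpan(11, 15), IndexedSpan(17, 27)]
```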
__hash__()
Hash the result data by using all class fields.
Returns:
Type | Description |
---|---|
int | A hash computed from all class fields. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
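Because `__eq__` and `__hash__` both use all class fields, duplicate results collapse when placed in a set or used as dict keys. A frozen dataclass generates the same kind of all-fields equality and hashing, so the sketch below uses one as a hypothetical stand-in (`Result` is not Presidio's class) to show the effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in: equality and hash cover all four fields."""
    entity_type: str
    start: int
    end: int
    score: float


results = [
    Result("PERSON", 11, 15, 0.8),
    Result("PERSON", 11, 15, 0.8),  # exact duplicate, collapses in a set
    Result("PERSON", 11, 15, 0.6),  # same span, different score: kept
]
unique = set(results)
print(len(unique))  # 2
```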
__str__()
Return a string representation of the instance.
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
contains(other)
Check if one result is contained or equal to another result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | | Another RecognizerResult. | required |
Returns:
Type | Description |
---|---|
bool | True if one result is contained in or equal to the other. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
equal_indices(other)
Check if the indices are equal between two results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | | Another RecognizerResult. | required |
Returns:
Type | Description |
---|---|
bool | True if the start and end indices of both results are equal. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
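The two index checks above can be sketched on plain spans. This standalone version assumes `a.contains(b)` means that `b` lies inside `a` (an equal span counts as contained, per the docstring); `Span` is a hypothetical stand-in, not Presidio's class.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in holding only the indices of a result."""
    start: int
    end: int

    def contains(self, other: "Span") -> bool:
        # other lies inside self; equal spans count as contained.
        return self.start <= other.start and self.end >= other.end

    def equal_indices(self, other: "Span") -> bool:
        return self.start == other.start and self.end == other.end


full = Span(11, 27)  # e.g. "Bond, James Bond" in the anonymize example
part = Span(17, 27)  # e.g. "James Bond"
print(full.contains(part), part.contains(full))  # True False
print(full.contains(full), full.equal_indices(Span(11, 27)))  # True True
```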
from_json(data)
classmethod
Create a RecognizerResult from JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict | e.g. { "start": 24, "end": 32, "score": 0.8, "entity_type": "NAME" } | required |
Returns:
Type | Description |
---|---|
RecognizerResult | The RecognizerResult created from the JSON. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
has_conflict(other)
Check if two recognizer results are conflicted or not.
A result conflicts with another result if:
1. Its indices are the same as the other's and its score is lower, or
2. Its indices are contained in the other's.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | | Another RecognizerResult. | required |
Returns:
Type | Description |
---|---|
bool | True if this result conflicts with the other. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
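The two conflict rules can be sketched as a standalone function and used to drop conflicting results from a list. This only approximates what a conflict-resolution step might do and is not Presidio's algorithm; `Result`, `has_conflict` and `drop_conflicts` are hypothetical. The docstring says an equal span conflicts when its score is *lower*; how the library treats equal scores is not stated here, so this sketch treats a tie as no conflict.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """Hypothetical stand-in for a recognizer result."""
    entity_type: str
    start: int
    end: int
    score: float


def has_conflict(me: Result, other: Result) -> bool:
    if me.start == other.start and me.end == other.end:
        # Rule 1: same indices and a lower score (ties: no conflict here).
        return me.score < other.score
    # Rule 2: my span is contained in the other's span.
    return other.start <= me.start and other.end >= me.end


def drop_conflicts(results: List[Result]) -> List[Result]:
    """Keep only results that do not conflict with any other result."""
    return [
        r for i, r in enumerate(results)
        if not any(has_conflict(r, o) for j, o in enumerate(results) if i != j)
    ]


results = [
    Result("PERSON", 17, 27, 0.85),
    Result("PERSON", 23, 27, 0.6),          # contained in the span above
    Result("PHONE_NUMBER", 40, 52, 0.4),
    Result("US_BANK_NUMBER", 40, 52, 0.5),  # same span, higher score
]
kept = drop_conflicts(results)
print([(r.entity_type, r.start, r.end) for r in kept])
# [('PERSON', 17, 27), ('US_BANK_NUMBER', 40, 52)]
```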
intersects(other)
Check if self intersects with a different RecognizerResult.
Returns:
Type | Description |
---|---|
int | If the results intersect, the number of intersecting characters; otherwise 0. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
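Assuming end indices are exclusive, as in the `anonymize` example where "Bond" spans 11 to 15, the number of intersecting characters is the overlap of two half-open intervals, clamped at 0. The standalone function below is a hypothetical sketch with a signature of its own, not Presidio's method.

```python
def intersects(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of characters shared by [a_start, a_end) and [b_start, b_end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    # Disjoint or merely touching spans share no characters.
    return max(overlap, 0)


print(intersects(11, 15, 13, 20))  # 2
print(intersects(11, 15, 15, 20))  # 0 (adjacent spans)
print(intersects(0, 10, 20, 30))   # 0 (disjoint spans)
```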