Presidio Anonymizer API Reference
presidio_anonymizer
Anonymizer root module.
AnonymizerEngine
Bases: EngineBase
AnonymizerEngine class.
Handles the entire logic of the Presidio-anonymizer. Gets the original text and replaces the PII entities with the desired anonymizers.
METHOD | DESCRIPTION |
---|---|
anonymize | Anonymize method to anonymize the given text. |
add_anonymizer | Add a new anonymizer to the engine. |
remove_anonymizer | Remove an anonymizer from the engine. |
get_anonymizers | Return a list of supported anonymizers. |
Source code in presidio_anonymizer/anonymizer_engine.py
anonymize
anonymize(
text: str,
analyzer_results: List[RecognizerResult],
operators: Optional[Dict[str, OperatorConfig]] = None,
conflict_resolution: ConflictResolutionStrategy = ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED,
) -> EngineResult
Anonymize method to anonymize the given text.
:example:

>>> from presidio_anonymizer import AnonymizerEngine
>>> from presidio_anonymizer.entities import RecognizerResult, OperatorConfig
>>> # Initialize the engine.
>>> engine = AnonymizerEngine()
>>> # Invoke the anonymize function with the text, analyzer results and
>>> # operators to define the anonymization type.
>>> result = engine.anonymize(
...     text="My name is Bond, James Bond",
...     analyzer_results=[
...         RecognizerResult(entity_type="PERSON", start=11, end=15, score=0.8),
...         RecognizerResult(entity_type="PERSON", start=17, end=27, score=0.8),
...     ],
...     operators={"PERSON": OperatorConfig("replace", {"new_value": "BIP"})},
... )
>>> print(result)
text: My name is BIP, BIP
items:
[
    {'start': 16, 'end': 19, 'entity_type': 'PERSON', 'text': 'BIP', 'operator': 'replace'},
    {'start': 11, 'end': 14, 'entity_type': 'PERSON', 'text': 'BIP', 'operator': 'replace'}
]
PARAMETER | DESCRIPTION |
---|---|
text | The text we are anonymizing. TYPE: str |
analyzer_results | A list of RecognizerResult objects: the results we received from the analyzer. TYPE: List[RecognizerResult] |
operators | The configuration of the anonymizers we would like to use for each entity received from the analyzer, e.g.: {"PHONE_NUMBER": OperatorConfig("redact", {})}. TYPE: Optional[Dict[str, OperatorConfig]] DEFAULT: None |
conflict_resolution | The configuration designed to handle conflicts among entities. TYPE: ConflictResolutionStrategy DEFAULT: ConflictResolutionStrategy.MERGE_SIMILAR_OR_CONTAINED |
RETURNS | DESCRIPTION |
---|---|
EngineResult | the anonymized text and a list of information about the anonymized entities. |
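The index bookkeeping in the example above can be traced with a small standalone sketch. It does not use or reproduce the library's implementation: `replace_spans` is a hypothetical helper that assumes non-overlapping spans and a single replacement value, and it reports each item's position relative to the output text.

```python
from typing import List, Tuple


def replace_spans(
    text: str, spans: List[Tuple[int, int]], new_value: str
) -> Tuple[str, List[Tuple[int, int]]]:
    """Replace each (start, end) span with new_value (hypothetical helper)."""
    pieces: List[str] = []
    items: List[Tuple[int, int]] = []
    cursor = 0   # position in the original text
    out_len = 0  # length of the output built so far
    for start, end in sorted(spans):
        # Copy the untouched text between the previous span and this one.
        pieces.append(text[cursor:start])
        out_len += start - cursor
        # Record where the replacement lands in the output text.
        items.append((out_len, out_len + len(new_value)))
        pieces.append(new_value)
        out_len += len(new_value)
        cursor = end
    # Copy whatever follows the last span.
    pieces.append(text[cursor:])
    return "".join(pieces), items


new_text, items = replace_spans(
    "My name is Bond, James Bond", [(11, 15), (17, 27)], "BIP"
)
print(new_text)  # My name is BIP, BIP
print(items)     # [(11, 14), (16, 19)]
```

The second span moves from 17 to 16 because the first replacement is one character shorter than the text it replaces.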
Source code in presidio_anonymizer/anonymizer_engine.py
add_anonymizer
add_anonymizer(anonymizer_cls: Type[Operator]) -> None
Add a new anonymizer to the engine.
anonymizer_cls: The anonymizer class to add to the engine.
Source code in presidio_anonymizer/anonymizer_engine.py
remove_anonymizer
remove_anonymizer(anonymizer_cls: Type[Operator]) -> None
Remove an anonymizer from the engine.
anonymizer_cls: The anonymizer class to remove from the engine.
Source code in presidio_anonymizer/anonymizer_engine.py
get_anonymizers
get_anonymizers() -> List[str]
Return a list of supported anonymizers.
Source code in presidio_anonymizer/anonymizer_engine.py
BatchAnonymizerEngine
BatchAnonymizerEngine class.
A class that provides functionality to anonymize in batches.
PARAMETER | DESCRIPTION |
---|---|
anonymizer_engine | An instance of the AnonymizerEngine class. |
METHOD | DESCRIPTION |
---|---|
anonymize_list | Anonymize a list of strings. |
anonymize_dict | Anonymize values in a dictionary. |
Source code in presidio_anonymizer/batch_anonymizer_engine.py
anonymize_list
anonymize_list(
texts: List[Optional[Union[str, bool, int, float]]],
recognizer_results_list: List[List[RecognizerResult]],
**kwargs
) -> List[Union[str, Any]]
Anonymize a list of strings.
PARAMETER | DESCRIPTION |
---|---|
texts | List containing the texts to be anonymized (original texts). Items with a TYPE: List[Optional[Union[str, bool, int, float]]] |
recognizer_results_list | A list of lists of RecognizerResult, the output of the AnalyzerEngine on each text in the list. TYPE: List[List[RecognizerResult]] |
kwargs | Additional kwargs for the |
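The two list arguments are parallel: the results at position i belong to the text at position i. The sketch below illustrates that pairing only. `anonymize_each` and `redact` are hypothetical stand-ins, a span is simplified to a `(start, end)` tuple, and only string items are shown.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # simplified stand-in for a RecognizerResult


def anonymize_each(
    texts: List[str],
    results_per_text: List[List[Span]],
    anonymize_one: Callable[[str, List[Span]], str],
) -> List[str]:
    """Apply anonymize_one to each text with its own results (hypothetical)."""
    if len(texts) != len(results_per_text):
        raise ValueError("texts and results_per_text must have the same length")
    return [anonymize_one(t, r) for t, r in zip(texts, results_per_text)]


def redact(text: str, spans: List[Span]) -> str:
    """Toy single-text anonymizer: cut out every span, last span first."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + text[end:]
    return text


out = anonymize_each(
    ["Call Bob", "no PII here"],
    [[(5, 8)], []],
    redact,
)
print(out)  # ['Call ', 'no PII here']
```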
Source code in presidio_anonymizer/batch_anonymizer_engine.py
anonymize_dict
anonymize_dict(
analyzer_results: Iterable[DictRecognizerResult], **kwargs
) -> Dict[str, str]
Anonymize values in a dictionary.
PARAMETER | DESCRIPTION |
---|---|
analyzer_results | Iterator of TYPE: Iterable[DictRecognizerResult] |
kwargs | Additional kwargs for the |
Source code in presidio_anonymizer/batch_anonymizer_engine.py
DeanonymizeEngine
Bases: EngineBase
Deanonymize text that was previously anonymized.
METHOD | DESCRIPTION |
---|---|
deanonymize | Receive the text, entities and operators to perform deanonymization over. |
get_deanonymizers | Return a list of supported deanonymizers. |
add_deanonymizer | Add a new deanonymizer to the engine. |
remove_deanonymizer | Remove a deanonymizer from the engine. |
Source code in presidio_anonymizer/deanonymize_engine.py
deanonymize
deanonymize(
text: str,
entities: List[OperatorResult],
operators: Dict[str, OperatorConfig],
) -> EngineResult
Receive the text, entities and operators to perform deanonymization over.
PARAMETER | DESCRIPTION |
---|---|
operators | the operators to apply on the anonymizer result entities. TYPE: Dict[str, OperatorConfig] |
text | the full text with the encrypted entities. TYPE: str |
entities | list of encrypted entities. TYPE: List[OperatorResult] |
RETURNS | DESCRIPTION |
---|---|
EngineResult | the new text and data about the deanonymized entities. |
Source code in presidio_anonymizer/deanonymize_engine.py
get_deanonymizers
get_deanonymizers() -> List[str]
Return a list of supported deanonymizers.
Source code in presidio_anonymizer/deanonymize_engine.py
add_deanonymizer
add_deanonymizer(deanonymizer_cls: Type[Operator]) -> None
Add a new deanonymizer to the engine.
deanonymizer_cls: The deanonymizer class to add to the engine.
Source code in presidio_anonymizer/deanonymize_engine.py
remove_deanonymizer
remove_deanonymizer(deanonymizer_cls: Type[Operator]) -> None
Remove a deanonymizer from the engine.
deanonymizer_cls: The deanonymizer class to remove from the engine.
Source code in presidio_anonymizer/deanonymize_engine.py
ConflictResolutionStrategy
Bases: Enum
Conflict resolution strategy.
The strategy to use when there is a conflict between two entities.
MERGE_SIMILAR_OR_CONTAINED: This default strategy resolves conflicts between similar or contained entities.
REMOVE_INTERSECTIONS: Effectively resolves both intersection conflicts among entities and default strategy conflicts.
NONE: No conflict resolution will be performed.
Source code in presidio_anonymizer/entities/conflict_resolution_strategy.py
DictRecognizerResult
dataclass
Data class for holding the output of the Presidio Analyzer on dictionaries.
PARAMETER | DESCRIPTION |
---|---|
key | key in dictionary |
value | value to run analysis on (either string or list of strings) |
recognizer_results | Analyzer output for one value. Could be either: a list of recognizer results, if the input is one string; a list of lists of recognizer results, if the input is a list of strings; or an iterator of DictRecognizerResult, if the input is a dictionary. In the last case, recognizer_results is the iterator of DictRecognizerResult for the next level in the dictionary. |
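The three shapes that recognizer_results can take are easier to see side by side. The sketch below uses a hypothetical stand-in dataclass, `DictResult`, with the same three field names, and simplifies a recognizer result to a `(start, end)` tuple; it is an illustration of the data layout, not the library's class.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

Span = Tuple[int, int]  # simplified stand-in for a RecognizerResult


@dataclass
class DictResult:
    """Hypothetical stand-in mirroring the three documented fields."""

    key: str
    value: Union[str, List[str], dict]
    recognizer_results: Union[
        List[Span], List[List[Span]], Iterator["DictResult"]
    ]


def count_findings(result: DictResult) -> int:
    """Count spans across the three shapes of recognizer_results."""
    if isinstance(result.value, dict):
        # Nested dictionary: recurse into the next level's results.
        return sum(count_findings(r) for r in result.recognizer_results)
    if isinstance(result.value, list):
        # List of strings: one list of spans per string.
        return sum(len(spans) for spans in result.recognizer_results)
    # Single string: a flat list of spans.
    return len(result.recognizer_results)


results = [
    DictResult("name", "Bob", [(0, 3)]),
    DictResult("notes", ["call Ann", "none"], [[(5, 8)], []]),
    DictResult(
        "address",
        {"city": "Paris"},
        iter([DictResult("city", "Paris", [(0, 5)])]),
    ),
]
total = sum(count_findings(r) for r in results)
print(total)  # 3
```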
Source code in presidio_anonymizer/entities/engine/dict_recognizer_result.py
EngineResult
Engine result.
METHOD | DESCRIPTION |
---|---|
set_text | Set a text. |
add_item | Add an item. |
normalize_item_indexes | Normalize the item indexes so that they are counted from the start of the text. |
to_json | Return a json string serializing this instance. |
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
set_text
set_text(text: str)
Set a text.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
add_item
add_item(item: OperatorResult)
Add an item.
PARAMETER | DESCRIPTION |
---|---|
item | an item to add to the list. TYPE: OperatorResult |
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
normalize_item_indexes
normalize_item_indexes()
Normalize the item indexes so that they are counted from the start of the text.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
to_json
to_json() -> str
Return a json string serializing this instance.
Source code in presidio_anonymizer/entities/engine/result/engine_result.py
InvalidParamError
Bases: Exception
Throw exception with error when user input is not valid.
param msg: Message to be added to the exception
Source code in presidio_anonymizer/entities/invalid_exception.py
OperatorConfig
Hold the data of the required operator.
METHOD | DESCRIPTION |
---|---|
from_json | Create OperatorConfig from json. |
Source code in presidio_anonymizer/entities/engine/operator_config.py
from_json
classmethod
from_json(params: Dict) -> OperatorConfig
Create OperatorConfig from json.
PARAMETER | DESCRIPTION |
---|---|
params | json e.g.: { "type": "mask", "masking_char": "*", "chars_to_mask": 4, "from_end": true } TYPE: Dict |
RETURNS | DESCRIPTION |
---|---|
OperatorConfig | OperatorConfig |
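The example payload mixes the operator name with the operator's parameters in one flat object. A minimal sketch of separating the two is shown below. It assumes that the "type" key names the operator and that every other key is a parameter for it; `split_operator_json` is a hypothetical helper, not part of the library.

```python
import json
from typing import Any, Dict, Tuple


def split_operator_json(params: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Separate the operator name from its parameters (hypothetical helper)."""
    remaining = dict(params)  # copy so the caller's dict is left untouched
    operator_name = remaining.pop("type")
    return operator_name, remaining


payload = json.loads(
    '{"type": "mask", "masking_char": "*", "chars_to_mask": 4, "from_end": true}'
)
name, operator_params = split_operator_json(payload)
print(name)             # mask
print(operator_params)  # {'masking_char': '*', 'chars_to_mask': 4, 'from_end': True}
```

Note that the JSON literal `true` becomes the Python value `True` once the payload is parsed.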
Source code in presidio_anonymizer/entities/engine/operator_config.py
OperatorResult
Bases: PIIEntity
A class that holds the data of an engine result, for either anonymization or deanonymization.
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
to_dict
to_dict() -> Dict
Return object as Dict.
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
from_json
classmethod
from_json(json: Dict) -> OperatorResult
Create OperatorResult from user json.
PARAMETER | DESCRIPTION |
---|---|
json | json representation for this operator result. For example: { "start": 0, "end": 10, "key": "1111111111111111", "entity_type": "PERSON", "text": "resulted_text", "operator": "encrypt" } TYPE: Dict |
Source code in presidio_anonymizer/entities/engine/result/operator_result.py
PIIEntity
Bases: ABC
Abstract class that holds metadata about the text we are going to operate on.
Source code in presidio_anonymizer/entities/engine/pii_entity.py
RecognizerResult
Bases: PIIEntity
Recognizer Result represents the findings of the detected entity.
Result of a recognizer analyzing the text.
PARAMETER | DESCRIPTION |
---|---|
entity_type | the type of the entity |
start | the start location of the detected entity |
end | the end location of the detected entity |
score | the score of the detection |
METHOD | DESCRIPTION |
---|---|
from_json | Create RecognizerResult from json. |
has_conflict | Check if two recognizer results are conflicted or not. |
contains | Check if one result is contained or equal to another result. |
equal_indices | Check if the indices are equal between two results. |
intersects | Check if self intersects with a different RecognizerResult. |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
from_json
classmethod
from_json(data: Dict)
Create RecognizerResult from json.
PARAMETER | DESCRIPTION |
---|---|
data | e.g. { "start": 24, "end": 32, "score": 0.8, "entity_type": "NAME" } TYPE: Dict |
RETURNS | DESCRIPTION |
---|---|
RecognizerResult | |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
has_conflict
has_conflict(other)
Check if two recognizer results are conflicted or not.
I have a conflict if:
1. My indices are the same as the other and my score is lower.
2. My indices are contained in another.
PARAMETER | DESCRIPTION |
---|---|
other | RecognizerResult |
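The two conflict rules stated for has_conflict can be written out as a short standalone sketch. `Finding` is a hypothetical stand-in class, not the library's RecognizerResult, and the sketch treats only a strictly lower score as a conflict under rule 1, since the text does not say how ties are handled.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for a recognizer result."""

    start: int
    end: int
    score: float

    def equal_indices(self, other: "Finding") -> bool:
        return self.start == other.start and self.end == other.end

    def contained_in(self, other: "Finding") -> bool:
        return other.start <= self.start and self.end <= other.end

    def has_conflict(self, other: "Finding") -> bool:
        # Rule 1: same indices and my score is lower.
        if self.equal_indices(other):
            return self.score < other.score
        # Rule 2: my indices are contained in the other's.
        return self.contained_in(other)


weak = Finding(0, 4, 0.5)
strong = Finding(0, 4, 0.9)
wide = Finding(0, 10, 0.3)
print(weak.has_conflict(strong))  # True
print(strong.has_conflict(weak))  # False
print(weak.has_conflict(wide))    # True
print(wide.has_conflict(weak))    # False
```

The check is asymmetric: the narrower or lower-scored finding reports the conflict, while the finding that would be kept does not.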
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
contains
contains(other)
Check if one result is contained or equal to another result.
PARAMETER | DESCRIPTION |
---|---|
other | another RecognizerResult |
RETURNS | DESCRIPTION |
---|---|
bool | |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
equal_indices
equal_indices(other)
Check if the indices are equal between two results.
PARAMETER | DESCRIPTION |
---|---|
other | another RecognizerResult |
Source code in presidio_anonymizer/entities/engine/recognizer_result.py
intersects
intersects(other) -> int
Check if self intersects with a different RecognizerResult.
RETURNS | DESCRIPTION |
---|---|
int | If intersecting, returns the number of intersecting characters. If not, returns 0. |
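The return value described here is the length of the overlap between two spans. A minimal sketch of that computation follows. It assumes spans are half-open, so that `end` is one past the last character; `overlap_length` is a hypothetical function, not the library's method.

```python
def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of characters shared by [a_start, a_end) and [b_start, b_end)."""
    # The overlap runs from the later start to the earlier end.
    return max(0, min(a_end, b_end) - max(a_start, b_start))


print(overlap_length(0, 10, 5, 15))  # 5
print(overlap_length(0, 5, 5, 10))   # 0
print(overlap_length(0, 5, 8, 10))   # 0
```

Under the half-open assumption, spans that merely touch, as in the second call, share no characters.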
Source code in presidio_anonymizer/entities/engine/recognizer_result.py