RegexFeaturizer
APIs
- class pyis.python.ops.RegexFeaturizer(self: ops.RegexFeaturizer, regexes: List[str]) -> None
RegexFeaturizer extracts token spans that match the specified regex patterns.
Create a RegexFeaturizer object.
- Parameters
regexes (List[str]) – Regex patterns to match against. Each pattern is assigned an id equal to its index in the list, starting from 0.
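The id rule above can be illustrated in plain Python. This is a minimal sketch of the bookkeeping only, not the pyis implementation. `assign_ids` is a hypothetical helper that maps each pattern's list index to its compiled regex.

```python
import re
from typing import Dict, List


def assign_ids(regexes: List[str]) -> Dict[int, re.Pattern]:
    # Hypothetical helper: a pattern's id is its position in the list, starting from 0.
    return {i: re.compile(p) for i, p in enumerate(regexes)}


patterns = assign_ids([r'\d+', r'\d+:\d+'])
# patterns[0].pattern == r'\d+'
# patterns[1].pattern == r'\d+:\d+'
```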
- add_regex(self: ops.RegexFeaturizer, regex: str) -> None
Add a new regex pattern.
- Parameters
regex (str) – The new regex pattern.
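This page does not state which id a pattern added with add_regex receives. A natural reading of the indexing rule is that the new pattern takes the next free index. The sketch below models only that assumption. `PatternTable` is a hypothetical stand-in, not a pyis class. Unlike the real add_regex, which returns None, this version returns the assigned id so the assumption is easy to see.

```python
import re
from typing import List


class PatternTable:
    """Hypothetical stand-in for pattern-id bookkeeping; not the pyis implementation."""

    def __init__(self, regexes: List[str]) -> None:
        # Initial patterns get ids 0..len(regexes)-1 by position.
        self.patterns = [re.compile(p) for p in regexes]

    def add_regex(self, regex: str) -> int:
        # Assumption: the new pattern is appended and takes the next index as its id.
        self.patterns.append(re.compile(regex))
        return len(self.patterns) - 1


table = PatternTable([r'\d+', r'\d+:\d+'])
new_id = table.add_regex(r'[a-z]+')  # 2 under the stated assumption
```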
- transform(self: ops.RegexFeaturizer, tokens: List[str]) -> List[ops.TextFeature]
Extract token spans that match the specified regex patterns.
- Parameters
tokens (List[str]) – The token list.
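The behaviour of transform can be approximated in plain Python. This is a minimal sketch and not the pyis implementation. It rests on three assumptions drawn from the example output: each feature is a (pattern id, value, start token index, end token index) tuple, a pattern must match a whole token, and every span covers a single token. `regex_features` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Assumed feature layout, inferred from the example output:
# (pattern id, value, start token index, end token index)
Feature = Tuple[int, float, int, int]


def regex_features(regexes: List[str], tokens: List[str]) -> List[Feature]:
    compiled = [re.compile(p) for p in regexes]
    features: List[Feature] = []
    for pos, token in enumerate(tokens):
        for pattern_id, pattern in enumerate(compiled):
            # Assumption: whole-token matching (fullmatch). This is why '7:00'
            # matches only r'\d+:\d+' and not r'\d+' in the example.
            if pattern.fullmatch(token):
                features.append((pattern_id, 1.0, pos, pos))
    return features


print(regex_features([r'\d+', r'\d+:\d+'], ['the', 'answer', 'is', '42']))
# [(0, 1.0, 3, 3)]
```

Using fullmatch instead of search is what reproduces the documented outputs. With search, r'\d+' would also fire on '7:00'. Multi-token spans, if pyis produces them, are not modelled here.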
Example
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.
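from pyis.python import ops
# Pattern 0 matches digit runs; pattern 1 matches times such as 7:00.
featurizer = ops.RegexFeaturizer([r'\d+', r'\d+:\d+'])
# '42' (token index 3) matches pattern 0.
features = featurizer.transform(['the', 'answer', 'is', '42'])
# features = [(0, 1.0, 3, 3)]
# '7:00' (token index 4) matches pattern 1.
features = featurizer.transform(['set', 'an', 'alarm', 'at', '7:00', 'tomorrow'])
# features = [(1, 1.0, 4, 4)]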
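To recover the matched text, feature tuples can be mapped back onto the input tokens. This sketch assumes the (pattern id, value, start, end) layout shown in the example comments, with an inclusive end index (as in (3, 3) for '42'). If the returned ops.TextFeature objects do not unpack like plain tuples, they would need converting first. `matched_text` is a hypothetical helper, not part of pyis.

```python
from typing import Dict, List, Tuple

# Assumed layout: (pattern id, value, start token index, inclusive end token index)
Feature = Tuple[int, float, int, int]


def matched_text(tokens: List[str], features: List[Feature]) -> Dict[int, List[str]]:
    """Group the text of each matched span by pattern id (hypothetical helper)."""
    by_id: Dict[int, List[str]] = {}
    for pattern_id, _value, start, end in features:
        # end is treated as inclusive, so slice up to end + 1.
        span = " ".join(tokens[start:end + 1])
        by_id.setdefault(pattern_id, []).append(span)
    return by_id


tokens = ['set', 'an', 'alarm', 'at', '7:00', 'tomorrow']
print(matched_text(tokens, [(1, 1.0, 4, 4)]))
# {1: ['7:00']}
```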