RegexFeaturizer

APIs

class pyis.python.ops.RegexFeaturizer(self: ops.RegexFeaturizer, regexes: List[str]) → None

RegexFeaturizer extracts token spans that match the specified regex patterns.

Create a RegexFeaturizer object.

Parameters

regexes (List[str]) – Regex patterns for matching. Each pattern is assigned an id equal to its index in the list, starting from 0.

add_regex(self: ops.RegexFeaturizer, regex: str) → None

Add a new regex pattern.

Parameters

regex (str) – The new regex pattern.

transform(self: ops.RegexFeaturizer, tokens: List[str]) → List[ops.TextFeature]

Extract token spans that match the specified regex patterns.

Parameters

tokens (List[str]) – The token list.

Example

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

from pyis.python import ops

featurizer = ops.RegexFeaturizer([r'\d+', r'\d+:\d+'])
features = featurizer.transform(['the', 'answer', 'is', '42'])
# features = [(0, 1.0, 3, 3)]
features = featurizer.transform(['set', 'an', 'alarm', 'at', '7:00', 'tomorrow'])
# features = [(1, 1.0, 4, 4)]
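
The outputs in the example above are consistent with reading each feature as (pattern id, value, start token index, end token index), where a pattern must match a whole token. That reading is inferred from the example, not confirmed by this page. The pure-Python sketch below reproduces the two outputs under that assumption. The class SketchRegexFeaturizer and the Feature alias are hypothetical stand-ins and are not part of pyis; the real operator may also match spans that cover several tokens, and the sketch simply assumes that a pattern added with add_regex receives the next free index as its id.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for ops.TextFeature (assumed layout):
# (pattern id, value, start token index, end token index).
Feature = Tuple[int, float, int, int]


class SketchRegexFeaturizer:
    """Illustrative approximation only; not the pyis implementation."""

    def __init__(self, regexes: List[str]) -> None:
        # A pattern's id is its position in this list, starting from 0.
        self._patterns = [re.compile(regex) for regex in regexes]

    def add_regex(self, regex: str) -> None:
        # Assumption: a newly added pattern takes the next free index as its id.
        self._patterns.append(re.compile(regex))

    def transform(self, tokens: List[str]) -> List[Feature]:
        features: List[Feature] = []
        for position, token in enumerate(tokens):
            for pattern_id, pattern in enumerate(self._patterns):
                # Assumption: the pattern has to match the entire token.
                if pattern.fullmatch(token):
                    features.append((pattern_id, 1.0, position, position))
        return features


sketch = SketchRegexFeaturizer([r'\d+', r'\d+:\d+'])
print(sketch.transform(['the', 'answer', 'is', '42']))
# prints [(0, 1.0, 3, 3)]
print(sketch.transform(['set', 'an', 'alarm', 'at', '7:00', 'tomorrow']))
# prints [(1, 1.0, 4, 4)]
```

Under the whole-token assumption, '7:00' is reported only for pattern 1, because r'\d+' alone does not cover the full token.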