Operator 101: WordDict
In PyIS, an operator is either a stateless global function or a member function of a class that holds external state (data members). In either case, an operator is thread-safe and reentrant.
In most cases, we prefer a class member function. WordDict is one such class. It demonstrates how an operator is implemented as a class and how it behaves.
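The two operator forms described above can be illustrated in plain Python. This is only a sketch of the idea, not PyIS code: `lowercase`, `Replacer`, and `apply` are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def lowercase(tokens: List[str]) -> List[str]:
    """Stateless form: the output depends only on the arguments."""
    return [token.lower() for token in tokens]


class Replacer:
    """Class form: the member function reads state held by the object."""

    def __init__(self, table: Dict[str, str]) -> None:
        # Copy the table so later changes by the caller cannot affect it.
        self._table = dict(table)

    def apply(self, tokens: List[str]) -> List[str]:
        # The state is only read here, never modified, so concurrent
        # calls on the same object do not interfere with each other.
        return [self._table.get(token, token) for token in tokens]


print(lowercase(["Life", "IN"]))
print(Replacer({"suzhou": "苏州"}).apply(["in", "suzhou"]))
```

In this sketch the class form stays safe to call from several threads because its state is fixed at construction time and never mutated afterwards.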
APIs
- class pyis.python.ops.WordDict(self: ops.WordDict, data_file: str) -> None
WordDict maps the source words (tokens) of a sentence to target words using a predefined dictionary.
Create a WordDict object given the dictionary data file.
The data file should contain two columns separated by whitespace characters. The first column holds the source words and the second column holds the corresponding target words. For example:
suzhou 苏州
beijing 北京
Ensure the file is encoded in UTF-8.
- Parameters
data_file (str) – The dictionary data file for translation.
- translate(self: ops.WordDict, tokens: List[str]) -> List[str]
Given the words of a sentence, translate each word that appears in the dictionary to its target word.
If a word doesn’t exist in the dictionary, it is copied to the output sentence unchanged.
Ensure the words are encoded in UTF-8.
- Parameters
tokens (List[str]) – List of words of the sentence.
- Returns
Target words of the sentence.
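The behavior documented above (a two-column, whitespace-separated dictionary, with unknown words passed through) can be mimicked in plain Python. The following is a minimal sketch for illustration only, not the PyIS implementation: `parse_word_dict` and `translate_tokens` are hypothetical helpers, and skipping lines that do not have exactly two columns is an assumption, since the documentation does not say how such lines are handled.

```python
from typing import Dict, Iterable, List


def parse_word_dict(lines: Iterable[str]) -> Dict[str, str]:
    """Build a source -> target mapping from two-column text lines."""
    mapping: Dict[str, str] = {}
    for line in lines:
        columns = line.split()  # splits on any run of whitespace
        if len(columns) != 2:
            continue  # assumption: ignore blank or malformed lines
        source, target = columns
        mapping[source] = target
    return mapping


def translate_tokens(mapping: Dict[str, str], tokens: List[str]) -> List[str]:
    """Replace each known token; copy unknown tokens unchanged."""
    return [mapping.get(token, token) for token in tokens]


mapping = parse_word_dict(["suzhou 苏州", "beijing\t北京"])
print(translate_tokens(mapping, ["life", "in", "suzhou"]))
# prints ['life', 'in', '苏州']
```

Only "suzhou" is in the mapping, so "life" and "in" are copied to the output as they are.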
Example
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import os
from typing import List

from pyis.python import ops


class Model:
    def __init__(self, dict_data_file: str):
        super().__init__()
        # Load the dictionary once; the operator holds it as state.
        self.dictionary = ops.WordDict(dict_data_file)

    def run(self, tokens: List[str]) -> List[str]:
        # Translate known words; unknown words are returned unchanged.
        res = self.dictionary.translate(tokens)
        return res


# The dictionary data file is expected to sit next to this script.
dict_data_file = os.path.join(os.path.dirname(__file__), 'word_dict.data.txt')
m = Model(dict_data_file)
res = m.run(["life", "in", "suzhou"])
print(res)
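The example above assumes that a word_dict.data.txt file already exists next to the script. As a hedged sketch, such a file could be produced with the standard library as shown below; `write_word_dict` is a hypothetical helper, not part of PyIS, and the demo writes into a temporary directory rather than the script directory.

```python
import os
import tempfile
from typing import Iterable, Tuple


def write_word_dict(path: str, pairs: Iterable[Tuple[str, str]]) -> None:
    """Write (source, target) pairs as two space-separated columns."""
    # Explicit UTF-8 so that non-ASCII target words are stored correctly
    # regardless of the platform's default encoding.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for source, target in pairs:
            f.write(f"{source} {target}\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "word_dict.data.txt")
    write_word_dict(path, [("suzhou", "苏州"), ("beijing", "北京")])
    # Read the file back to confirm what was written.
    with open(path, encoding="utf-8") as f:
        content = f.read()

print(content)
```

Passing `encoding="utf-8"` explicitly matters mostly on platforms whose default text encoding is not UTF-8.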