Operator 101: WordDict

In PyIS, an operator is either a stateless global function or a member function of a class that holds external state (data members). In either case, an operator is thread-safe and reentrant.

In most cases, we prefer a class member function. WordDict is one such class. It demonstrates how an operator is implemented as a class and how it behaves.

APIs

class pyis.python.ops.WordDict(self: ops.WordDict, data_file: str) → None

WordDict maps the source words (tokens) of a sentence to target words based on a predefined dictionary.

Create a WordDict object given the dictionary data file.

The data file should contain two columns separated by whitespace characters. The first column holds the source words and the second column holds the corresponding target words. For example,

suzhou  苏州
beijing 北京

Ensure the file is encoded in utf-8.
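The file format described above can be illustrated with a short sketch. It uses only the Python standard library, not PyIS itself. The helper names write_dict_file and read_dict_file are hypothetical and exist only for this illustration. The sketch writes the two sample rows with an explicit UTF-8 encoding and reads them back by splitting each line on whitespace. How WordDict itself parses the file is not shown here.

```python
import os
import tempfile

# Sample entries, reusing the two rows shown in the text above.
entries = [("suzhou", "苏州"), ("beijing", "北京")]

def write_dict_file(path, pairs):
    # Hypothetical helper: write one "source<TAB>target" pair per line,
    # explicitly encoded as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        for source, target in pairs:
            f.write(f"{source}\t{target}\n")

def read_dict_file(path):
    # Hypothetical helper: split each line on any run of whitespace and
    # keep only lines that have exactly two columns.
    mapping = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            columns = line.split()
            if len(columns) == 2:
                mapping[columns[0]] = columns[1]
    return mapping

tmp_dir = tempfile.mkdtemp()
data_file = os.path.join(tmp_dir, "word_dict.data.txt")
write_dict_file(data_file, entries)
parsed = read_dict_file(data_file)
print(parsed)
```

Passing encoding="utf-8" explicitly avoids depending on the platform default encoding, which is not UTF-8 on every system.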

Parameters

data_file (str) – The dictionary data file for translation.

translate(self: ops.WordDict, tokens: List[str]) → List[str]

Given the words of a sentence, translate each word that appears in the dictionary to its target word.

If a word does not exist in the dictionary, it is left intact and copied to the output sentence.

Ensure words are encoded in utf-8.

Parameters

tokens (List[str]) – List of words of the sentence.

Returns

Target words of the sentence.
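The documented behavior of translate can be approximated in plain Python as a lookup with a fallback to the original token. This is a minimal sketch of the semantics only, assuming an in-memory dict built from the sample rows above. The function translate_sketch is hypothetical and is not part of PyIS or the actual implementation of WordDict.

```python
from typing import Dict, List

def translate_sketch(mapping: Dict[str, str], tokens: List[str]) -> List[str]:
    # Replace each token found in the mapping with its target word;
    # tokens missing from the mapping are copied through unchanged.
    return [mapping.get(token, token) for token in tokens]

# Assumed dictionary contents, taken from the sample rows in the text.
mapping = {"suzhou": "苏州", "beijing": "北京"}

result = translate_sketch(mapping, ["life", "in", "suzhou"])
print(result)  # ['life', 'in', '苏州']
```

The output list has the same length and order as the input, since each token is either replaced or copied in place.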

Example

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import os
from pyis.python import ops
from typing import List

class Model:
    def __init__(self, dict_data_file: str):
        super().__init__()
        self.dictionary = ops.WordDict(dict_data_file)

    def run(self, tokens: List[str]) -> List[str]:
        res = self.dictionary.translate(tokens)
        return res

dict_data_file = os.path.join(os.path.dirname(__file__), 'word_dict.data.txt')

m = Model(dict_data_file)
res = m.run(["life", "in", "suzhou"])
print(res)