LinearChainCRF

APIs

class pyis.python.ops.LinearChainCRF(self: ops.LinearChainCRF, model_file: str) → None

A linear-chain conditional random field (CRF) implementation.

Create a LinearChainCRF object from a model file.

Parameters: model_file (str) – Path to the model file.

predict(self: ops.LinearChainCRF, len: int, features: List[Tuple[int, int, float]]) → List[int]

Given a list of features triggered by the input sample, return the label for each token of the input.

Parameters

len (int) – Number of tokens in the input.
features (List[Tuple[int, int, float]]) – List of features, each represented by a tuple of (token index, feature id, feature value).

Returns

List of labels, one per token of the input.
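
The (token index, feature id, feature value) layout expected by predict can be illustrated with a small pure-Python helper. This is only a sketch: to_feature_triples and the per-token dictionary input are hypothetical and not part of pyis; they just show one way to flatten per-token features into the list of tuples described above.

```python
from typing import Dict, List, Tuple

# (token index, feature id, feature value), as described for predict()
Feature = Tuple[int, int, float]


def to_feature_triples(token_features: List[Dict[int, float]]) -> Tuple[int, List[Feature]]:
    """Hypothetical helper (not part of pyis).

    Flattens one {feature id: feature value} dict per token into the
    (token index, feature id, feature value) tuples, and also returns
    the token count to pass as `len`.
    """
    triples: List[Feature] = []
    for token_index, feats in enumerate(token_features):
        # Sort by feature id so the output order is deterministic.
        for feature_id, value in sorted(feats.items()):
            triples.append((token_index, feature_id, float(value)))
    return len(token_features), triples


# "hello tom": token 0 fires feature 0, token 1 fires feature 1
length, features = to_feature_triples([{0: 1.0}, {1: 1.0}])
```

Under these assumptions, length and features could then be passed as the len and features arguments of predict.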

static train(data_file: str, model_file: str, alg: str = 'l1sgd', max_iter: int = 150) → None

Train a CRF model.

Parameters

data_file (str) – Training data file.
model_file (str) – Output path for the generated model.
alg (str) – The training algorithm: 'perceptron' (structured perceptron) or 'l1sgd' (stochastic gradient descent training for L1-regularized log-linear models). Defaults to 'l1sgd'.
max_iter (int) – Maximum number of iterations. Defaults to 150.

Example

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import os
from pyis.python import ops
from pyis.python.offline import SequenceTagging as SeqTag

tmp_dir = 'tmp/doc_linear_chain_crf/'
os.makedirs(tmp_dir, exist_ok=True)

# Training data: two toy sentences, one list of features per sentence
xs = [
    [ops.TextFeature(0, 1.0, 0, 0), ops.TextFeature(1, 1.0, 1, 1)],  # hello tom
    [ops.TextFeature(0, 1.0, 0, 0), ops.TextFeature(2, 1.0, 1, 1)],  # hello jerry
]
# One label id per token (0 = O, 1 = NAME)
ys = [
    [0, 1],  # O NAME
    [0, 1],  # O NAME
]

data_file = os.path.join(tmp_dir, 'lccrf.data.txt')
SeqTag.text_features_to_lccrf(xs, ys, data_file)

model_file = os.path.join(tmp_dir, 'lccrf.model.bin')
ops.LinearChainCRF.train(data_file, model_file, 'l1sgd')
lccrf = ops.LinearChainCRF(model_file)

# Inference: pass the token count and (token index, feature id, feature value) tuples
values = lccrf.predict(2, [(0, 0, 1.0), (1, 1, 1.0)]) # hello tom
print(values)
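
predict returns integer labels, so a caller will usually want to map them back to tag names. The snippet below is a minimal sketch of that step, assuming the 0 = O, 1 = NAME convention used in the training comments above. ID_TO_LABEL and decode_labels are hypothetical names and not part of pyis, and example_ids is a hand-written stand-in, not actual model output.

```python
from typing import Dict, List

# Assumed label convention, taken from the comments in the training data above.
ID_TO_LABEL: Dict[int, str] = {0: "O", 1: "NAME"}


def decode_labels(label_ids: List[int], id_to_label: Dict[int, str]) -> List[str]:
    """Hypothetical helper: map integer label ids to tag names.

    Raises KeyError on an id that is missing from the mapping, so a
    mismatch between model and label table is not silently ignored.
    """
    return [id_to_label[label_id] for label_id in label_ids]


# Stand-in for the list returned by lccrf.predict(...); not real model output.
example_ids = [0, 1]
tags = decode_labels(example_ids, ID_TO_LABEL)
print(tags)
```

In the example script, values would take the place of example_ids.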