Foma(Finite State Compiler)

FomaFst is ported from https://fomafst.github.io/, which is an open-sourced library for constructing finite-state automata and transducers for various uses including pattern matching, morphological analysis and word corrections.

APIs

class pyis.python.ops.FomaFst(self: ops.FomaFst, fst_path: str) None

FomaFst implements finite state transducer based on open-sourced library foma.

Create a FomaFst instance

Parameters

fst_path (str) – path to the pre-built foma fst file

apply_down(self: ops.FomaFst, query: str) str

Apply input query to the pre-built foma fst

Parameters

query (str) – input query to be applied to fst

Returns

transduced query (str)

static compile_from_file(infile: str) None

Compile a foma script from a file

Parameters

infile (str) – foma script path to be compiled.

static compile_from_str(infile_str: str) None

Compile a foma script from a string

Parameters

infile_str (str) – foma script to be compiled.

Example

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

from pyis.python import ops
from typing import List

# build foma fst
# build foma fst is via its script language, the language spec can be found:
fst_infile_src = \
r"""
define Number ["0"|1|2|3|4|5|6|7|8|9] ;
define Day [(1|2) Number | 3 "0" | 3 1];
define Year Number (Number) (Number) (Number); # Could use Number^<5

define WeekDay ["Monday"|"Tuesday"|"Wednesday"|"Thursday"|"Friday"|"Saturday"|"Sunday"];
define Month ["January"|"February"|"March"|"April"|"May"|"June"|"July"|"August"|"September"|"October"|"November"|"December"];
define RegDates [WeekDay | Month " " Day (", " Year)];

define DateParser [RegDates @-> "<DATE>" ... "</DATE>"];

regex DateParser;
apply down April 14, 2010 – Nelson Mandela was honoured
save stack temp.fst
"""

# you can also save the script into a file, and call `compile_from_file` instead
ops.FomaFst.compile_from_str(fst_infile_src) 

# instantiate a FomaFst object by loading a compiled fst
fst = ops.FomaFst("temp.fst") 

# use foma fst
# apply_down(input_str) will apply the input_str to the fst on top of the stack
# note: `regex DateParser` push the DateParser to the top of the stack
assert(fst.apply_down("April 14, 2010 – Nelson Mandela was honoured") == \
    "<DATE>April 14, 2010</DATE> – Nelson Mandela was honoured")