Foma(Finite State Compiler)¶
FomaFst is ported from https://fomafst.github.io/, which is an open-sourced library for constructing finite-state automata and transducers for various uses including pattern matching, morphological analysis and word corrections.
APIs¶
- class pyis.python.ops.FomaFst(self: ops.FomaFst, fst_path: str) None ¶
FomaFst implements finite state transducer based on open-sourced library foma.
Create a FomaFst instance
- Parameters
fst_path (str) – path to the pre-built foma fst file
- apply_down(self: ops.FomaFst, query: str) str ¶
Apply input query to the pre-built foma fst
- Parameters
query (str) – input query to be applied to fst
- Returns
transduced query (str)
- static compile_from_file(infile: str) None ¶
Compile a foma script from a file
- Parameters
infile (str) – foma script path to be compiled.
- static compile_from_str(infile_str: str) None ¶
Compile a foma script from a string
- Parameters
infile_str (str) – foma script to be compiled.
Example¶
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.
from pyis.python import ops
from typing import List
# build foma fst
# build foma fst is via its script language, the language spec can be found:
fst_infile_src = \
r"""
define Number ["0"|1|2|3|4|5|6|7|8|9] ;
define Day [(1|2) Number | 3 "0" | 3 1];
define Year Number (Number) (Number) (Number); # Could use Number^<5
define WeekDay ["Monday"|"Tuesday"|"Wednesday"|"Thursday"|"Friday"|"Saturday"|"Sunday"];
define Month ["January"|"February"|"March"|"April"|"May"|"June"|"July"|"August"|"September"|"October"|"November"|"December"];
define RegDates [WeekDay | Month " " Day (", " Year)];
define DateParser [RegDates @-> "<DATE>" ... "</DATE>"];
regex DateParser;
apply down April 14, 2010 – Nelson Mandela was honoured
save stack temp.fst
"""
# you can also save the script into a file, and call `compile_from_file` instead
ops.FomaFst.compile_from_str(fst_infile_src)
# instantiate a FomaFst object by loading a compiled fst
fst = ops.FomaFst("temp.fst")
# use foma fst
# apply_down(input_str) will apply the input_str to the fst on top of the stack
# note: `regex DateParser` push the DateParser to the top of the stack
assert(fst.apply_down("April 14, 2010 – Nelson Mandela was honoured") == \
"<DATE>April 14, 2010</DATE> – Nelson Mandela was honoured")