Edit distance error evaluation node with options to specify the penalties for substitution, deletion and insertion, to squash the input sequences, and to ignore certain samples. It uses the classic dynamic-programming algorithm described in https://en.wikipedia.org/wiki/Edit_distance, adjusted to take the penalties into account.

edit_distance_error(input_a, input_b, subPen = 1, delPen = 1,
  squashInputs = FALSE, tokensToIgnore = c(), name = "")

Arguments

name

Details

Each sequence in the inputs is expected to be a matrix. Before computing the edit distance, the operation extracts the index of the maximum element in each column. For example, the sequence matrix

1 2 9 1
3 0 3 2

is represented as the vector of labels (indices) [1, 0, 0, 1], on which the edit distance is actually evaluated.
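The column-wise argmax step can be sketched in plain Python. This is only an illustration of the idea, not the node's implementation: the helper name column_argmax is hypothetical, and breaking ties by taking the first maximum is an assumption, since the text above does not say how ties are resolved.

```python
def column_argmax(matrix):
    """Return, for each column, the row index of its largest element.

    `matrix` is a list of equally long rows. Ties go to the first
    (lowest) row index, which is an assumption of this sketch.
    """
    n_cols = len(matrix[0])
    labels = []
    for c in range(n_cols):
        column = [row[c] for row in matrix]
        labels.append(column.index(max(column)))
    return labels


# The 2 x 4 example matrix from the text.
example = [
    [1, 2, 9, 1],
    [3, 0, 3, 2],
]
print(column_argmax(example))  # [1, 0, 0, 1]
```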

The node can squash runs of repeating labels and ignore certain labels. For example, if squashInputs is true and tokensToIgnore contains the label '-', then given the first input sequence s1 = "1-12-" and the second s2 = "-11--122", the edit distance is computed between s1' = "112" and s2' = "112".
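A minimal sketch of this preprocessing, in plain Python rather than through the node itself. The function name squash_and_filter and its parameter names are hypothetical. The order of operations (squash adjacent repeats first, then drop ignored labels) is inferred from the example above: dropping '-' first would turn "1-12-" into "112" and then squash it to "12", which does not match the stated result.

```python
def squash_and_filter(labels, squash_inputs=True, tokens_to_ignore=()):
    """Collapse adjacent repeated labels, then drop ignored labels."""
    ignore = set(tokens_to_ignore)
    result = []
    previous = object()  # sentinel that compares unequal to any real label
    for label in labels:
        # A label is a repeat if it equals the label immediately before it
        # in the raw input, even when that earlier label was an ignored one.
        is_repeat = squash_inputs and label == previous
        previous = label
        if is_repeat or label in ignore:
            continue
        result.append(label)
    return result


s1 = "".join(squash_and_filter("1-12-", True, ["-"]))
s2 = "".join(squash_and_filter("-11--122", True, ["-"]))
print(s1, s2)  # 112 112
```

With squashing turned off, only the ignored labels are removed, so "-11--122" would become "11122" in this sketch.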

The returned error is computed as: EditDistance(s1, s2) * length(s1') / length(s1)
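The EditDistance term can be illustrated with the classic dynamic-programming recurrence, extended with per-operation penalties. The sketch below is plain Python and makes no claim about how the node is implemented internally. The function and parameter names are hypothetical; in particular ins_pen is an assumed name, because the usage line above lists only subPen and delPen even though the description mentions an insertion penalty.

```python
def weighted_edit_distance(a, b, sub_pen=1.0, del_pen=1.0, ins_pen=1.0):
    """Cheapest cost of turning sequence `a` into sequence `b`.

    sub_pen, del_pen and ins_pen are the costs of one substitution,
    deletion and insertion (ins_pen is an assumed parameter name).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] holds the cheapest cost of turning a[:i] into b[:j].
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_pen  # delete every element of a[:i]
    for j in range(1, cols):
        dist[0][j] = j * ins_pen  # insert every element of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0.0 if a[i - 1] == b[j - 1] else sub_pen
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub_cost,  # match or substitute
                dist[i - 1][j] + del_pen,       # delete a[i - 1]
                dist[i][j - 1] + ins_pen,       # insert b[j - 1]
            )
    return dist[-1][-1]


print(weighted_edit_distance("kitten", "sitting"))  # 3.0
```

Note that when sub_pen exceeds del_pen + ins_pen, the recurrence prefers a deletion followed by an insertion over a substitution.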

Like ClassificationError and other evaluation nodes, when this node is used as an evaluation criterion, the SGD process aggregates all values over an epoch and reports the average, i.e. the error rate. The primary objective of this node is error evaluation for CTC training; see formula (1) in "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks", ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf