Normalized Sum-over-Paths Edit Distances

Description

In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances which compute the expected cost on all sequence alignments by favoring low-cost ones – therefore favoring good alignment. Such distances consider two sequences tied by many optimal or nearly-optimal alignments as more similar than two sequences sharing only one, optimal, alignment. They depend on a parameter, θ, and reduce to the standard distances – the edit-distance or the longest common subsequence – when θ → 0, while having the same time complexity. This paper puts the emphasis on applying some type of normalization of the expectation of the cost. Experimental results for clustering and classification tasks performed on four OCR data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit-distances, the normalized SoP edit-distances clearly outperform the non-randomized measures, i.e. the standard edit distance and longest common subsequence.

Journal

Details 詳細情報について

Report a problem

Back to top