- 【Updated on May 12, 2025】 Integration of CiNii Dissertations and CiNii Books into CiNii Research
- Trial version of CiNii Research Knowledge Graph Search feature is available on CiNii Labs
- 【Updated on June 30, 2025】Suspension and deletion of data provided by Nikkei BP
- Regarding the recording of “Research Data” and “Evidence Data”
Normalized Sum-over-Paths Edit Distances
Description
In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances which compute the expected cost on all sequence alignments by favoring low-cost ones – therefore favoring good alignment. Such distances consider two sequences tied by many optimal or nearly-optimal alignments as more similar than two sequences sharing only one, optimal, alignment. They depend on a parameter, θ, and reduce to the standard distances – the edit-distance or the longest common subsequence – when θ → 0, while having the same time complexity. This paper puts the emphasis on applying some type of normalization of the expectation of the cost. Experimental results for clustering and classification tasks performed on four OCR data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit-distances, the normalized SoP edit-distances clearly outperform the non-randomized measures, i.e. the standard edit distance and longest common subsequence.
Journal
-
- 2010 20th International Conference on Pattern Recognition
-
2010 20th International Conference on Pattern Recognition 1044-1047, 2010-08-01
IEEE