Compressed pattern matching for SEQUITUR
Description
SEQUITUR, due to Nevill-Manning and Witten (see Journal of Artificial Intelligence Research, vol. 7, pp. 67-82, 1997), is a powerful program that infers a phrase hierarchy from the input text and also provides extremely effective compression of large quantities of semi-structured text. In this paper, we address the problem of searching SEQUITUR-compressed text directly. We present a compressed pattern matching algorithm that finds a pattern in compressed text without explicit decompression, and show that it is approximately 1.27 times faster than decompression followed by an ordinary search.
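To make the idea of searching without explicit decompression concrete, here is a minimal sketch in Python. It is not the algorithm of the paper, whose details are not given in this record. It is a generic counting technique for grammar-compressed text: each grammar symbol is summarized once by its occurrence count plus a short prefix and suffix of its expansion, and occurrences that straddle a boundary between adjacent symbols are found in the joined suffix and prefix. The grammar format (a dict from nonterminal to a list of symbols) and the names `count_in_grammar`, `count_overlapping`, and `expand` are assumptions made for illustration.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def expand(rules, symbol):
    """Baseline: fully decompress a symbol (used only for comparison)."""
    if symbol not in rules:
        return symbol  # terminal
    return "".join(expand(rules, s) for s in rules[symbol])


def count_in_grammar(rules, start, pattern):
    """Count occurrences of pattern in the text derived from `start`
    without expanding the whole text (hypothetical helper)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    keep = len(pattern) - 1  # longest context a boundary match can use per side
    memo = {}

    def tail(s):
        return s[max(0, len(s) - keep):]

    def summarize(symbol):
        # Returns (count, prefix, suffix) for the expansion of symbol,
        # where prefix and suffix hold at most `keep` characters.
        if symbol in memo:
            return memo[symbol]
        if symbol not in rules:  # terminal string
            result = (count_overlapping(symbol, pattern),
                      symbol[:keep], tail(symbol))
        else:
            count, prefix, suffix = 0, "", ""
            for child in rules[symbol]:
                c_count, c_prefix, c_suffix = summarize(child)
                # Matches inside the child, plus matches crossing the boundary.
                count += c_count + count_overlapping(suffix + c_prefix, pattern)
                prefix = (prefix + c_prefix)[:keep]
                suffix = tail(suffix + c_suffix)
            result = (count, prefix, suffix)
        memo[symbol] = result
        return result

    return summarize(start)[0]


# A small hand-written grammar for the text "abcabdabcabd".
rules = {"S": ["A", "A"], "A": ["B", "c", "B", "d"], "B": ["a", "b"]}
print(count_in_grammar(rules, "S", "ab"))  # → 4
```

Because each rule is summarized only once, the work grows with the size of the grammar and the pattern length rather than with the length of the decompressed text, which is the general reason searching the compressed form can beat decompress-then-search.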
Published in
Proceedings DCC 2001. Data Compression Conference 469-478, 2002-11-13
IEEE Comput. Soc