Pattern Discovery from Distributions of String Frequency
- Ikeda, Daisuke (Computing and Communications Center, Kyushu University)
- Yamada, Yasuhiro (Graduate School of Information Science and Electrical Engineering, Kyushu University)
- Hirokawa, Sachio (Computing and Communications Center, Kyushu University)
Bibliographic Information
- Other Title: 文字列の頻度分布による共通パタン発見 (モジレツ ノ ヒンド ブンプ ニ ヨル キョウツウ パタン ハッケン)
Abstract
A pattern is a string over constant and variable symbols. A string generated by a pattern is obtained by replacing every variable with a constant string. In this paper, we consider the template discovery problem: given a set of strings generated by some fixed but unknown pattern, find all constant parts of the pattern. If each constant part is long enough and the strings substituted for variables follow some natural probabilistic distributions, we show that there exists an efficient algorithm for the problem, which exploits the disparity in string frequencies between constant parts and replaced parts. We also show its accuracy and effectiveness through experiments on HTML files collected from the Web.
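The frequency disparity mentioned in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper, only a simplified stand-in under stated assumptions: the function name `constant_parts`, the fixed n-gram length `n`, and the rule "an n-gram that occurs in every input string belongs to a constant part" are all hypothetical choices made for illustration.

```python
from collections import Counter


def constant_parts(strings, n=4):
    """Guess the constant parts of an unknown pattern (illustrative only).

    Assumption: an n-gram that occurs in every input string is part of a
    constant region, while n-grams touching a replaced (variable) region
    occur in only some of the strings.
    """
    # Document frequency: in how many input strings each n-gram appears.
    df = Counter()
    for s in strings:
        df.update({s[i:i + n] for i in range(len(s) - n + 1)})

    ref = strings[0]          # scan the first string as a reference
    total = len(strings)
    parts, start = [], None
    for i in range(len(ref) - n + 1):
        common = df[ref[i:i + n]] == total
        if common and start is None:
            start = i                      # a run of common n-grams begins
        elif not common and start is not None:
            # The run ended at n-gram i-1, which covers ref[i-1 : i-1+n].
            parts.append(ref[start:i + n - 1])
            start = None
    if start is not None:
        parts.append(ref[start:])          # run reaches the end of ref
    return parts


pages = [
    "<li><b>apple</b></li>",
    "<li><b>kiwi</b></li>",
    "<li><b>mango</b></li>",
]
print(constant_parts(pages))
```

On these three toy strings the sketch reports the markup before and after the varying word. It relies on the substituted strings sharing no n-gram across all inputs, which loosely mirrors the abstract's conditions that constant parts be long enough and that substitutions be sufficiently varied.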
Journal
- 情報処理学会研究報告 : 自然言語処理 2003 (98), 25-32, 2003-09
- 東京 : 情報処理学会
Details
- CRID: 1050580007681075072
- NII Article ID: 120006655053, 110002948741
- NII Book ID: AN10539294
- ISSN: 09196072
- HANDLE: 2324/2968
- NDL BIB ID: 6734372
- Text Lang: ja
- Article Type: conference paper
- Data Source: IRDB, NDL, CiNii Articles