Pattern Discovery from Distributions of String Frequency

Bibliographic Information

Other Title
  • 文字列の頻度分布による共通パタン発見
  • モジレツ ノ ヒンド ブンプ ニ ヨル キョウツウ パタン ハッケン

Abstract

A pattern is a string over constant and variable symbols. A string is generated by a pattern when it is obtained by replacing every variable with a constant string. In this paper, we consider the template discovery problem: given a set of strings generated by some fixed but unknown pattern, find all constant parts of the pattern. If every constant part is long enough and the strings substituted for variables follow some natural probability distributions, we show that there exists an efficient algorithm for the problem, which exploits the disparity in string frequencies between constant parts and replaced parts. We also demonstrate its accuracy and effectiveness through experiments on HTML files collected from the Web.
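
To make the problem statement concrete, the following is a minimal Python sketch of the general idea that substrings of constant parts occur in every generated string while substrings of replaced parts rarely do. It is not the algorithm of the paper, which is not described in this record. The function names (`instantiate`, `document_frequency`, `frequent_runs`), the window length `k`, and the "occurs in every string" threshold are all assumptions introduced for illustration.

```python
from collections import Counter


def instantiate(constants, fillers):
    """Build one string from a hypothetical pattern c0 x1 c1 ... xn cn.

    constants: the constant parts c0..cn.
    fillers:   the n constant strings substituted for the variables x1..xn.
    """
    assert len(fillers) == len(constants) - 1
    out = [constants[0]]
    for filler, const in zip(fillers, constants[1:]):
        out.append(filler)
        out.append(const)
    return "".join(out)


def document_frequency(strings, k):
    """For every length-k substring, count how many input strings contain it."""
    df = Counter()
    for s in strings:
        # A set is used so each substring counts at most once per string.
        df.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return df


def frequent_runs(reference, strings, k):
    """Return maximal runs of `reference` covered by k-grams found in every string.

    Assumption: `reference` is one of `strings`; the threshold "present in
    all strings" is an illustrative choice, not taken from the paper.
    """
    df = document_frequency(strings, k)
    n = len(strings)
    covered = [False] * len(reference)
    for i in range(len(reference) - k + 1):
        if df[reference[i:i + k]] == n:
            for j in range(i, i + k):
                covered[j] = True
    runs, start = [], None
    # A trailing False sentinel closes a run that reaches the end of the string.
    for i, flag in enumerate(covered + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append(reference[start:i])
            start = None
    return runs


# Toy data: three pages sharing one hypothetical HTML template.
constants = ["<html><title>", "</title><body>", "</body></html>"]
pages = [
    instantiate(constants, ["Cat", "meow"]),
    instantiate(constants, ["Dog", "woof"]),
    instantiate(constants, ["Emu", "chirp"]),
]
candidate_constants = frequent_runs(pages[0], pages, k=4)
```

On this toy input the recovered runs coincide with the three constant parts. The sketch only works when each constant part is at least `k` characters long and the substituted strings do not happen to share substrings across all inputs, which loosely mirrors the abstract's conditions on constant-part length and on the distribution of replaced parts.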
