WWW検索エンジンのためのインクリメンタルな全文検索インデックス更新方式

吉原, 潤, Jun, Yoshiwara

書誌事項

タイトル別名

Incremental Updating Scheme of a Full-Text Index Structure for WWW Text Search Engines
WWW ケンサクエンジンノタメノインクリメンタルナゼンブンケンサクインデックスコウシンホウシキ

この論文をさがす

説明

suffix arrayはテキストの接尾辞のポインタを接尾辞の辞書順に並べたもので任意の部分文字列検索を高速に行うことができる.しかし更新のオーバーヘッドが大きく日本語の全文検索を行うWWW検索エンジンにsuffix array を用いるには頻繁に起こる更新によるオーバーヘッドが問題となる.本論文ではsuffix array の効率的な更新方法を提案する.この方式では既存の巨大なsuffix array はすぐには更新せず差分のテキストを元に差分suffix array を作る.検索は複数のsuffix array 全てに対して行いそれらの結果をマージする.もとのsuffix array と差分suffix array をマージすることにより検索処理の際のマージオーバヘッドを軽減する.suffix array を使ったWWW検索エンジンを実装して実験を行い提案方式の有効性を検証した.

A suffix array is a full-text index data structure, and is an array of pointers to lexicographic-ordered suffixes of text data. A suffix array is efficient for retrieving any substring of text, but requires a lot of overhead for updating it. In general, constant update operations are required in Web search engines, so the development of an efficient updating technique for suffix arrays is an important research issue. In this paper, we propose an efficient updating scheme of suffix arrays. The scheme is based on the following two ideas. First, instead of maintaining a single large suffix array, several small differential suffix arrays are created when new text data are obtained. At the processing of a retrieval operation, all the suffix arrays are retrieved and the retrieved results are merged into one. Second, when the number or the amount of differential suffix arrays exceeds the predetermined threshold, the differential suffix arrays are merged into one. We conducted experiments using an implementation and real WWW data and we verified the effectiveness of the proposed scheme.

収録刊行物

情報処理学会論文誌データベース（TOD）

情報処理学会論文誌データベース（TOD） 40 (SIG08(TOD4)), 112-126, 1999-11-15

情報処理学会

詳細情報詳細情報について

CRID: 1050845762822103552

NII論文ID: 10010357155

NII書誌ID: AA11464847

ISSN: 18827799; 03875806

NDL書誌ID: 5696028

Web Site: https://ipsj.ixsq.nii.ac.jp/records/17761; http://id.ndl.go.jp/bib/5696028; https://ndlsearch.ndl.go.jp/books/R000000004-I5696028

本文言語コード: ja

資料種別: journal article

データソース種別

IRDB
NDLサーチ
CiNii Articles

書き出し

問題の指摘

WWW検索エンジンのためのインクリメンタルな全文検索インデックス更新方式

書誌事項

この論文をさがす

説明

収録刊行物

被引用文献 (4)*注記

参考文献 (9)*注記

キーワード

詳細情報詳細情報について

書き出し

問題の指摘

WWW検索エンジンのためのインクリメンタルな全文検索インデックス更新方式

書誌事項

この論文をさがす

説明

収録刊行物

被引用文献 (4)*注記

参考文献 (9)*注記

キーワード

詳細情報 詳細情報について

書き出し

問題の指摘

詳細情報詳細情報について