An Efficient Construction of Text Retrieval Structures Considering Delayed Keyword Extraction (キーワードの遅延抽出を考慮した文書検索構造の効率的構成法)

Makoto Okada, Kazuaki Ando, Kazuhiro Morita, Jun-ichi Aoe

Bibliographic information

Alternative titles

Kīwādo no chien chūshutsu o kōryo shita bunsho kensaku kōzō no kōritsuteki kōseihō (romanized reading of the Japanese title)
An Efficient Construction of Text Retrieval Structures Considering Delayed Keyword Extraction
Database


Abstract

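Text retrieval techniques that use keywords extracted from documents as the keys (headings) of an index table are very widely used, yet accurately extracting compound-word keywords remains an important open problem. Because the appropriate extraction criteria depend on the intended use, deferring the choice of keyword candidates until the retrieval stage would make it possible to extract and retrieve keywords to suit each purpose. To this end, this paper extends AC (Aho and Corasick), a string-matching machine for multiple keywords, into a retrieval structure that compactly stores all candidates for compound-word keywords, and proposes a method for determining, on that structure, the keywords that fit a given purpose. Experiments on 90,000 files (496,837 keywords) demonstrate the effectiveness of the proposed delayed keyword extraction and the speed of the retrieval structure.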

Although extracting keywords efficiently is an important task in text retrieval systems, it is very difficult to determine suitable keywords for arbitrary purposes because there are many compound words. This paper presents a retrieval structure that can delay keyword extraction until the retrieval stage, which requires integrating the keyword extraction stage into the keyword retrieval stage. The string pattern matching machine of Aho and Corasick (AC) is extended for use as a retrieval structure with delayed extraction. The approach is evaluated experimentally, supported by simulation results on 90,000 Japanese text files.
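For readers unfamiliar with the AC machine the paper builds on, the following is a minimal sketch of the standard, unextended Aho-Corasick algorithm in Python; it does not reproduce the authors' extension or their compact storage of compound-keyword candidates. The class name `AhoCorasick`, its methods, and the sample keywords are illustrative assumptions. It shows the property the abstract relies on: a single left-to-right scan reports every dictionary entry occurring in the text, including nested ones such as a compound word and its components, so the choice of which candidates count as keywords can, in principle, be left to the retrieval stage.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Tuple


class AhoCorasick:
    """Textbook Aho-Corasick machine (goto, failure, and output functions).

    Illustrative sketch only; not the extended structure from the paper.
    """

    def __init__(self, keywords: Iterable[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # goto[state][char] -> next state
        self.fail: List[int] = [0]              # failure link of each state
        self.out: List[List[str]] = [[]]        # keywords recognized on entering a state
        for kw in keywords:
            self._add(kw)
        self._build_failure()

    def _add(self, kw: str) -> None:
        # Insert one keyword into the trie formed by the goto function.
        state = 0
        for ch in kw:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(kw)

    def _build_failure(self) -> None:
        # Breadth-first: depth-1 states fail to the root (already 0);
        # deeper states follow their parent's failure chain.
        queue = deque(self.goto[0].values())
        while queue:
            r = queue.popleft()
            for ch, s in self.goto[r].items():
                queue.append(s)
                f = self.fail[r]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[s] = self.goto[f].get(ch, 0)
                # Inherit the outputs of the longest proper suffix that is a state.
                self.out[s] = self.out[s] + self.out[self.fail[s]]

    def search(self, text: str) -> Iterator[Tuple[int, str]]:
        """Yield (start_index, keyword) for every occurrence, overlaps included."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for kw in self.out[state]:
                yield i - len(kw) + 1, kw


if __name__ == "__main__":
    # Hypothetical dictionary: two component words and their compound.
    ac = AhoCorasick(["文書", "検索", "文書検索"])
    print(list(ac.search("文書検索構造")))
    # [(0, '文書'), (0, '文書検索'), (2, '検索')]
```

In this sketch the caller receives all candidates, both the compound 文書検索 and its components, and would apply its own selection rule (for example, keeping only the longest match); how the paper decides among candidates on its own structure is not shown here.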

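Published in

IPSJ Journal (Transactions of Information Processing Society of Japan)

IPSJ Journal 41 (4), 1171-1179, 2000-04-15

Tokyo: Information Processing Society of Japan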

Details

CRID: 1050845762815421952

NII article ID: 110002725324

NII bibliographic ID: AN00116647

ISSN: 1882-7764; 0387-5806

NDL bibliographic ID: 5345028

Web Site: http://id.nii.ac.jp/1001/00012358/; http://id.ndl.go.jp/bib/5345028; https://ndlsearch.ndl.go.jp/books/R000000004-I5345028

Text language code: ja

Material type: journal article

Data sources

IRDB
NDL
CiNii Articles

Cited by (1)

References (17)