KWIC System for Web Documents

関根 聡, 武田 善行, 吉平 健治

doi:10.5715/jnlp.12.4_245

Bibliographic Information

Alternative Title

KWIC System on WEB Documents

Description

A KWIC (KeyWord In Context) system is a useful tool for investigating language usage. We developed a KWIC system for a huge body of WEB text. The text data was extracted from about 350 gigabytes of WEB pages and contains more than 10 billion characters. The pages were collected by a crawler over a period of about two months. The amount of text data exceeds 4 gigabytes, the largest size that can be addressed with 32 bits. We therefore developed a suffix array indexer that can handle 40 bits, and the system uses it to search for sentences containing the desired keywords. To show the usefulness of the system for learners of Japanese as a second language, we collected KWIC data for “TO-ITAMU (painful like)” and analyzed the onomatopoeia that appear before the expression.
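The abstract does not describe the indexer's internals, so the following is only a minimal sketch of the general technique it names: a suffix array is built over the text, and a binary search over it yields every occurrence of a keyword together with its surrounding context. The function names (`build_suffix_array`, `find_positions`, `kwic`, `pack_offset`) and the 5-byte offset encoding are assumptions made for illustration, not the authors' implementation, and the naive sort used here would not scale to a multi-gigabyte corpus.

```python
# Toy KWIC search over a suffix array (illustrative only; all names are hypothetical).

LIMIT_32_BIT = 2 ** 32   # 4,294,967,296 positions: the 4-gigabyte ceiling of a 32-bit offset
LIMIT_40_BIT = 2 ** 40   # 1,099,511,627,776 positions: the ceiling of a 40-bit offset


def pack_offset(pos: int) -> bytes:
    """Store a text position in 5 bytes (40 bits). Assumed encoding, for illustration."""
    return pos.to_bytes(5, "big")


def build_suffix_array(text: str) -> list:
    """Return suffix start positions sorted by suffix text (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_positions(text: str, sa: list, keyword: str) -> list:
    """Binary-search the suffix array for all suffixes that start with keyword."""
    k = len(keyword)

    def bound(strict: bool) -> int:
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + k]
            # Lower bound skips prefixes < keyword; upper bound skips prefixes <= keyword.
            if prefix < keyword or (not strict and prefix == keyword):
                lo = mid + 1
            else:
                hi = mid
        return lo

    start, end = bound(strict=True), bound(strict=False)
    return sorted(sa[start:end])


def kwic(text: str, sa: list, keyword: str, width: int = 10) -> list:
    """Return (left context, keyword, right context) for each occurrence."""
    k = len(keyword)
    return [
        (text[max(0, p - width):p], keyword, text[p + k:p + k + width])
        for p in find_positions(text, sa, keyword)
    ]


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    index = build_suffix_array(sample)
    for left, key, right in kwic(sample, index, "the", width=4):
        print(f"{left:>4}[{key}]{right}")
```

Because the suffixes are sorted, all occurrences of a keyword occupy one contiguous range of the array, which is why two binary searches are enough to locate them. An offset wider than 32 bits matters only once the text passes 4 gigabytes, as the corpus described above does.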

Journal

Journal of Natural Language Processing

Journal of Natural Language Processing 12 (4), 245-252, 2005

The Association for Natural Language Processing

Details

CRID: 1390282679453136000

NII Article ID: 130004101401

DOI: 10.5715/jnlp.12.4_245

ISSN: 21858314; 13407619

Web Site: http://www.jstage.jst.go.jp/article/jnlp1994/12/4/12_4_245/_pdf

Text language code: ja

Data Source

JaLC
Crossref
CiNii Articles

Abstract license flag: Disallowed
