書誌事項
- タイトル別名
-
- KWIC System on WEB Documents
この論文をさがす
説明
A KWIC (KeyWord In Context) system is a useful tool to investigate the usage oflanguage.We developed a KWIC system for a huge WEB text.The text data isextracted from about 350 giga byte WEB pages and contains more than 10 billioncharacters.It was done by a crawler for about 2month period.The amount of thetext data exceeds 4 giga bytes which can be expressed in 32 bits.We developed asuffix array indexer which can handle 40 bits and the system searches sentences withdesired keywords in it.In order to show the usefulness of the system for Japaneselearners as a second language, we collect KWIC data for “TO-ITAMU (painful like)” and analyzed onomatopoeia appear before the expression.
収録刊行物
-
- 自然言語処理
-
自然言語処理 12 (4), 245-252, 2005
一般社団法人 言語処理学会
- Tweet
詳細情報 詳細情報について
-
- CRID
- 1390282679453136000
-
- NII論文ID
- 130004101401
-
- ISSN
- 21858314
- 13407619
-
- 本文言語コード
- ja
-
- データソース種別
-
- JaLC
- Crossref
- CiNii Articles
-
- 抄録ライセンスフラグ
- 使用不可