On the Effect of Corpus Size in Words Similarity Calculation
- Aizawa Akiko
- National Institute of Informatics Graduate School of Advanced Studies
Bibliographic Information
- Other Title: 類語関係抽出タスクにおけるコーパス規模拡大の影響 (The Effect of Corpus Size Expansion in the Synonym Relation Extraction Task)
Description
This paper examines the effect of corpus size on word similarity calculation. Large-scale text corpora have recently become available for automatic synonym extraction, and it has been reported that simple methods applied to large-scale corpora sometimes perform comparably to more elaborate methods, such as LDA, applied to traditional linguistic resources. We report experimental results on how the quantity of the corpus complements the quality of the similarity calculation. The results show that the similarity calculation is sometimes influenced by absolute word frequencies and that a simple filtering method can correct this bias.
Journal
- IPSJ SIG Notes
IPSJ SIG Notes 94 91-98, 2006
Information Processing Society of Japan (IPSJ)
Details
- CRID: 1573105976776148864
- NII Article ID: 110004824264
- NII Book ID: AN10115061
- Text Lang: ja
- Data Source: CiNii Articles