Effect of Singular Value Decomposition and Weighting by Singular Value of Document-Term Matrix, for Large-scale Data Perspective and Targeted Data Extraction

Hirano Mariko, Kobayakawa Takeshi S.

doi:10.5715/jnlp.20.335

Bibliographic Information

Other Title

大規模データの俯瞰とターゲットデータの抽出に対する文書‐単語行列の特異値分解と特異値による重みづけの有効性

Description

We analyzed tweets posted during the four days following the Great East Japan Earthquake, as provided by Project 311. After obtaining an overview of the data by clustering the tweets, we defined target categories for extraction from the clusters and built a tweet extractor tailored to those targets. In such a pipeline, improving the clustering used to discover the target categories becomes very important. We propose a method that uses singular values as feature weights, whereas the well-known conventional use of singular value decomposition (SVD) is limited to dimensionality reduction. In addition, we propose an evaluation criterion for human-aided clustering tasks and conduct experiments comparing it, together with commonly used criteria, against the actual time humans spend performing such a task. The experiments demonstrate the effectiveness of the proposed weighting method and the adequacy of our criterion, mainly in terms of task time efficiency. For the targeted data-extraction task, which is also a classification problem, some improvement in accuracy is observed even though the training process itself already involves a weighting mechanism.
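The contrast between SVD for dimensionality reduction alone and SVD with singular-value weighting can be illustrated with a small sketch. The abstract does not give the exact weighting formula, so this assumes a hypothetical scheme: the document-term matrix A is factored as A ≈ U_k Σ_k V_kᵀ, and each document is represented by its row of U_k Σ_k^α, where α = 0 corresponds to plain dimensionality reduction and α = 1 weights each latent feature by its singular value. The exponent `alpha`, the helper names (`truncated_svd`, `document_vectors`, `cosine`) and the toy matrix are illustrative assumptions, not the authors' implementation. The SVD here is computed by power iteration with deflation (standard library only) rather than a library routine such as `numpy.linalg.svd`.

```python
import math
import random


def _matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def truncated_svd(a, k, iters=500, seed=0):
    """Top-k singular triplets (sigma, u, v) of a documents-x-terms matrix.

    Power iteration on A^T A, re-orthogonalizing against the right singular
    vectors already found (deflation). Assumes k <= rank(a) and distinct
    leading singular values.
    """
    rng = random.Random(seed)
    at = [list(col) for col in zip(*a)]  # transpose: terms x documents
    triplets = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(len(at))]
        n = _norm(v)
        v = [x / n for x in v]
        for _ in range(iters):
            w = _matvec(at, _matvec(a, v))  # w = A^T A v
            for _, _, prev in triplets:  # project out earlier directions
                d = sum(x * y for x, y in zip(w, prev))
                w = [x - d * y for x, y in zip(w, prev)]
            n = _norm(w)
            if n == 0.0:
                break
            v = [x / n for x in w]
        av = _matvec(a, v)
        sigma = _norm(av)  # sigma = ||A v||
        u = [x / sigma for x in av]  # left singular vector: one entry per document
        triplets.append((sigma, u, v))
    return triplets


def document_vectors(a, k, alpha=1.0):
    """Hypothetical weighting: row i is [sigma_j**alpha * u_j[i] for j < k].

    alpha=0 keeps only the reduced dimensions (rows of U_k);
    alpha=1 weights each latent feature by its singular value (rows of U_k Sigma_k).
    """
    triplets = truncated_svd(a, k)
    return [[(s ** alpha) * u[i] for s, u, _ in triplets] for i in range(len(a))]


def cosine(x, y):
    return sum(p * q for p, q in zip(x, y)) / (_norm(x) * _norm(y))


if __name__ == "__main__":
    # Hypothetical toy counts; terms = quake, tsunami, shelter, train, delay
    A = [
        [2, 1, 0, 0, 0],
        [1, 2, 1, 0, 0],
        [0, 0, 0, 2, 1],
        [0, 0, 1, 1, 2],
    ]
    for alpha in (0.0, 1.0):
        docs = document_vectors(A, k=2, alpha=alpha)
        print(alpha, round(cosine(docs[0], docs[1]), 3), round(cosine(docs[0], docs[2]), 3))
```

In this toy case, documents that share vocabulary (rows 0 and 1) end up with high cosine similarity in the latent space, which is what a downstream clustering step would exploit. Changing `alpha` only rescales the latent axes, so its effect on cluster quality depends on the data. For tweet-scale corpora, a sparse, optimized truncated SVD would be used instead of this pure-Python loop.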

Journal

Journal of Natural Language Processing, Vol. 20, No. 3, pp. 335-365, 2013

The Association for Natural Language Processing
