A Method of Refining Topic Models Based on Term and Document Frequencies.

HIGASHI Kazuyuki, TAKAHASHI Hitoshi, NAKAGAWA Hiroyuki, TSUCHIYA Tatsuhiro

doi:10.11309/jssst.36.4_25

Bibliographic Information

Other Title

単語の出現頻度と類似性に基づいたトピックモデル洗練化手法
タンゴノシュツゲンヒンドトルイジセイニモトズイタトピックモデルセンレンカシュホウ

Description

Software developers increasingly rely on natural language documents. Such documents may contain useful information for developers; however, extracting that information is difficult when the number of documents is large. Latent Dirichlet Allocation (LDA) is a promising topic modeling technique, and LDA-based topic models can help developers comprehend such documents. In LDA-based topic modeling, a stop word list is used to filter out general words so that topics are classified accurately. However, an existing stop word list cannot easily filter words that are not general but appear frequently in the target documents. In this paper, we propose a method that consists of two steps: extracting stop words from the target documents and merging similar topics. We experimentally evaluate the method by applying it to a mailing list. The experimental results demonstrate that our method constructs a more accurate topic model than the existing method.
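
The two steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no thresholds or similarity measure, so the document-frequency ratio, the use of cosine similarity, the greedy merging order, and all names (`extract_stop_words`, `merge_similar_topics`, `min_doc_ratio`, `min_similarity`) are assumptions made only for illustration.

```python
import math
from collections import Counter


def extract_stop_words(docs, min_doc_ratio=0.5):
    """Return words that occur in at least min_doc_ratio of the documents.

    Assumption: corpus-specific stop words are picked by document frequency.
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    threshold = min_doc_ratio * len(docs)
    return {word for word, df in doc_freq.items() if df >= threshold}


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def merge_similar_topics(topics, min_similarity=0.8):
    """Greedily average each topic into the first merged topic it resembles.

    Assumption: topics are word-to-probability dictionaries and similarity
    is measured with cosine similarity.
    """
    merged = []  # list of (running average distribution, topics absorbed)
    for topic in topics:
        for i, (rep, n) in enumerate(merged):
            if cosine(rep, topic) >= min_similarity:
                words = set(rep) | set(topic)
                averaged = {
                    w: (rep.get(w, 0.0) * n + topic.get(w, 0.0)) / (n + 1)
                    for w in words
                }
                merged[i] = (averaged, n + 1)
                break
        else:
            merged.append((dict(topic), 1))  # no similar topic found
    return [rep for rep, _ in merged]


# Toy data, invented for this example only.
docs = [
    ["patch", "build", "error"],
    ["patch", "release", "note"],
    ["patch", "build", "test"],
    ["meeting", "patch"],
]
stop_words = extract_stop_words(docs, min_doc_ratio=0.75)

topics = [
    {"build": 0.6, "error": 0.4},
    {"build": 0.5, "error": 0.5},
    {"release": 0.7, "note": 0.3},
]
merged = merge_similar_topics(topics, min_similarity=0.8)
```

In this toy corpus, "patch" appears in every message, so it is treated as a corpus-specific stop word even though a general-purpose stop word list would not contain it. The two build-related topics are merged into one, while the release-related topic stays separate.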

Journal

Computer Software

Computer Software 36 (4), 4_25-4_31, 2019-10-25

Japan Society for Software Science and Technology

Details

CRID: 1390283659833300992

NII Article ID: 130007772583

NII Book ID: AN10075819

DOI: 10.11309/jssst.36.4_25

NDL BIB ID: 030076870

ISSN: 0289-6540

Web Site: http://id.ndl.go.jp/bib/030076870; https://ndlsearch.ndl.go.jp/books/R000000004-I030076870

Text Lang: ja

Article Type: journal article

Data Source

JaLC
NDL Search
CiNii Articles
KAKEN

Abstract License Flag: Disallowed
