Bibliographic Information
- Alternative titles
-
- A Comparative Study of Text Mining Techniques for Content Extraction : Using Analysis of Policy Speeches by Japanese Prime Ministers as an Example
- ナイヨウ チュウシュツ ノ タメ ノ テキストマイニング シュホウ ノ ヒカク ケンキュウ : ニホン ノ レキダイ シュショウ ノ ショシン ヒョウメイ エンゼツ ノ ナイヨウ ブンセキ オ レイ ニ
Abstract
This paper applies TF-IDF values, correspondence analysis, and topic modeling to analyze the speeches of Japanese prime ministers since 2000, grouping them and summarizing their contents while also examining the characteristics of each analysis method. As a result, it was found that there is continuity in Japanese government policy, and regardless of the prime minister’s party affiliation, they continue to address issues inherited from previous cabinets, adapting to domestic and international situations and formulating new policies. These policies consistently prioritize the people, economy, and society. Over time, the focus has shifted through issues such as “education,” “structural reform,” “regional (revitalization),” “(robust) fiscal policy,” “reconstruction (from the great earthquake),” “the world and the future,” and “(responding to) new coronavirus and digitalization.” A review of the distinct characteristics of each analytical method revealed that all of them have their own unique strengths and limitations. Hence, when attempting to extract content from a text, it is advisable to employ these analytical methods in a comprehensive manner. For smaller data sets, the use of correspondence analysis is recommended, whereas for larger sets, topic modeling should be utilized to categorize texts and compile common points of view. Moreover, calculating TF-IDF values enables the identification of distinctive words for each text, thereby facilitating the summarization of document themes and content.
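The TF-IDF step mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common weighting (relative term frequency multiplied by the natural log of N/df); the abstract does not say which variant the paper uses. The function names `tf_idf` and `top_terms` and the toy token lists are hypothetical and are not taken from the paper. Real Japanese speeches would first need tokenization with a morphological analyzer, which is not shown here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting: (count / document length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy, invented token lists (not the speeches analyzed in the paper).
docs = [
    ["economy", "society", "education", "education"],
    ["economy", "society", "reform", "reform"],
    ["economy", "society", "reconstruction"],
]
scores = tf_idf(docs)
for index, score in enumerate(scores):
    print(index, top_terms(score, k=1))
```

In this toy input, terms shared by every document ("economy", "society") receive a score of zero, while a term confined to one document rises to the top. This matches the abstract's description of TF-IDF as a way to surface the distinctive words of each text.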
Journal
-
- 名古屋大学人文学研究論集
-
名古屋大学人文学研究論集 7 85-100, 2024-03-31
名古屋大学人文学研究科
Details
-
- CRID
- 1390299826874151168
-
- NII Bibliographic ID
- AA12814467
-
- HANDLE
- 2237/0002009955
-
- NDL Bibliographic ID
- 033486040
-
- ISSN
- 2433-233X
-
- Text language code
- ja
-
- Data source type
-
- JaLC
- IRDB
- NDL
-
- Abstract license flag
- Allowed