- ISHITA Emi, Kyushu University
- AGATA Teru, Asia University
- MIYATA Yosuke, Teikyo University
- IKEUCHI Atsushi, University of Tsukuba
- UEDA Shuichi, Rikkyo University
Bibliographic Information
- Other Title
-
- 構造と構成要素に基づく学術論文の自動判定
- コウゾウ ト コウセイ ヨウソ ニ モトズク ガクジュツ ロンブン ノ ジドウ ハンテイ
Description
In this paper, we develop rules for the automatic detection of scientific papers among PDF files on the Web. We inspected the structure and elements of scientific papers, expecting that they typically contain certain basic elements and follow an IMRAD format. We examined 1,172 scientific papers on the Web. The results indicate that the papers share common elements such as title, authors, keywords, and references, and that 40% of the papers, which have an explicit structure, follow an IMRAD or similar format. Using the structure and element information obtained from this inspection, we developed rules for the automatic detection of scientific papers. The rules were evaluated on English and Japanese PDF collections, each consisting of 20,000 files compiled by random sampling from the Web. A random forest classifier was applied, yielding an F-value of 0.74 for English PDF files and 0.53 for Japanese PDF files. These results indicate that rules developed with the approach described in this study can detect scientific papers among PDF files on the Web.
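As a rough illustration of the kind of structure-and-element rules the abstract describes, the following minimal sketch turns text already extracted from a PDF into binary features (presence of an abstract, keywords, references, and IMRAD-style headings) and computes an F-value from predicted and true labels. The pattern list, the function names `extract_features`, `has_imrad`, and `f_value`, and the heading spellings are all assumptions made for this example; the paper's actual rules are not given in the abstract, and the random forest step is not reproduced here.

```python
import re

# Hypothetical heading patterns (assumptions for illustration only);
# the abstract does not list the rules actually used in the paper.
ELEMENT_PATTERNS = {
    "abstract": r"^\s*abstract\b",
    "keywords": r"^\s*key\s?words?\b",
    "references": r"^\s*(references|bibliography)\b",
    "introduction": r"^\s*(\d+\.?\s*)?introduction\b",
    "methods": r"^\s*(\d+\.?\s*)?(materials and methods|methods?|methodology)\b",
    "results": r"^\s*(\d+\.?\s*)?results?\b",
    "discussion": r"^\s*(\d+\.?\s*)?discussion\b",
}


def extract_features(text):
    """Return a dict of 0/1 flags, one per element, for plain text from a PDF."""
    flags = re.IGNORECASE | re.MULTILINE
    return {
        name: int(bool(re.search(pattern, text, flags)))
        for name, pattern in ELEMENT_PATTERNS.items()
    }


def has_imrad(features):
    """True if all four IMRAD section headings were found."""
    return all(
        features[key]
        for key in ("introduction", "methods", "results", "discussion")
    )


def f_value(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = paper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = (
        "A Study\nAbstract\nKeywords: x\n1. Introduction\n"
        "2. Methods\n3. Results\n4. Discussion\nReferences\n"
    )
    features = extract_features(sample)
    print(features, has_imrad(features))
```

Feature dictionaries of this kind could serve as input rows for a classifier such as the random forest mentioned in the abstract; a fixed rule like `has_imrad` alone would miss the papers that lack an explicit IMRAD structure.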
Journal
-
- Journal of Japan Society of Library and Information Science
-
Journal of Japan Society of Library and Information Science 60 (1), 18-34, 2014
Japan Society of Library and Information Science
Details
-
- CRID
- 1390282679545022720
-
- NII Article ID
- 110009816038
-
- NII Book ID
- AA11333306
-
- ISSN
- 2432-4027
- 1344-8668
-
- NDL BIB ID
- 025453669
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- NDL Search
- CiNii Articles
-
- Abstract License Flag
- Disallowed