Automatic Detection of Scientific Papers Based on Their Structure and Elements


Bibliographic Information

Other Title
  • 構造と構成要素に基づく学術論文の自動判定
  • コウゾウ ト コウセイ ヨウソ ニ モトズク ガクジュツ ロンブン ノ ジドウ ハンテイ


In this paper, we develop rules for the automatic detection of scientific papers among PDF files on the Web. We inspected the structure and elements of scientific papers and observed that they typically contain certain basic elements and follow an IMRAD (Introduction, Methods, Results, and Discussion) format. We examined 1,172 scientific papers on the Web. The results indicate that the papers share common elements such as title, authors, keywords, and references, and that 40% of the papers, which have an explicit structure, follow an IMRAD or similar format. Using the structural and element information obtained from this inspection, we developed rules for the automatic detection of scientific papers. The rules were evaluated on English and Japanese PDF collections, each consisting of 20,000 files compiled by random sampling from the Web. A random forest classifier was applied, yielding an F-value of 0.74 for English PDF files and 0.53 for Japanese PDF files. These results indicate that rules developed with the approach given in this study can detect scientific papers among PDF files on the Web.
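
The idea of structure-and-element rules can be illustrated with a minimal Python sketch. It is not the authors' rule set: the heading patterns, the feature names, and the helper functions `extract_features` and `f_value` are all hypothetical, and the text is assumed to have already been extracted from the PDF. The sketch checks for IMRAD-style headings and for two of the common elements named in the abstract (keywords and references), and shows how an F-value is computed from predicted and true labels. The random forest step is not reproduced here; feature dictionaries like these could serve as input to such a classifier.

```python
import re

# Hypothetical heading patterns for an IMRAD-like layout (assumption, not the paper's rules).
IMRAD_PATTERNS = {
    "introduction": re.compile(r"^\s*(\d+\.?\s*)?introduction\b", re.I | re.M),
    "methods": re.compile(r"^\s*(\d+\.?\s*)?(materials and methods|methods?|methodology)\b", re.I | re.M),
    "results": re.compile(r"^\s*(\d+\.?\s*)?results?\b", re.I | re.M),
    "discussion": re.compile(r"^\s*(\d+\.?\s*)?(discussion|conclusions?)\b", re.I | re.M),
}

# Hypothetical patterns for two of the common elements mentioned in the abstract.
ELEMENT_PATTERNS = {
    "keywords": re.compile(r"^\s*key\s?words?\b", re.I | re.M),
    "references": re.compile(r"^\s*(\d+\.?\s*)?(references|bibliography)\b", re.I | re.M),
}


def extract_features(text):
    """Return a dict of boolean features for plain text extracted from a PDF."""
    patterns = {**IMRAD_PATTERNS, **ELEMENT_PATTERNS}
    features = {name: bool(p.search(text)) for name, p in patterns.items()}
    # The document is IMRAD-like only if every IMRAD heading was found.
    features["imrad_like"] = all(features[name] for name in IMRAD_PATTERNS)
    return features


def f_value(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = scientific paper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = (
        "A Study of Web PDFs\nKeywords: pdf, web\n"
        "1. Introduction\n...\n2. Methods\n...\n3. Results\n...\n4. Discussion\n...\n"
        "References\n[1] ...\n"
    )
    print(extract_features(sample))
    print(f_value([1, 1, 0, 0], [1, 0, 1, 0]))
```

Heading-based checks like these are brittle for PDFs whose text extraction loses line breaks, which is one reason a learned classifier over several such features may be preferable to any single rule.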

