Converting PDF files to XML files

Wende Zhang

doi:10.1108/02640470810851743

Purpose
The purpose of this paper is to develop a system that can convert PDF files to XML files.

Design/methodology/approach
The system uses XML as the information display model and XSLT as the information extraction rules. The process is illustrated by converting a scientific and technological paper in PDF to a valid XML file.

Findings
Because the PDF format is self-descriptive, a file's content information and its display information are held in different objects. It is therefore not easy to extract information directly from the PDF source file. The system design solves this problem indirectly. It first converts the PDF source file to an intermediate format that is relatively easy to process, and the intermediate file can then be converted automatically to the target file in accordance with relevant rules.

Originality/value
Being able to extract information from PDF files easily and conveniently is important, and this paper shows how it can be done. The design ideas in the paper can also be applied to information extraction from other types of files.
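The abstract describes a two-stage design. First, the PDF is converted to an intermediate format. Second, extraction rules (XSLT in the paper) map that intermediate format to the target XML. The following is a minimal Python sketch of the second stage only, and it rests on several assumptions. The intermediate format (`<line>` elements carrying a `size` attribute), the target tags (`article`, `title`, `author`, `body`, `p`) and the font-size thresholds are all hypothetical. The Python standard library has no XSLT processor, so the rules are written as an ordered list of Python predicates instead of an XSLT stylesheet. It illustrates the idea and is not the author's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate format, assumed to come from an earlier
# PDF-to-intermediate pass: each <line> records the font size it was set in.
INTERMEDIATE = """<document>
  <line size="18">Converting PDF files to XML files</line>
  <line size="12">Wende Zhang</line>
  <line size="9">The purpose of this paper is to develop a system.</line>
  <line size="9">It converts PDF files to XML files.</line>
</document>"""

# Hypothetical extraction rules standing in for XSLT templates.
# Each rule pairs a test on a <line> element with the target tag it maps to.
# Rules are tried in order, and the first matching rule wins.
RULES = [
    (lambda e: float(e.get("size", "0")) >= 16, "title"),
    (lambda e: float(e.get("size", "0")) >= 12, "author"),
    (lambda e: True, "p"),  # fallback: everything else is body text
]


def convert(intermediate_xml: str) -> ET.Element:
    """Apply RULES to every <line> and build the target <article> tree."""
    source = ET.fromstring(intermediate_xml)
    article = ET.Element("article")
    body = None
    for line in source.iter("line"):
        tag = next(t for test, t in RULES if test(line))
        text = (line.text or "").strip()
        if tag == "p":
            # Collect all body lines under a single <body> element.
            if body is None:
                body = ET.SubElement(article, "body")
            ET.SubElement(body, "p").text = text
        else:
            ET.SubElement(article, tag).text = text
    return article


if __name__ == "__main__":
    print(ET.tostring(convert(INTERMEDIATE), encoding="unicode"))
```

With the sample input, the 18-point and 12-point lines become `<title>` and `<author>`, and the two 9-point lines become `<p>` elements inside `<body>`. The rules are kept as data, separate from the `convert` loop. This mirrors the abstract's point that the target file is produced "in accordance with relevant rules": to handle a different layout, you edit `RULES` and leave the traversal code alone.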
