A High Performance XML Parser Using Repetition Pattern Recognition (繰返し構造認識によるXMLパーサ高速化技術)

太田 智也, 西山 博泰, 田村 清朗, 宗近 秀生

Bibliographic Information

Other Title

クリカエシコウゾウニンシキニヨル XML パーサコウソクカギジュツ
A High Performance XML Parser Using Repetition Pattern Recognition
Implementation Techniques for Programming Languages

Abstract

XML is widely used as a standard format for data exchange. Because it is a text-based format, XML is highly flexible, but this flexibility comes at the cost of higher data processing overhead than ordinary data formats. In this paper, we propose a high-performance XML parsing method based on recognizing repetition patterns in XML documents. First, training XML documents are analyzed in advance to learn the constructs that appear frequently in them. The XML parser then uses the result of this pre-analysis to speed up parsing by speculatively matching the input against the learned constructs. Experiments show that, when the training documents are structurally similar to the target documents, the method improves the performance of a SAX parser by 31% on average and by up to 67%.
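The abstract does not describe the algorithm in detail, so the following Python sketch only illustrates the general idea of learning frequent constructs and then matching them speculatively. It is not the authors' implementation. It assumes a simplified XML subset (tags without attributes, plus character data), and the names `slow_token`, `learn`, and `parse` are hypothetical. The learning step records, for each tag, the tag literal that most often follows it. The parser first tries a direct string comparison against that prediction and falls back to the general tokenizer on a mismatch.

```python
import re
from collections import Counter, defaultdict

# Assumed simplified grammar: tags without attributes, plus character data.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def slow_token(text, pos):
    """General path: tokenize at pos and return (event, end_offset)."""
    m = TOKEN.match(text, pos)
    if m is None:
        raise ValueError(f"unsupported markup at offset {pos}")
    if m.group(3) is not None:
        return ("text", m.group(3)), m.end()
    kind = "end" if m.group(1) else "start"
    return (kind, m.group(2)), m.end()


def learn(training_docs):
    """For each tag event, keep the tag literal that most often follows it."""
    follow = defaultdict(Counter)
    for doc in training_docs:
        pos, last = 0, None
        while pos < len(doc):
            event, end = slow_token(doc, pos)
            if event[0] != "text":
                # Store the raw literal together with its precomputed event.
                follow[last][(doc[pos:end], event)] += 1
                last = event
            pos = end
    return {prev: c.most_common(1)[0][0] for prev, c in follow.items()}


def parse(text, table):
    """Return (events, hits); hits counts successful speculative matches."""
    events, hits, pos, last = [], 0, 0, None
    while pos < len(text):
        guess = table.get(last)
        if guess is not None and text.startswith(guess[0], pos):
            # Fast path: the predicted literal matched, reuse its event.
            event, end = guess[1], pos + len(guess[0])
            hits += 1
        else:
            # Misprediction or character data: use the general tokenizer.
            event, end = slow_token(text, pos)
        events.append(event)
        if event[0] != "text":
            last = event
        pos = end
    return events, hits
```

Calling `parse(target, learn(training_docs))` yields the same event list as `parse(target, {})`, which uses only the general path. The learned table changes only how many tokens are recognized by the direct comparison.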

Journal

IPSJ Journal (情報処理学会論文誌)

IPSJ Journal 49 (7), 2604-2613, 2008-07-15

Tokyo: Information Processing Society of Japan (情報処理学会)

Details

CRID: 1050845762811395200

NII Article ID: 110007970149

NII Book ID: AN00116647

ISSN: 18827764; 18827837; 03875806

NDL BIB ID: 024269578

Web Site: http://id.nii.ac.jp/1001/00009517/; http://id.ndl.go.jp/bib/024269578; https://ndlsearch.ndl.go.jp/books/R000000004-I024269578

Text Lang: ja

Article Type: journal article

Data Source

IRDB
NDL
CiNii Articles
