XML Documents Searching Combining Structure and Keywords Similarities
- Apichaya Auvattanasombat (Tokyo Institute of Technology | Chulalongkorn University)
- Yousuke Watanabe (Tokyo Institute of Technology)
- Haruo Yokota (Tokyo Institute of Technology)
Description
In recent years, XML has become a widely used standard in many applications. For example, office documents, which are increasingly popular, are stored as archives containing multiple XML parts. The structure and content of XML files play different roles depending on the kind of document, so similarity search over XML files should be based on both structure and content. In previous work, LAX+ was proposed as an algorithm for computing a similarity value from the structure and content of the XML files in office documents. However, because LAX+ uses exact matching between corresponding leaves, similar words in leaf nodes are treated as different. To solve this problem, we propose combining LAX+ with keyword similarity in leaf nodes. We use files in the docx, xlsx, and pptx formats as the experimental data set. The evaluation shows that our approach can improve precision and recall.
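The difference between exact leaf matching and keyword similarity in leaf nodes can be illustrated with a small sketch. This is not the LAX+ algorithm, whose details are not given in this record. It is a toy comparison under stated assumptions: leaves are paired by their element path, the soft leaf score is a Jaccard overlap of word sets (a stand-in chosen for illustration, not necessarily the paper's keyword similarity), and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Return (path, text) pairs for every leaf element of the document."""
    root = ET.fromstring(xml_text)
    leaves = []

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            leaves.append((path, (node.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, "")
    return leaves


def exact_similarity(a, b):
    """Exact matching: leaf texts either agree completely or not at all."""
    return 1.0 if a == b else 0.0


def keyword_similarity(a, b):
    """Assumed soft score: Jaccard overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def document_similarity(xml_a, xml_b, leaf_sim):
    """Average best leaf score over leaves that share a path, in both directions."""
    leaves_a, leaves_b = leaf_paths(xml_a), leaf_paths(xml_b)
    if not leaves_a or not leaves_b:
        return 0.0

    def one_way(src, dst):
        total = 0.0
        for path, text in src:
            scores = [leaf_sim(text, t) for p, t in dst if p == path]
            total += max(scores, default=0.0)  # 0.0 when no leaf shares the path
        return total / len(src)

    return (one_way(leaves_a, leaves_b) + one_way(leaves_b, leaves_a)) / 2


doc_a = "<doc><title>annual sales report</title><body>revenue grew</body></doc>"
doc_b = "<doc><title>annual sales summary</title><body>revenue grew</body></doc>"

print(document_similarity(doc_a, doc_b, exact_similarity))    # 0.5
print(document_similarity(doc_a, doc_b, keyword_similarity))  # 0.75
```

With exact matching the two titles count as entirely different, although they share two of three words. The soft score gives them partial credit, which is the kind of effect the description attributes to adding keyword similarity in leaf nodes.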
Journal
- IPSJ SIG Notes
IPSJ SIG Notes 2013 (14), 1-6, 2013-07-15
Information Processing Society of Japan (IPSJ)
Details
- CRID: 1571417127858836224
- NII Article ID: 110009585856
- NII Book ID: AN10114171
- Text Lang: en
- Data Source: CiNii Articles