Transfer Learning for Bibliographic Information Extraction
Description
This paper addresses the problems of analyzing title-page layouts and extracting bibliographic information from academic papers. Information extraction is an important task for making digital libraries easy to use. Sequence analyzers are commonly used to extract information from pages. Because new layouts arrive often and existing layouts also change, a mechanism for self-training a new analyzer is needed to achieve good extraction accuracy. Such a mechanism also makes management easier. For example, when a new layout is input, the problem is how to learn automatically and efficiently to create a new analyzer. This paper focuses on learning a new sequence analyzer automatically by using a transfer learning approach. We evaluated its efficiency by testing on three academic journals. The results show that the proposed method is effective for self-training a new sequence analyzer.
Published in
Proceedings of the International Conference on Pattern Recognition Applications and Methods 374-379, 2015-01-01
SCITEPRESS - Science and Technology Publications