Labor saving for reprinting Japanese rare classical books : The development of the new method for OCR technology including kana and kanji characters in cursive style

Bibliographic Information

Other Title
  • 古典籍翻刻の省力化:くずし字を含む新方式OCR技術の開発
  • コテンセキ ホンコク ノ ショウリョクカ : クズシ ジ オ フクム シン ホウシキ OCR ギジュツ ノ カイハツ

Abstract

Most modern Japanese readers cannot read rare Japanese classical books written in cursive-style kana and kanji, and the large volume of such books that survive makes their contents even harder to access. To reduce the heavy labor of reprinting these books, we developed a new OCR method. Proof-of-principle tests on books containing cursive-style kana and kanji demonstrated that, under certain fixed conditions, the method can automatically generate text data with more than 80% precision. In the new OCR method, character images are extracted together with their position information and a database of ideographic variants is constructed; the character codes of the books to be reprinted are then identified from this database by a similar-kanji retrieval method. In addition, rather than pursuing full automation, we aim to reduce the overall reprinting workload through a work-process design that combines automatic processing with manual work. We report the structure of the new OCR method and the current state of reprinting with it.
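The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which image features or similarity measure are used, so cosine similarity over raw binarized pixels stands in for the similar-kanji retrieval method here. The names `Glyph`, `identify`, and `VARIANT_DB`, the toy 3x3 bitmaps, and the review threshold of 0.85 are all hypothetical and chosen only for illustration. The `needs_review` flag hints at how low-confidence matches could be handed to a human, in the spirit of combining automatic processing with manual work.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Glyph:
    """A character image cut from a page, with its position (hypothetical layout)."""
    bitmap: list  # rows of 0/1 pixels, 1 = ink
    x: int        # left edge of the character on the page
    y: int        # top edge of the character on the page


def features(bitmap):
    """Flatten a bitmap into a feature vector (assumption: raw pixels as features)."""
    return [float(p) for row in bitmap for p in row]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zero."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = sqrt(sum(p * p for p in a))
    norm_b = sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(glyph, database, threshold=0.85):
    """Return (code, score, needs_review) for the most similar database entry.

    `database` is a list of (character_code, reference_bitmap) pairs.
    `threshold` is an arbitrary illustrative cutoff, not a value from the paper.
    """
    query = features(glyph.bitmap)
    best_code, best_score = None, -1.0
    for code, reference in database:
        score = cosine(query, features(reference))
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score, best_score < threshold


# Toy variant database: two 3x3 reference shapes keyed by Unicode code point.
VARIANT_DB = [
    ("U+4E00", [[0, 0, 0], [1, 1, 1], [0, 0, 0]]),  # horizontal stroke
    ("U+5341", [[0, 1, 0], [1, 1, 1], [0, 1, 0]]),  # cross shape
]

if __name__ == "__main__":
    glyph = Glyph(bitmap=[[0, 0, 0], [1, 1, 1], [0, 0, 0]], x=10, y=20)
    code, score, needs_review = identify(glyph, VARIANT_DB)
    print(code, round(score, 3), needs_review, (glyph.x, glyph.y))
```

In a real pipeline the position stored with each glyph would let the identified codes be reassembled in reading order, and entries flagged for review would go to a human transcriber.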
