Character Spotting of Historical Documents Using Pattern Segmentation Aided by Recognition Processing

Bibliographic Information

Other Title
  • 認識処理を援用した文字切り出しによる古文書のキャラクタスポッティング
  • ニンシキ ショリ オ エンヨウ シタ モジ キリダシ ニ ヨル コモンジョ ノ キャラクタスポッティング


This paper proposes a method for segmenting and spotting characters in historical documents. The segmentation method uses the results of character recognition to cope with cursive script and with characters that encroach on one another, both of which are characteristic of historical documents. The spotting method extracts only previously designated characters from a character string. In the initial segmentation, labelling divides the character string pattern into connected components. Each connected component is enclosed in a bounding rectangle, and the character patterns are separated from one another using the shape of the rectangles, such as their height and width. Next, individual character recognition is applied to each segmented pattern. Rectangles for which segmentation failed are identified from the recognition results, and resegmentation is applied to the string containing them, so that the string is expected to be divided at the best positions. For spotting, a neural network corresponding to the previously designated character is prepared. The error between the input and the output of the network is calculated for each segmented pattern, and patterns that satisfy the condition are extracted as spotting results. In an extraction experiment on 615 character strings, the correct spotting rate for 5 designated characters was 94.22% with the resegmentation process and 87.58% without it.
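
The two building blocks named in the abstract, labelling into connected components with bounding rectangles and spotting by the error between a network's input and output, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: 8-connectivity, the mean squared error, the `threshold` parameter, and the names `label_components`, `reconstruction_error`, `spot`, and `reconstruct` are all hypothetical. The abstract does not state the connectivity, the error measure, or the acceptance condition, and the recognition-driven resegmentation step is not covered here.

```python
from collections import deque


def label_components(image):
    """Return bounding rectangles (top, left, bottom, right) of the
    8-connected foreground regions of a binary image (list of rows).
    8-connectivity is an assumption; the abstract does not specify it."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def reconstruction_error(pattern, reconstruct):
    """Mean squared error between a flattened pattern and the network
    output for it. `reconstruct` stands in for the trained network."""
    output = reconstruct(pattern)
    return sum((a - b) ** 2 for a, b in zip(pattern, output)) / len(pattern)


def spot(patterns, reconstruct, threshold):
    """Indices of patterns whose error is at most `threshold`
    (a hypothetical stand-in for the unspecified condition)."""
    return [i for i, p in enumerate(patterns)
            if reconstruction_error(p, reconstruct) <= threshold]


# Toy vertical string: two strokes separated by an empty row.
STRIP = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
]
BOXES = label_components(STRIP)

# Stub "network" that always outputs one fixed template pattern.
TEMPLATE = [1.0, 0.0]
HITS = spot([[1.0, 0.0], [0.0, 1.0]], lambda p: TEMPLATE, threshold=0.1)
```

In this toy run, `BOXES` holds one rectangle per stroke and `HITS` keeps only the pattern that the stub network reproduces closely. A real pipeline would feed the pixels inside each rectangle, after merging or splitting rectangles by height and width, to the recognizer and to the per-character network.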

