Character Spotting of Historical Documents Using Pattern Segmentation Aided by Recognition Processing
- Umeda Michio, Osaka Electro-Communication University
- Hashimoto Tomohiro, Osaka Electro-Communication University
Bibliographic Information
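- Other Title: 認識処理を援用した文字切り出しによる古文書のキャラクタスポッティング (Character Spotting of Historical Documents by Character Segmentation Aided by Recognition Processing)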
Description
This paper proposes a character segmentation and spotting method for historical documents. The segmentation method uses the results of character recognition to cope with cursive scripts and with the mutual encroachment of adjacent characters, both of which are peculiar to historical documents. The spotting method extracts only previously designated characters from a character string. As an initial segmentation, the character string pattern is divided into connected components by labeling. Each connected component is enclosed in a bounding rectangle, and the individual character patterns are separated from one another using the shape of these rectangles, such as their height and width. Next, individual character recognition is applied to each segmented pattern. Based on the recognition results, rectangles whose segmentation failed are identified, and resegmentation is applied to the strings containing them, so that each string is expected to be divided at the best position. For spotting, a neural network corresponding to each designated character is prepared. The error between the input and output of the network is computed for each segmented pattern, and the patterns that satisfy the error condition are extracted as spotting results. In an extraction experiment on 615 character strings with 5 designated characters, the correct spotting rate was 94.22% with the resegmentation process and 87.58% without it.
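The abstract describes the pipeline only at a high level. The Python sketch below illustrates its steps under stated assumptions and is not the authors' implementation: 8-connected labeling with bounding rectangles; a hypothetical height/width splitting rule (`max_ratio` is an assumed threshold, not a value from the paper); a recognition-aided cut search in which the hypothetical `score` callback stands in for the paper's character recognizer; and spotting by input/output error, where a linear projection onto a stored prototype substitutes for the trained neural network. A vertical text line is assumed, and every name here is illustrative.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def label_components(img: Sequence[Sequence[int]]) -> List[Box]:
    """Label 8-connected foreground pixels and return each component's bounding box."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sort top-to-bottom, assuming a vertical line of text.
    return sorted(boxes)


def split_tall_boxes(boxes: List[Box], max_ratio: float = 1.5) -> List[Box]:
    """Hypothetical height/width rule: a box taller than max_ratio * width is
    assumed to hold touching characters and is cut into equal-height parts."""
    out: List[Box] = []
    for top, left, bottom, right in boxes:
        height, width = bottom - top + 1, right - left + 1
        if height <= max_ratio * width:
            out.append((top, left, bottom, right))
            continue
        parts = max(2, round(height / width))
        for i in range(parts):
            a = top + i * height // parts
            b = top + (i + 1) * height // parts - 1
            out.append((a, left, b, right))
    return out


def best_cut(top: int, bottom: int, score: Callable[[int, int], float]) -> int:
    """Recognition-aided resegmentation: try every cut row in (top, bottom] and
    keep the one whose two halves the (hypothetical) recognizer scores highest.
    score(a, b) returns a confidence for the pattern spanning rows a..b."""
    return max(range(top + 1, bottom + 1),
               key=lambda y: score(top, y - 1) + score(y, bottom))


def reconstruction_error(x: Sequence[float], proto: Sequence[float]) -> float:
    """Stand-in for a designated character's network: a linear autoassociator
    that projects x onto its stored prototype; returns the mean squared error
    between input and output."""
    norm = sum(p * p for p in proto)
    coef = sum(a * p for a, p in zip(x, proto)) / norm
    return sum((a - coef * p) ** 2 for a, p in zip(x, proto)) / len(x)


def spot(patterns: Sequence[Sequence[float]], proto: Sequence[float],
         threshold: float) -> List[int]:
    """Return the indices of patterns whose input/output error is within threshold."""
    return [i for i, x in enumerate(patterns)
            if reconstruction_error(x, proto) <= threshold]
```

For example, a 3-pixel-wide column containing a 3-row blob and a 6-row blob yields two bounding boxes; the second is twice as tall as it is wide, so `split_tall_boxes` cuts it into two 3-row boxes. The equal-height cut is deliberately naive; `best_cut` shows how a recognizer's confidence could instead choose the split position, which is the role the abstract assigns to resegmentation.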
Journal
- IEEJ Transactions on Electronics, Information and Systems, Vol. 122, No. 11, pp. 1876-1884, 2002
- Publisher: The Institute of Electrical Engineers of Japan
Details
- CRID: 1390282679587868928
- NII Article ID: 10010454224, 130006845876, 10011461855
- NII Book ID: AN10065950
- ISSN: 1348-8155, 0385-4221
- NDL BIB ID: 6345311
- Data Source: JaLC, NDL, Crossref, CiNii Articles, KAKEN
- Abstract License Flag: Disallowed