Character Segmentation and Recognition of Ancient Documents Based on Thinning of the Background Region

梅田 三千雄, 橋本 智広

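This paper proposes a character segmentation method for character strings in ancient documents that focuses on the background region, in order to cope with the touching and mutual encroachment of characters peculiar to such documents. First, thinning is applied to the background region of a compound pattern, formed by joining the target character string with its mirror-image patterns, to generate a base pattern. Next, labeling is applied to the base pattern to obtain each of the regions into which the pattern is partitioned, and individual character recognition is performed on each region. From these recognition results, regions that cannot be judged to be character regions are detected, and a recognition-assisted region determination process is applied to them. In the region determination process, the dividing path is changed in two stages and recognition is repeated while combining multiple adjacent regions, so as to find the optimal character regions. Finally, features extracted from each resulting region are input to autoassociative neural networks to obtain the recognition result. In recognition experiments on 615 character strings taken from the Tenpo Gocho (天保郷帳), the method achieved an individual character recognition rate of 98.52% and a string recognition rate of 90.24%, confirming its effectiveness compared with a conventional method that focuses on the pixels of the character strokes.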

This paper proposes a character segmentation and recognition method for ancient documents. The segmentation method is based on thinning the background region of a compound pattern in order to cope with the cursive scripts and the mutual encroachment of characters that are peculiar to ancient documents. The compound pattern is generated from the original character string pattern and two mirror patterns. In the segmentation process, candidate dividing points are extracted from the thinned pattern, and the segmented regions are determined step by step with the aid of recognition processing. In the recognition process, autoassociative neural networks are used for flexibility and efficiency. In a recognition experiment on 615 character strings that appear in the local Tenpo era records of rice crops, the proposed method achieved a correct character recognition rate of 98.52% and a correct string recognition rate of 90.24%. These results indicate that the method is effective for recognizing characters in ancient documents.
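
The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the thinned background has already been computed and is given as a small grid in which `#` marks a skeleton pixel, and it uses 4-connected flood fill to find the regions that the skeleton separates. The names `label_regions`, `bounding_boxes`, and `PATTERN` are hypothetical and chosen only for illustration.

```python
from collections import deque

def label_regions(grid):
    """Label 4-connected regions of pixels that are not on the skeleton.

    grid: list of equal-length strings; '#' marks a pixel of the thinned
    background (assumed input), any other character is a free pixel.
    Returns (labels, count): labels[r][c] is 0 on the skeleton and
    1..count inside a region.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '#' or labels[r][c]:
                continue
            # Unvisited free pixel: start a new region and flood-fill it.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] != '#' and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

def bounding_boxes(labels):
    """Return {label: (top, left, bottom, right)} for every region."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                top, left, bottom, right = boxes.get(lab, (r, c, r, c))
                boxes[lab] = (min(top, r), min(left, c),
                              max(bottom, r), max(right, c))
    return boxes

# Toy vertical string: a stepped skeleton line (rows 1-2) separates two
# regions whose bounding boxes overlap in column range, and a straight
# line (row 4) separates a third region.
PATTERN = [
    "......",
    "###...",
    "...###",
    "......",
    "######",
    "......",
]

labels, count = label_regions(PATTERN)
boxes = bounding_boxes(labels)
print(count, boxes)
```

In this toy input the stepped skeleton line plays the role of a dividing path between two characters that encroach on each other: the two regions it separates could not be split by a single straight horizontal cut at the step, yet labeling still yields them as distinct candidates. Each candidate region would then be passed to a recognizer, and adjacent regions combined where the recognition result is poor, as the abstract outlines.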
