Skeleton extraction: Comparison of five methods on the Arabic IFN/ENIT database

Abstract

Thinning (skeletonization) is a crucial stage in Arabic Character Recognition (ACR) systems. It simplifies the text shape and reduces the amount of data to be handled, and it is usually used as a pre-processing stage for recognition and storage systems. The skeleton of Arabic text can be used for baseline detection, character segmentation, and feature extraction, ultimately supporting classification. In this paper, five state-of-the-art thinning algorithms are selected and implemented: SPTA, the Zhang-Suen parallel thinning algorithm, the Huang-Wan-Liu thinning algorithm, and the thinning and skeletonization algorithms based on morphological operations. The five algorithms are applied to the IFN/ENIT dataset. The results are discussed and analyzed in terms of preserving shape and text connectivity, preventing spurious tails, maintaining a one-pixel-wide skeleton, avoiding the necking problem, and running-time efficiency. In addition, performance measures for checking text connectivity and spurious tails and for calculating stroke thickness are proposed and carried out.
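
To make one of the compared methods concrete, the following is a minimal pure-Python sketch of the standard textbook formulation of Zhang-Suen parallel thinning, together with a simple connectivity check. It is not the authors' implementation. The function names `zhang_suen` and `count_components`, the list-of-lists image format (1 = ink, 0 = background), and the use of an 8-connected component count as the connectivity measure are all assumptions made for illustration; the abstract does not specify how the proposed measures are computed.

```python
def zhang_suen(image):
    """Thin a binary image (list of lists, 1 = ink) with Zhang-Suen.

    Illustrative sketch only; pixels outside the image count as background.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])
    # Offsets of P2..P9: clockwise, starting from the pixel directly above.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(r, c):
        return [img[r + dr][c + dc]
                if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                for dr, dc in offsets]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_delete = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # number of ink neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_delete.append((r, c))
            # Parallel update: delete only after the whole image is scanned.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


def count_components(image):
    """Count 8-connected ink components (assumed connectivity measure)."""
    rows, cols = len(image), len(image[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:  # iterative flood fill over the 8-neighbourhood
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count
```

Under these assumptions, comparing `count_components(image)` with `count_components(zhang_suen(image))` gives a rough indication of whether thinning broke or merged any strokes. A thick horizontal bar, for example, reduces to a single one-pixel-wide line and stays one component.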
