
RESEARCH PRODUCT

An Improved Skew Angle Detection and Correction Technique for Historical Scanned Documents Using Morphological Skeleton and Progressive Probabilistic Hough Transform

Dominique Michelucci, Walid Khaled Hidouci, Omar Boudraa

subject

Computer science; [SPI] Engineering Sciences [physics]; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Document image analysis; 02 engineering and technology; 01 natural sciences; Electronic mail; Robustness (computer science); Histogram; Orientation; 0103 physical sciences; Morphological skeleton; 0202 electrical engineering, electronic engineering, information engineering; 010306 general physics; Probabilistic Hough transform; Pixel; business.industry; Skew; Probabilistic logic; Pattern recognition; Progressive Probabilistic Hough Transform; [SPI.TRON] Engineering Sciences [physics]/Electronics; Skew correction; Algorithm; Images; 020201 artificial intelligence & image processing; Artificial intelligence; business; Skew angle detection

description

Skew detection is a crucial step in document analysis systems and remains one of their basic challenges, especially for historical documents. In this paper, we propose a novel, robust technique for skew angle detection and correction. A morphological skeleton is first computed to significantly reduce the amount of data to process: it removes redundant pixels and keeps only the central curves of the image components. The proposed method then uses the Progressive Probabilistic Hough Transform (PPHT) to identify lines in the image. Finally, a dedicated procedure estimates the global skew angle of the document image from the detected lines. Experimental results on three popular datasets demonstrate the accuracy and efficiency of the approach for skew angle detection. The datasets contain various types of documents in different scripts (such as Chinese, English and Greek) and diverse layouts (multi-column, with figures and tables, in vertical or horizontal orientation).
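The abstract does not specify the procedure that turns detected lines into a global skew angle. As a minimal sketch of that last step, the Python below takes line segments as `(x1, y1, x2, y2)` tuples (the format a PPHT implementation such as OpenCV's `cv2.HoughLinesP` returns) and takes the median of their orientations. The median rule, the function names, and the `max_abs_angle` cutoff are assumptions made for illustration, not the authors' method.

```python
import math
from statistics import median


def segment_angle_deg(x1, y1, x2, y2):
    """Orientation of a segment in degrees, folded into [-90, 90).

    Image coordinates are assumed (y grows downward), so a positive
    angle corresponds to a clockwise tilt on screen.
    """
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    # A segment has no direction: angles 180 degrees apart are equivalent.
    if angle >= 90.0:
        angle -= 180.0
    elif angle < -90.0:
        angle += 180.0
    return angle


def estimate_skew_deg(segments, max_abs_angle=45.0):
    """Hypothetical global skew estimate: median orientation of the segments.

    Segments steeper than max_abs_angle (an assumed cutoff) are ignored,
    so table rulings or figure borders perpendicular to the text lines
    do not bias the estimate. Returns 0.0 when no segment qualifies.
    """
    angles = []
    for x1, y1, x2, y2 in segments:
        angle = segment_angle_deg(x1, y1, x2, y2)
        if abs(angle) <= max_abs_angle:
            angles.append(angle)
    if not angles:
        return 0.0
    return median(angles)
```

The median is used here because it tolerates a minority of spurious segments. Note that the 45 degree cutoff in this sketch would need a different treatment for vertically oriented text. Correction then amounts to rotating the image by the negative of the estimated angle.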

https://u-bourgogne.hal.science/hal-01858577