RESEARCH PRODUCT

An Efficient Cooperative Smearing Technique for Degraded Historical Documents Images Segmentation

Omar Boudraa, Walid Khaled Hidouci, Dominique Michelucci

subject

050101 languages & linguistics; Computer science; media_common.quotation_subject; 02 engineering and technology; Image (mathematics); Interpretation (model theory); [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; 0202 electrical engineering electronic engineering information engineering; 0501 psychology and cognitive sciences; Segmentation; Quality (business); ComputingMilieux_MISCELLANEOUS; media_common; business.industry; Smearing technique; 05 social sciences; Pattern recognition; Image segmentation; Hybrid approach; Computer Graphics and Computer-Aided Design; Computer Science Applications; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; business; Historical document

description

Segmentation is one of the critical steps in historical document image analysis systems, and it determines the quality of the subsequent search, understanding, recognition and interpretation processes. It isolates the objects to be considered and separates the regions of interest (paragraphs, lines, words and characters) from other entities (figures, graphs, tables, etc.). This stage follows thresholding, which improves the quality of the document and separates its foreground from its background, and skew detection and correction, which straightens the document. Here, a hybrid method is proposed to locate words and characters in both handwritten and printed documents. Numerical results on old degraded document images from four common datasets demonstrate the robustness and high precision of our approach, with Recall and Precision reaching approximately 97.7% and 97.9%, respectively.
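As an illustration of what a smearing step does in general, the following is a minimal sketch of classic horizontal run-length smearing on an already thresholded image. It is not the hybrid cooperative method proposed in the paper, whose details are not given in this description. The function names `smear_rows` and `ink_runs`, the `max_gap` parameter and the toy image are assumptions made for this example.

```python
def smear_rows(image, max_gap):
    """Horizontal run-length smearing on a binary image (1 = ink, 0 = background).

    Background runs of at most max_gap pixels that lie between two ink
    pixels on the same row are filled with ink. Returns a new image.
    """
    smeared = [row[:] for row in image]  # copy so the input stays untouched
    for row in smeared:
        last_ink = None  # column of the most recent ink pixel on this row
        for col, pixel in enumerate(row):
            if pixel == 1:
                if last_ink is not None:
                    gap = col - last_ink - 1
                    if 0 < gap <= max_gap:
                        # Fill the short background run with ink.
                        for k in range(last_ink + 1, col):
                            row[k] = 1
                last_ink = col
    return smeared


def ink_runs(row):
    """Return inclusive (start, end) column pairs of consecutive ink pixels."""
    runs, start = [], None
    for col, pixel in enumerate(row):
        if pixel == 1 and start is None:
            start = col
        elif pixel == 0 and start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


# Toy one-row image (hypothetical data): two blobs separated by a wide gap.
image = [[1, 0, 1, 0, 0, 0, 1, 1, 0]]
narrow = smear_rows(image, max_gap=1)  # closes only the 1-pixel gap
wide = smear_rows(image, max_gap=3)    # also closes the 3-pixel gap
print(ink_runs(narrow[0]))  # [(0, 2), (6, 7)]
print(ink_runs(wide[0]))    # [(0, 7)]
```

In this sketch the choice of `max_gap` decides the granularity of the result: a small value merges only neighbouring strokes, while a larger one merges neighbouring blobs into a single region.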

10.1142/s0219467821500121
https://hal.archives-ouvertes.fr/hal-03058799