RESEARCH PRODUCT
Reducing the Human Effort in Text Line Segmentation for Historical Documents
Joan Andreu Sánchez, Verónica Romero, Emilio Granell, Lorenzo Quirós
subject
Iterative and incremental development; Training set; Information retrieval; Computer science; Quality (business); Segmentation; Line (text file); Document layout analysis; Historical document; Task (project management)
description
Labeling the layout of historical documents to prepare training data for machine learning techniques is an arduous task that requires great human effort. A draft of the layout can be obtained with a document layout analysis (DLA) system and then corrected by the user with less effort than labeling from scratch. In this paper we study an iterative process in which the user supervises and corrects the draft only for the pages automatically selected by the DLA system, with the aim of reducing the required human effort. The results show that similar DLA quality can be achieved while reducing the number of pages the user has to annotate, and that the accumulated human effort required to obtain the layout of the pages used to train the models can be reduced by more than \(95\%\).
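The iterative process described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`select_pages`, `run_iterations`), the lowest-confidence selection rule, and the per-page effort measure are all assumptions made for the example, since the abstract does not say how pages are selected or how effort is counted.

```python
from typing import Callable, Dict, List, Set, Tuple


def select_pages(confidence: Dict[str, float], labeled: Set[str], batch: int) -> List[str]:
    """Hypothetical selection rule: the `batch` unlabeled pages with lowest confidence."""
    candidates = [page for page in confidence if page not in labeled]
    return sorted(candidates, key=lambda page: confidence[page])[:batch]


def run_iterations(
    pages: List[str],
    score: Callable[[str, Set[str]], float],  # stand-in for the DLA system's confidence
    correct: Callable[[str], float],          # stand-in for the user's correction effort
    batch: int,
    rounds: int,
) -> Tuple[Set[str], float]:
    """Repeat: draft all pages, let the user correct only the selected ones."""
    labeled: Set[str] = set()
    effort = 0.0
    for _ in range(rounds):
        # Score every page with the current (assumed) model state.
        confidence = {page: score(page, labeled) for page in pages}
        chosen = select_pages(confidence, labeled, batch)
        if not chosen:
            break  # every page has already been supervised
        for page in chosen:
            effort += correct(page)  # accumulate the human effort spent
            labeled.add(page)
        # A real system would retrain the DLA model on `labeled` here.
    return labeled, effort
```

In this sketch the user only ever touches the pages returned by `select_pages`, so the accumulated effort grows with the number of selected pages and not with the size of the collection.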
year | journal | country | edition | language |
---|---|---|---|---|
2021-01-01 | | | | |