AUTHOR

Joan Andreu Sánchez

showing 2 related works from this author

The HisClima database: historical weather logs for automatic transcription and information extraction

2021

Knowing past weather and atmospheric conditions can help weather researchers build models like those used to predict how weather conditions are likely to change as global temperatures continue to rise. Many historical weather records were registered on a systematic basis and are still available. Historical weather logs were kept on ships while on the high seas, recording daily weather conditions such as wind speed, temperature, and coordinates. These historical documents are an important source of climatic information from several centuries ago. This paper presents a database for researching about the ca…

Database; Computer science; 05 social sciences; 050301 education; 02 engineering and technology; Text recognition; Atmospheric model; computer.software_genre; Wind speed; Information extraction; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Transcription (software); Baseline (configuration management); 0503 education; Relevant information; computer

2020 25th International Conference on Pattern Recognition (ICPR)
researchProduct

Reducing the Human Effort in Text Line Segmentation for Historical Documents

2021

Labeling the layout of historical documents to prepare training data for machine learning techniques is an arduous task that requires great human effort. A draft of the layout can be obtained with a document layout analysis (DLA) system and then corrected by the user with less effort than labeling it from scratch. In this paper we study an iterative process in which the user only supervises and corrects the draft for pages automatically selected by the DLA system, with the aim of reducing the required human effort. The results show that similar DLA quality can be achieved while reducing the number of pages that the user has to annotate and that the accumulated h…

Iterative and incremental development; Training set; Information retrieval; Computer science; media_common.quotation_subject; Quality (business); Segmentation; Line (text file); Document layout analysis; Historical document; media_common; Task (project management)
researchProduct