EHRtemporalVariability: delineating temporal dataset shifts in electronic health records
Alba Gutiérrez-Sacristán, Isaac S. Kohane, Juan M. García-Gómez, Carlos Sáez, Paul Avillach
Subjects: 0303 health sciences; education.field_of_study; business.industry; Computer science; Population; Reuse; Health records; Data science; 3. Good health; 03 medical and health sciences; Identification (information); 0302 clinical medicine; Software; Data quality; Range (statistics); 030212 general & internal medicine; User interface; business; education; 030304 developmental biology
Abstract

Background: Temporal variability in healthcare processes or protocols is intrinsic to medicine. Such variability can introduce dataset shifts, a data quality issue when electronic health records (EHRs) are reused for secondary purposes. Temporal dataset shifts can present as trends, abrupt changes, or seasonal changes in the statistical distributions of data over time, and they are particularly complex to address in multi-modal and highly coded data. If not delineated, these changes can harm population and data-driven research, such as machine learning. Because biomedical research repositories are increasingly populated with large historical data from EHRs, specific software methods are needed to help delineate temporal dataset shifts and ensure reliable data reuse.

Findings: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time, projects their temporal evolution through non-parametric Information Geometric Temporal plots, and enables the exploration of changes in variables through Data Temporal Heatmaps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of which is available for reproducibility.

Conclusions: EHRtemporalVariability enables the exploration and identification of dataset shifts, helping users to examine and repurpose large, longitudinal datasets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users.
EHRtemporalVariability is suited both to technical users, who can use the R package programmatically, and to users unfamiliar with programming, who can use the Shiny user interface.

Availability: https://github.com/hms-dbmi/EHRtemporalVariability/
Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html
Online demo: http://ehrtemporalvariability.upv.es/
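To make the Findings more concrete, here is a minimal Python sketch of the general idea: estimate a per-period distribution of a coded variable, measure how far apart the periods are, and project the periods onto a plane so that shifts become visible. It is not the EHRtemporalVariability API (the package itself is written in R). It assumes monthly periods, the Jensen-Shannon distance between monthly distributions, and classical multidimensional scaling for the two-dimensional projection used in an Information Geometric Temporal (IGT) plot. The function names `temporal_map`, `js_distance` and `igt_projection` and the toy data are hypothetical. The `probs` matrix (periods by codes) is roughly the kind of table a Data Temporal Heatmap would display.

```python
import numpy as np


def temporal_map(records, categories):
    """Relative frequency of each category per period (periods x categories).

    records: list of (period, code) pairs, e.g. ("2019-01", "A").
    Hypothetical helper, not part of EHRtemporalVariability.
    """
    periods = sorted({p for p, _ in records})
    row = {p: i for i, p in enumerate(periods)}
    col = {c: j for j, c in enumerate(categories)}
    counts = np.zeros((len(periods), len(categories)))
    for period, code in records:
        counts[row[period], col[code]] += 1
    # Normalise each row so that every period holds a probability distribution.
    return periods, counts / counts.sum(axis=1, keepdims=True)


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logarithms, so it lies in [0, 1]."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 (0 * log 0 := 0)
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    # Clamp tiny negative rounding errors before taking the square root.
    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


def igt_projection(probs, dims=2):
    """Classical MDS of the pairwise JS distances between period distributions."""
    n = probs.shape[0]
    dist = np.array([[js_distance(probs[i], probs[j]) for j in range(n)]
                     for i in range(n)])
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]    # keep the largest ones
    coords = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    return dist, coords


# Toy example: a coded variable whose dominant code flips from "A" to "B" in April.
records = ([("2019-0%d" % m, "A") for m in (1, 2, 3) for _ in range(9)]
           + [("2019-0%d" % m, "B") for m in (1, 2, 3)]
           + [("2019-0%d" % m, "A") for m in (4, 5, 6)]
           + [("2019-0%d" % m, "B") for m in (4, 5, 6) for _ in range(9)])
periods, probs = temporal_map(records, ["A", "B"])
dist, coords = igt_projection(probs)
```

Months that lie close together in `coords` have similar distributions. In the toy data, January to March collapse onto one point and April to June onto another, which is the kind of abrupt shift an IGT plot is meant to expose. The real package may use different distance, smoothing and projection choices.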
| date | journal | country | edition | language |
|---|---|---|---|---|
| 2020-04-11 | | | | |