Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research

6533b7d8fe1ef96bd1269ad8

RESEARCH PRODUCT

Tõnu Esko, Ola Spjuth, Huei-yi Shen, Eco J. C. de Geus, Mats-ake Persson, Andres Metspalu, Jon Heggland, Leif Groop, Sandra Ose, Isabel Fortier, Johan Rung, Claes Ladenvall, Dorret I. Boomsma, Cornelia M. van Duijn, Samuli Ripatti, Linda Zaharenko, Arnulf Langhammer, Jouke-Jan Hottenga, Annette Peters, Janis Klovins, Christian Gieger, Jennifer R. Harris, Joern Dietrich, Kristian Hveem, Inga Prokopenko, Juni Palmgren, Melanie Waldenberger, Mark I. McCarthy, Jani Heikkinen, Nancy L. Pedersen, Janina S. Ried, Janna Hastings, Jan-Eric Litton, Juha Karvanen, Gonneke Willemsen, Maria Krestyaninova

subject

0301 basic medicine; Netherlands Twin Register (NTR); Databases, Factual; Computer science; Information Storage and Retrieval; Sample (statistics); Ontology (information science); Endocrinology and Diabetes; Bioinformatics; data archives; Article; 03 medical and health sciences; SDG 17 - Partnerships for the Goals; SDG 3 - Good Health and Well-being; Genetics; Use case; biomedical data; Genetics (clinical); Biological Specimen Banks; Genetics & Heredity; 0604 Genetics; Bioinformatics (Computational Biology); ta112; ta1184; Data science; Biobank; 3. Good health; cross-biobank research; 030104 developmental biology; Project planning; Exchange of information; Disparate system; Privacy; clinical data; Data integration

description

A wealth of biospecimen samples is stored in modern, globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is the ability to search and access, in an integrated manner, phenotypic, clinical and other information about the samples currently stored at biobanks. However, privacy issues, together with heterogeneous information systems and the lack of agreed-upon vocabularies, have made specimen searching across multiple biobanks extremely challenging. We describe three case studies in which we linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: first, we created harmonised variables; then we annotated, and made searchable, information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data, we sidestep many of the privacy obstacles that arise when handling real values. We show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in the pre-analysis exchange of information between biobanks, that is, during the project planning phase.
This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
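
The availability-based linking summarised in the description can be illustrated with a minimal sketch. It is not the SAIL implementation or its data model: the biobank names, local variable names, harmonised labels and the functions `availability_index` and `search` are all hypothetical, chosen only to show the two steps the abstract names (harmonising variables, then publishing searchable counts of available samples rather than measured values).

```python
from collections import Counter

# Hypothetical mapping from each biobank's local variable names to a shared,
# harmonised vocabulary (all names invented for illustration).
HARMONISED = {
    "biobank_a": {"bmi_kgm2": "BMI", "fasting_gluc": "Fasting glucose"},
    "biobank_b": {"body_mass_index": "BMI", "hba1c_pct": "HbA1c"},
}

# Toy records: local variable name -> measured value (None = not collected).
RECORDS = {
    "biobank_a": [
        {"bmi_kgm2": 24.1, "fasting_gluc": 5.2},
        {"bmi_kgm2": 31.0, "fasting_gluc": None},
    ],
    "biobank_b": [
        {"body_mass_index": 27.5, "hba1c_pct": 6.1},
        {"body_mass_index": None, "hba1c_pct": 5.4},
        {"body_mass_index": 22.0, "hba1c_pct": None},
    ],
}


def availability_index(records, mapping):
    """Count, per biobank and harmonised variable, how many samples have a value.

    Only presence or absence is recorded; measured values are not copied
    into the returned index.
    """
    index = {}
    for biobank, samples in records.items():
        counts = Counter()
        for sample in samples:
            for local_name, value in sample.items():
                harmonised = mapping[biobank].get(local_name)
                if harmonised is not None and value is not None:
                    counts[harmonised] += 1
        index[biobank] = dict(counts)
    return index


def search(index, variable):
    """Return {biobank: sample count} for biobanks that hold the variable."""
    return {b: c[variable] for b, c in index.items() if c.get(variable, 0) > 0}


index = availability_index(RECORDS, HARMONISED)
print(search(index, "BMI"))
```

Because the index holds only counts per harmonised category, a researcher planning a study could see which biobanks are worth approaching without any individual-level value being exchanged, which is the privacy argument the description makes.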

year	journal	country	edition	language
2015-08-26	European Journal of Human Genetics

10.1038/ejhg.2015.165 http://juuli.fi/Record/0278751416