RESEARCH PRODUCT
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions.
Riccha Sethi, Martin Suchan, Martin Löwer, David Weber, Julia Becker, Jos De Graaf, Ugur Sahin
subject
0301 basic medicine; False discovery rate; Computer science; Artificial Gene Amplification and Extension; Polymerase Chain Reaction; Database and Informatics Methods; Sequencing techniques; 0302 clinical medicine; Breast Tumors; Basic Cancer Research; Medicine and Health Sciences; DNA sequencing; Biology (General); Ecology; High-Throughput Nucleotide Sequencing; Genomics; DNA Neoplasm; 3. Good health; Identification (information); Oncology; Computational Theory and Mathematics; Modeling and Simulation; MCF-7 Cells; Female; Sequence Analysis; Research Article; Bioinformatics; QH301-705.5; Breast Neoplasms; Genomics; Computational biology; Research and Analysis Methods; Human Genomics; 03 medical and health sciences; Cellular and Molecular Neuroscience; Cancer Genomics; Genomic Medicine; Breast Cancer; Genetics; DNA Barcoding Taxonomic; Humans; Molecular Biology Techniques; Molecular Biology; Ecology Evolution Behavior and Systematics; Whole genome sequencing; Linkage (software); Whole Genome Sequencing; Genome Human; Dideoxy DNA sequencing; Genetic Diseases Inborn; Cancers and Neoplasms; Biology and Life Sciences; Computational Biology; Statistical model; Sequence Analysis DNA; Repetitive Regions; Logistic Models; 030104 developmental biology; Genomic Structural Variation; Human genome; Sequence Alignment; 030217 neurology & neurosurgery
description
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations, including structural variations (SVs), is key to our understanding of these diseases. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from a high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linking short reads that originate from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both technologies and validated them with an independent PCR-based approach. The SVs identified by both technologies were highly specific, while the validation rate dropped for events found by only one technology. A particularly high FDR was observed for SVs found only by 10XWGS. To improve FDR and sensitivity, statistical models were trained for both technologies. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics of various diseases.
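
The comparison described in the abstract (SVs shared by both technologies versus SVs found by only one, each with its own validation rate) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `SV` record, the matching rule (same chromosome and type, both breakpoints within a tolerance), the 100 bp default tolerance, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not the paper's data model)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def same_event(a: SV, b: SV, tol: int = 100) -> bool:
    """Assumed matching rule: same chromosome and type, and both
    breakpoints within `tol` base pairs of each other."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )


def split_calls(short_calls, linked_calls, tol=100):
    """Partition calls into (common, short-read only, linked-read only).
    Common events are reported using the short-read coordinates."""
    common = [a for a in short_calls
              if any(same_event(a, b, tol) for b in linked_calls)]
    short_only = [a for a in short_calls if a not in common]
    linked_only = [b for b in linked_calls
                   if not any(same_event(a, b, tol) for a in short_calls)]
    return common, short_only, linked_only


def false_discovery_rate(validated):
    """FDR = fraction of tested calls that failed validation.
    `validated` is a list of booleans (True = confirmed, e.g. by PCR)."""
    if not validated:
        return None
    return validated.count(False) / len(validated)


# Toy usage with invented coordinates (not data from the study).
short_calls = [SV("chr1", 1000, 5000, "DEL"), SV("chr2", 200, 900, "DUP")]
linked_calls = [SV("chr1", 1040, 5030, "DEL"), SV("chr3", 10, 20000, "INV")]
common, short_only, linked_only = split_calls(short_calls, linked_calls)
```

Computing `false_discovery_rate` separately for each of the three groups gives the kind of per-group comparison the abstract reports; the statistical models mentioned there would then be trained on call-level features, which this sketch does not attempt.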
year | journal | country | edition | language |
---|---|---|---|---|
2020-11-01 | PLoS Computational Biology | | | |