AUTHOR

Andreas Hildebrandt

showing 44 related works from this author

Next-generation sequencing: big data meets high performance computing

2017

The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments, amounting to more than a few terabytes of data, in a single run. This leads to massive datasets used by a wide range of applications, including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been decreasing dramatically. A low sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, to make effective use of the produced data, the design of big data algorithms and t…

Drug Discovery Today

DrugTargetInspector: An assistance tool for patient treatment stratification

2016

Cancer is a large class of diseases that are characterized by a common set of features, known as the Hallmarks of cancer. One of these hallmarks is the acquisition of genome instability and mutations. This, combined with high proliferation rates and failure of repair mechanisms, leads to clonal evolution as well as a high genotypic and phenotypic diversity within the tumor. As a consequence, treatment and therapy of malignant tumors remain a grand challenge. Moreover, under selective pressure, e.g., caused by chemotherapy, resistant subpopulations can emerge that then may lead to relapse. In order to minimize the risk of developing multidrug-resistant tumor cell populations, optimal (comb…

International Journal of Cancer

Machine learning of reverse transcription signatures of variegated polymerases allows mapping and discrimination of methylated purines in limited tra…

2020

Reverse transcription (RT) of RNA templates containing RNA modifications leads to synthesis of cDNA containing information on the modification in the form of misincorporation, arrest, or nucleotide skipping events. A compilation of such events from multiple cDNAs represents an RT-signature that is typical for a given modification but, as we show here, also depends on the reverse transcriptase enzyme. A comparison of 13 different enzymes revealed a range of RT-signatures, with individual enzymes exhibiting average arrest rates between 20 and 75%, as well as average misincorporation rates between 30 and 75% in the read-through cDNA. Using RT-signatures from individual enzymes to trai…

A fast solver for nonlocal electrostatic theory in biomolecular science and engineering

2011

Biological molecules perform their functions surrounded by water and mobile ions, which strongly influence molecular structure and behavior. The electrostatic interactions between a molecule and solvent are particularly difficult to model theoretically, due to the forces' long range and the collective response of many thousands of solvent molecules. The dominant modeling approaches represent the two extremes of the trade-off between molecular realism and computational efficiency: all-atom molecular dynamics in explicit solvent, and macroscopic continuum theory (the Poisson or Poisson–Boltzmann equation). We present the first fast-solver implementation of an advanced nonlocal continuum theo…

Proceedings of the 48th Design Automation Conference

CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm

2015

Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…

The International Journal of High Performance Computing Applications
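The nearest neighbour chain algorithm named in the abstract above can be illustrated with a minimal sequential Python sketch. It assumes points in Euclidean space and uses the Lance–Williams update for Ward's method on squared Euclidean distances; the paper clusters protein structures and parallelizes the algorithm with CUDA, neither of which is reproduced here, and the function name ward_nn_chain is hypothetical.

```python
def ward_nn_chain(points):
    """Sequential hierarchical Ward clustering via the nearest neighbour chain algorithm.

    Returns merges as (members_a, members_b, ward_distance) in the order they are found.
    """
    n = len(points)
    # Ward distance between singletons, expressed as squared Euclidean distance.
    dist = [[sum((p - q) ** 2 for p, q in zip(points[i], points[j])) for j in range(n)]
            for i in range(n)]
    size = [1] * n
    members = [[i] for i in range(n)]
    active = set(range(n))
    chain, merges = [], []
    while len(active) > 1:
        if not chain:
            chain.append(min(active))
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Nearest active neighbour of a; ties favour prev so the chain cannot cycle.
        b = min((c for c in sorted(active) if c != a),
                key=lambda c: (dist[a][c], c != prev))
        if b != prev:
            chain.append(b)
            continue
        # a and b are reciprocal nearest neighbours, so they can be merged safely.
        chain.pop()
        chain.pop()
        lo, hi = min(a, b), max(a, b)
        merges.append((tuple(sorted(members[lo])), tuple(sorted(members[hi])), dist[lo][hi]))
        active.remove(hi)
        for k in active:
            if k != lo:
                nk = size[k]
                # Lance-Williams update for Ward's method on squared distances.
                d = ((size[lo] + nk) * dist[lo][k] + (size[hi] + nk) * dist[hi][k]
                     - nk * dist[lo][hi]) / (size[lo] + size[hi] + nk)
                dist[lo][k] = dist[k][lo] = d
        size[lo] += size[hi]
        members[lo] += members[hi]
    return merges
```

With this convention each reported distance is twice the increase in the within-cluster sum of squares caused by the merge. The algorithm works because Ward's method is reducible: merging two reciprocal nearest neighbours never makes the merged cluster closer to a third cluster than either part was, so the rest of the chain stays valid after a merge.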

Graph Rewriting Based Search for Molecular Structures: Definitions, Algorithms, Hardness

2018

We define a graph rewriting system that is easily understandable by humans, but rich enough to allow very general queries to molecule databases. It is based on the substitution of a single node in a node- and edge-labeled graph by an arbitrary graph, explicitly assigning new endpoints to the edges incident to the replaced node. For these graph rewriting systems, we are interested in the subgraph-matching problem. We show that the problem is NP-complete, even on graphs that are stars. As a positive result, we give an algorithm which is polynomial if both rules and query graph have bounded degree and bounded cut size. We demonstrate that molecular graphs of practically relevant molecules in d…

ballaxy: web services for structural bioinformatics.

2014

Motivation: Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. Results: In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large …

Bioinformatics (Oxford, England)

MetaCache: context-aware classification of metagenomic reads using minhashing.

2017

Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our…

Bioinformatics (Oxford, England)
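The general idea of classifying reads by minhashing can be sketched in Python. Reference genomes are cut into windows, each window is reduced to a bottom-s MinHash sketch (its s smallest k-mer hashes), and a read is assigned to the reference whose sketch values it hits most often. This is a simplified illustration, not MetaCache's implementation: the window length, k, s, the blake2b hash and all function names (including the random_dna test helper) are assumptions chosen for the example, and a real tool would also handle reverse complements, taxonomic ranks and ties.

```python
import hashlib
import random
from collections import Counter, defaultdict


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (blake2b is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def sketch(seq: str, k: int, s: int) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes of seq."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def build_index(references: dict[str, str], k: int = 16, s: int = 16,
                window: int = 128) -> dict[int, set[str]]:
    """Map every sketch value of every reference window to the labels that contain it."""
    index: dict[int, set[str]] = defaultdict(set)
    step = window - k + 1  # windows overlap by k-1 bases, so each k-mer falls in exactly one
    for label, genome in references.items():
        for start in range(0, len(genome) - k + 1, step):
            for h in sketch(genome[start:start + window], k, s):
                index[h].add(label)
    return index


def classify(read: str, index: dict[int, set[str]], k: int = 16, s: int = 16):
    """Return the label with the most sketch hits, or None if nothing matches."""
    hits = Counter(label for h in sketch(read, k, s) for label in index.get(h, ()))
    return hits.most_common(1)[0][0] if hits else None


def random_dna(length: int, seed: int) -> str:
    """Hypothetical helper that generates a random test genome."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))
```

The point of the sketch is that only s hash values per window are stored and looked up, so memory and query time no longer grow with every k-mer of every reference.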

A novel automated segmentation method for retinal layers in OCT images proves retinal degeneration after optic neuritis.

2015

Aim: The evaluation of inner retinal layer thickness can serve as a direct biomarker for monitoring the course of inflammatory diseases of the central nervous system such as multiple sclerosis (MS). Using optical coherence tomography (OCT), thinning of the retinal nerve fibre layer and changes in deeper retinal layers have been observed in patients with MS. Here, we first compare a novel method for automated segmentation of OCT images with manual segmentation using two cohorts of patients with MS. Using this method, we also aimed to reproduce previous findings showing retinal degeneration following optic neuritis (ON) in MS. Methods: Based on a 5×5 expansion of the Prewitt operator to efficie…

The British Journal of Ophthalmology

Deep learning in next-generation sequencing

2020

Highlights:
• Machine learning is increasingly important for NGS.
• Deep learning can improve many NGS applications.

Drug Discovery Today

Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars

2021

We consider the following problem: A researcher identified a small number of molecules with a certain property of interest and now wants to find further molecules sharing this property in a database. This can be described as learning molecular classes from small numbers of positive examples. In this work, we propose a method that is based on learning a graph grammar for the molecular class. We consider the type of graph grammars proposed by Althaus et al. [2], as it can be easily interpreted and allows relatively efficient queries. We identify rules that are frequently encountered in the positive examples and use these to construct a graph grammar. We then classify a molecule as being conta…

Evaluating the microscopic effect of brushing stone tools as a cleaning procedure

2020

Cleaning stone tool surfaces is a common procedure in lithic studies. The first step widely applied at any archeological site (and/or at field laboratories) is the gross removal of sediment from the surfaces of artifacts. Lithic surface alterations due to mechanical action applied in wet or dry cleaning regimes have never been examined at a microscopic scale. This could have important implications in traceology, as any modern surface modifications inflicted on archeological artifacts might compromise their functional interpretations. The current trend toward quantification of use-wear traces makes the testing even more important, as even slight, apparently invisible surface alterations migh…

Instruction of haematopoietic lineage choices, evolution of transcriptional landscapes and cancer stem cell hierarchies derived from an AML1-ETO mous…

2013

The t(8;21) chromosomal translocation activates aberrant expression of the AML1-ETO (AE) fusion protein and is commonly associated with core binding factor acute myeloid leukaemia (CBF AML). Combining a conditional mouse model that closely resembles the slow evolution and the mosaic AE expression pattern of human t(8;21) CBF AML with global transcriptome sequencing, we find that disease progression was characterized by two principal pathogenic mechanisms. Initially, AE expression modified the lineage potential of haematopoietic stem cells (HSCs), resulting in the selective expansion of the myeloid compartment at the expense of normal erythro- and lymphopoiesis. This lineage skewing was foll…

CoverageAnalyzer (CAn): A Tool for Inspection of Modification Signatures in RNA Sequencing Profiles

2016

The combination of reverse transcription (RT) and deep sequencing has emerged as a powerful instrument for the detection of RNA modifications, a field that has seen a recent surge in activity because of its importance in gene regulation. Recent studies yielded high-resolution RT signatures of modified ribonucleotides relying on both sequence-dependent mismatch patterns and reverse transcription arrests. Common alignment viewers lack specialized functionality, such as filtering, tailored visualization, image export and differential analysis. Consequently, the community will profit from a platform seamlessly connecting detailed visual inspection of RT signatures and automated screening for modifi…

Biomolecules

A Greedy Algorithm for Hierarchical Complete Linkage Clustering

2014

We are interested in the greedy method to compute a hierarchical complete linkage clustering. There are two known methods for this problem, one having a running time of \({\mathcal O}(n^3)\) with a space requirement of \({\mathcal O}(n)\) and one having a running time of \({\mathcal O}(n^2 \log n)\) with a space requirement of \(\Theta(n^2)\), where n is the number of points to be clustered. Neither method is capable of handling large point sets. In this paper, we give an algorithm with a space requirement of \({\mathcal O}(n)\) which is able to cluster one million points in a day on current commodity hardware.

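For reference, the greedy definition of complete linkage clustering discussed in the abstract (repeatedly merge the two clusters whose farthest members are closest) can be written down directly. This minimal Python sketch keeps a full distance matrix, so it needs O(n^2) memory and O(n^3) time; it is neither of the two known methods the abstract mentions nor the paper's O(n)-space algorithm, and the function name is hypothetical.

```python
import math


def complete_linkage(points):
    """Greedy agglomerative complete-linkage clustering with a full distance matrix.

    Returns (members_a, members_b, distance) for each merge, in merge order.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    members = {i: [i] for i in range(n)}
    merges = []
    while len(members) > 1:
        # Greedy step: pick the pair of active clusters at the smallest linkage distance.
        a, b = min(((i, j) for i in members for j in members if i < j),
                   key=lambda ij: dist[ij[0]][ij[1]])
        merges.append((tuple(sorted(members[a])), tuple(sorted(members[b])), dist[a][b]))
        members[a] += members.pop(b)
        # Complete linkage: distance to the merged cluster is the larger of the two.
        for k in members:
            if k != a:
                dist[a][k] = dist[k][a] = max(dist[a][k], dist[b][k])
    return merges
```

The n-by-n matrix is exactly what becomes prohibitive for a million points, which is why the memory bound in the abstract matters.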

Graphical Workflow System for Modification Calling by Machine Learning of Reverse Transcription Signatures

2019

Modification mapping from cDNA data has become a tremendously important approach in epitranscriptomics. So-called reverse transcription signatures in cDNA contain information on the position and nature of their causative RNA modifications. Data mining of, e.g., Illumina-based high-throughput sequencing data is therefore rapidly growing in importance, and the field still lacks effective tools. Here we present a versatile, user-friendly graphical workflow system for modification calling based on machine learning. The workflow commences with a principal module for trimming, mapping, and postprocessing. The latter includes a quantification of mismatch and arrest rates with single-nucleotide re…

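The mismatch and arrest rates mentioned in the abstract can be illustrated with a small Python sketch that turns per-position read counts into a feature vector for a classifier. The input layout and the rate definitions below are simplifying assumptions (for instance, the arrest rate is taken as arrests divided by arrests plus read-through reads); the workflow described in the paper may define and normalize them differently, and all names are hypothetical.

```python
from dataclasses import dataclass

BASES = "ACGT"


@dataclass
class PositionCounts:
    """Hypothetical per-position pileup summary for one reference nucleotide."""
    ref_base: str
    base_calls: dict[str, int]  # base -> number of reads calling that base here
    arrests: int                # reads whose cDNA synthesis stopped at this position
    read_through: int           # reads whose cDNA continued past this position


def mismatch_rate(pc: PositionCounts) -> float:
    """Fraction of base calls that differ from the reference base."""
    total = sum(pc.base_calls.values())
    return (total - pc.base_calls.get(pc.ref_base, 0)) / total if total else 0.0


def arrest_rate(pc: PositionCounts) -> float:
    """Fraction of reverse transcription events that terminated at this position."""
    total = pc.arrests + pc.read_through
    return pc.arrests / total if total else 0.0


def rt_features(pc: PositionCounts) -> list[float]:
    """Feature vector: mismatch rate, arrest rate, then the A/C/G/T call fractions."""
    total = sum(pc.base_calls.values())
    fractions = [pc.base_calls.get(b, 0) / total if total else 0.0 for b in BASES]
    return [mismatch_rate(pc), arrest_rate(pc), *fractions]
```

Keeping the individual base fractions, rather than only the overall mismatch rate, preserves the misincorporation pattern that distinguishes one modification from another.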

The impact of isolated lesions on white-matter fiber tracts in multiple sclerosis patients

2015

Infratentorial lesions have been assigned an equivalent weighting to supratentorial plaques in the new McDonald criteria for diagnosing multiple sclerosis. Moreover, their presence has been shown to have prognostic value for disability. However, their spatial distribution and impact on network damage is not well understood. As a preliminary step in this study, we mapped the overall infratentorial lesion pattern in relapsing–remitting multiple sclerosis patients (N = 317) using MRI, finding the pons (lesion density, 14.25/cm3) and peduncles (13.38/cm3) to be predilection sites for infratentorial lesions. Based on these results, 118 fiber bundles from 15 healthy controls and a subgroup of 23 …

NeuroImage: Clinical

NOseq: amplicon sequencing evaluation method for RNA m6A sites after chemical deamination

2020

Methods for the detection of m6A by RNA-Seq technologies are increasingly sought after. We here present NOseq, a method to detect m6A residues in defined amplicons by virtue of their resistance to chemical deamination, effected by nitrous acid. Partial deamination in NOseq affects all exocyclic amino groups present in nucleobases and thus also changes sequence information. The method uses a mapping algorithm specifically adapted to the sequence degeneration caused by deamination events. Thus, m6A sites with partial modification levels of ∼50% were detected in defined amplicons, and this threshold can be lowered to ∼10% by combination with m6A immunoprecipitation. NOseq faithfully d…

String kernels and high-quality data set for improved prediction of kinked helices in α-helical membrane proteins.

2011

The reasons for distortions from optimal α-helical geometry are largely unknown, but their influences on structural changes of proteins are significant. Hence, their prediction is a crucial problem in structural bioinformatics. For the particular case of kink prediction, we generated a data set of 132 membrane proteins containing 1014 manually labeled helices and examined the environment of kinks. Our sequence analysis confirms the great relevance of proline and reveals disproportionately high occurrences of glycine and serine at kink positions. The structural analysis shows significantly different solvent accessible surface area mean values for kinked and nonkinked helices. More importantly, …

Journal of Chemical Information and Modeling
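The title refers to string kernels, which let kernel-based classifiers compare sequences directly. One widely used string kernel, the k-spectrum kernel, scores two amino acid sequences by the k-mers they share; the Python sketch below shows it as a generic example and is not necessarily the kernel used in this work or in SKINK. The choice k = 2 and the function names are illustrative.

```python
import math
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Counts of all length-k substrings (k-mers) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    a, b = spectrum(s, k), spectrum(t, k)
    return sum(count * b[kmer] for kmer, count in a.items())


def normalized_kernel(s: str, t: str, k: int = 2) -> float:
    """Cosine-normalized kernel value in [0, 1]."""
    return spectrum_kernel(s, t, k) / math.sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
```

A Gram matrix of such values over labelled helix sequences could then be passed to a kernel method, for example scikit-learn's SVC with kernel="precomputed"; normalization keeps long helices from dominating simply because they contain more k-mers.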

CorCast: A Distributed Architecture for Bayesian Epidemic Nowcasting and its Application to District-Level SARS-CoV-2 Infection Numbers in Germany

2021

Timely information on current infection numbers during an epidemic is of crucial importance for decision makers in politics, medicine, and businesses. As information about local infection risk can guide public policy as well as individual behavior, such as the wearing of personal protective equipment or voluntary social distancing, statistical models providing such insights should be transparent and reproducible as well as accurate. Fulfilling these requirements is drastically complicated by the large amounts of data generated during exponential growth of infection numbers, and by the complexity of common inference pipelines. Here, we present CorCast – a stable and scalable distributed arch…

Locality-sensitive hashing enables signal classification in high-throughput mass spectrometry raw data at scale

2021

Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Existing approaches for signal detection are usually not well suited for processing large amounts of data in parallel or rely on strong assumptions concerning the signals' properties. In this study, it is shown that locali…

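Locality-sensitive hashing itself can be illustrated with random-hyperplane hashing (SimHash) for cosine similarity, one standard LSH family. The truncated abstract does not say which hash family or feature representation the study uses, so the vectors here stand in for hypothetical feature vectors of peak patterns, and all names are illustrative. Each bit records on which side of a random hyperplane a vector lies; two vectors at angle θ agree on a bit with probability 1 − θ/π, so similar vectors tend to land in the same bucket and candidate matches can be found without comparing all pairs.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Draw n_bits random hyperplane normals with Gaussian coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: list[float], planes: list[list[float]]) -> tuple[int, ...]:
    """One bit per hyperplane: on which side of the plane the vector lies."""
    return tuple(int(sum(v * p for v, p in zip(vec, plane)) >= 0.0) for plane in planes)


def bucket(vectors: list[list[float]],
           planes: list[list[float]]) -> dict[tuple[int, ...], list[int]]:
    """Group vector indices by signature; vectors at a small angle tend to share a bucket."""
    buckets: dict[tuple[int, ...], list[int]] = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[signature(vec, planes)].append(idx)
    return dict(buckets)
```

Because each vector is hashed independently, the bucketing step parallelizes trivially across data chunks, which fits the scalability concern raised in the abstract.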

A dynamic program analysis to find floating-point accuracy problems

2012

Programs using floating-point arithmetic are prone to accuracy problems caused by rounding and catastrophic cancellation. These phenomena provoke bugs that are notoriously hard to track down: the program does not necessarily crash and the results are not necessarily obviously wrong, but often subtly inaccurate. Further use of these values can lead to catastrophic errors. In this paper, we present a dynamic program analysis that supports the programmer in finding accuracy problems. Our analysis uses binary translation to perform every floating-point computation side by side in higher precision. Furthermore, we use a lightweight slicing approach to track the evolution of errors. We evaluate our…

Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
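The core idea of the analysis, carrying a higher-precision shadow value alongside a floating-point result and comparing the two, can be imitated at the source level in Python. This sketch does not use binary translation or slicing as the paper's tool does; it simply evaluates one expression twice, once in binary64 and once with the standard decimal module at 50 significant digits. The quadratic-formula example is a textbook case of catastrophic cancellation chosen for illustration, and the function names are hypothetical.

```python
import math
from decimal import Decimal, localcontext


def larger_root(a, b, c, sqrt):
    """Textbook quadratic formula for the larger root (a > 0).

    Cancels catastrophically when b > 0 and b*b is much larger than 4*a*c.
    """
    return (-b + sqrt(b * b - 4 * a * c)) / (2 * a)


def shadow_check(a: float, b: float, c: float, digits: int = 50):
    """Evaluate in binary64 and, side by side, in high-precision decimal arithmetic.

    Returns (float_result, shadow_result, relative_error_of_float_result).
    """
    fast = larger_root(a, b, c, math.sqrt)
    with localcontext() as ctx:
        ctx.prec = digits
        # The shadow computation starts from the exact values of the same float inputs.
        shadow = larger_root(Decimal(a), Decimal(b), Decimal(c), lambda d: d.sqrt())
        rel_err = abs((Decimal(fast) - shadow) / shadow)
    return fast, shadow, rel_err
```

For a = 1, b = 1e8, c = 1 the float result is off by roughly 25 percent relative to the shadow value, although the program neither crashes nor produces an obviously absurd number, which is exactly the kind of silent inaccuracy the abstract describes.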

Parallelized Clustering of Protein Structures on CUDA-Enabled GPUs

2014

Estimation of the pose in which two given molecules might bind together to form a potential complex is a crucial task in structural biology. To solve this so-called "docking problem", most algorithms initially generate large numbers of candidate poses (or decoys) which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates ranges from thousands to millions, performing the clustering on standard CPUs is highly time consuming. In this paper we analyze and evaluate different approaches to parallelize the nearest neighbor chain algorithm to perform hierarchical Ward clustering of protein structures usin…

2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing

SKINK: a web server for string kernel based kink prediction in α-helices

2014

Motivation: The reasons for distortions from optimal α-helical geometry are largely unknown, but their influences on structural changes of proteins are significant. Hence, their prediction is a crucial problem in structural bioinformatics. Here, we present a new web server, called SKINK, for string kernel based kink prediction. Extending our previous study, we also annotate the most probable kink position in a given α-helix sequence. Availability and implementation: The SKINK web server is freely accessible at http://biows-inf.zdv.uni-mainz.de/skink. Moreover, SKINK is a module of the BALL software, also freely available at www.ballview.org. Contact: benny.kneissl@roche.com

Keywords: Statistics and Probability; Skink; Web server; Theoretical computer science; Computer science; Real-time computing; Biochemistry; Protein Structure Secondary; Structural bioinformatics; Software; Sequence Analysis Protein; String kernel; Position (vector); Ball (mathematics); Molecular Biology; Internet; Sequence; Computational Biology; Proteins; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics
Venue: Bioinformatics
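SKINK is described as string kernel based, but the abstract does not say which string kernel it uses. As an illustration of the general idea only, the sketch below implements the k-spectrum kernel, one of the simplest string kernels: each sequence is mapped to its counts of overlapping k-mers, and the kernel is the inner product of those count vectors. In a kernel method such as a support-vector machine, a function like this takes the place of the dot product between feature vectors. The function names and the toy one-letter amino-acid strings are hypothetical.

```python
import math
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers (substrings of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    # Counter returns 0 for missing keys, so unshared k-mers contribute nothing.
    return sum(count * ct[kmer] for kmer, count in cs.items())


def normalized_spectrum_kernel(s: str, t: str, k: int = 2) -> float:
    """Cosine-normalized kernel: 1.0 for identical k-mer profiles, 0.0 if none shared."""
    kss, ktt = spectrum_kernel(s, s, k), spectrum_kernel(t, t, k)
    if kss == 0 or ktt == 0:
        return 0.0
    return spectrum_kernel(s, t, k) / math.sqrt(kss * ktt)
```

For example, "ALAL" has the 2-mer counts AL: 2, LA: 1 and "ALA" has AL: 1, LA: 1, so their spectrum kernel value is 2·1 + 1·1 = 3.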

On the Applicability of Elastic Network Normal Modes in Small-Molecule Docking

2012

Incorporating backbone flexibility into protein-ligand docking is still a challenging problem. In protein-protein docking, normal mode analysis (NMA) has become increasingly popular as it can be used to describe the collective motions of a biological system, but the question of whether NMA can also be useful in predicting the conformational changes observed upon small-molecule binding has only been addressed in a few case studies. Here, we describe a large-scale study on the applicability of NMA for protein-ligand docking using 433 apo/holo pairs of the Astex data sets. On the basis of sets of the first normal modes from the apo structure, we first generated for each paired holo structure a…

Keywords: Models Molecular; Protein Conformation; Computer science; General Chemical Engineering; food and beverages; General Chemistry; Library and Information Sciences; Elastic network; Small molecule; Elasticity; Computer Science Applications; Small Molecule Libraries; Protein–ligand docking; Normal mode; Docking (molecular); Searching the conformational space for docking; Computational chemistry; Apoproteins; Biological system
Venue: Journal of Chemical Information and Modeling

AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation

2020

Sequence alignments are fundamental to bioinformatics, which has resulted in a variety of optimized implementations. Unfortunately, the vast majority of them are hand-tuned and specific to certain architectures and execution models. This not only makes them challenging to understand and extend, but also difficult to port to other platforms. We present AnySeq, a novel library for computing different types of pairwise alignments of DNA sequences. Our approach combines high performance with an intuitively understandable implementation, which is achieved through the concept of partial evaluation. Using the AnyDSL compiler framework, AnySeq enables the compilation of algorithmic variants that ar…

Keywords: FOS: Computer and information sciences; 0301 basic medicine; Scheme (programming language); Computer Science - Performance; Computer science; 0206 medical engineering; Sequence alignment; 02 engineering and technology; Parallel computing; Metaprogramming; DNA sequencing; Partial evaluation; Performance (cs.PF); 03 medical and health sciences; 030104 developmental biology; Computer Science - Distributed, Parallel, and Cluster Computing; Function composition (computer science); Multithreading; Distributed, Parallel, and Cluster Computing (cs.DC); Compiler; 020602 bioinformatics; Codebase
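AnySeq's abstract attributes its performance to partial evaluation through the AnyDSL compiler framework. As a loose analogy only (a Python closure is not partial evaluation in the compiler sense, and this is not AnySeq's API), the sketch below fixes a scoring scheme once and returns an aligner specialized to it, which then computes a global alignment score in the Needleman-Wunsch style with a linear gap penalty. The name `make_global_aligner` and the score values in the example are hypothetical.

```python
from typing import Callable


def make_global_aligner(match: int, mismatch: int, gap: int) -> Callable[[str, str], int]:
    """Return a global-alignment scorer with the scoring scheme baked in."""

    def substitution(a: str, b: str) -> int:
        return match if a == b else mismatch

    def align(s: str, t: str) -> int:
        # prev holds the DP row for the prefix s[:i-1]; start with gaps only.
        prev = [j * gap for j in range(len(t) + 1)]
        for i in range(1, len(s) + 1):
            cur = [i * gap] + [0] * len(t)
            for j in range(1, len(t) + 1):
                cur[j] = max(
                    prev[j - 1] + substitution(s[i - 1], t[j - 1]),  # (mis)match
                    prev[j] + gap,                                   # gap in t
                    cur[j - 1] + gap,                                # gap in s
                )
            prev = cur
        return prev[-1]

    return align
```

With `dna = make_global_aligner(1, -1, -2)`, aligning "ACGT" with "AGT" scores 1: three matches and one gap. Only the score is computed, using two rows of memory; recovering the alignment itself would require keeping the full matrix or a traceback.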

ProteinScanAR - An Augmented Reality Web Application for High School Education in Biomolecular Life Sciences

2012

Understanding protein structures is a crucial step in creating molecular insight for researchers as well as students and pupils. The enormous scaling gap between an atomic point of view and objects in daily life hampers developing an intuitive relation between them. Especially for high school students, it can be difficult to understand the spatial relations of a protein structure. Due to the lack of direct imaging techniques, molecules can only be explored by studying abstract molecular models. Here, the use of augmented reality (AR) techniques has proven to strongly improve structural perception. In this work we present ProteinScanAR, an augmented reality framework for biomolecular education t…

Keywords: Structure (mathematical logic); Spatial relation; HTML5; Relation (database); Human–computer interaction; Computer science; Web application; The Internet; Augmented reality; License
Venue: 2012 16th International Conference on Information Visualisation

Algorithms for the Maximum Weight Connected k-Induced Subgraph Problem

2014

Finding differentially regulated subgraphs in a biochemical network is an important problem in bioinformatics. We present a new model for finding such subgraphs which takes the polarity of the edges (activating or inhibiting) into account, leading to the problem of finding a connected subgraph induced by k vertices with maximum weight. We present several algorithms for this problem, including dynamic programming on tree decompositions and integer linear programming. We compare the strength of our integer linear program to previous formulations of the k-cardinality tree problem. Finally, we compare the performance of the algorithms and the quality of the results to a previous approac…

Keywords: Dynamic programming; Discrete mathematics; Combinatorics; Linear programming; Induced subgraph; Heuristics; Integer programming; Algorithm; Tree (graph theory); Tree decomposition; Mathematics of Computing: Discrete Mathematics; Mathematics; Integer (computer science)
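For very small graphs, the problem can be solved by exhaustive enumeration, which is useful as a reference when testing faster methods such as the dynamic programming and integer linear programming approaches named in the abstract. The sketch below is such a brute-force baseline, not one of the paper's algorithms. It assumes that the weight of a subgraph is simply the sum of its vertex weights and ignores edge polarity, which the paper's model does take into account. It enumerates all k-vertex subsets, so its running time is exponential. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def is_connected(nodes: Set[Hashable], adj: Dict[Hashable, Set[Hashable]]) -> bool:
    """Depth-first search restricted to the subgraph induced by `nodes`."""
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def max_weight_connected_k_subgraph(
    weights: Dict[Hashable, float],
    edges: Iterable[Tuple[Hashable, Hashable]],
    k: int,
) -> Tuple[Optional[float], Optional[Set[Hashable]]]:
    """Best connected induced subgraph on exactly k vertices, or (None, None)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    adj: Dict[Hashable, Set[Hashable]] = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best_weight, best_set = None, None
    for subset in combinations(list(weights), k):
        nodes = set(subset)
        if not is_connected(nodes, adj):
            continue  # induced subgraph falls apart: not a feasible solution
        w = sum(weights[v] for v in nodes)
        if best_weight is None or w > best_weight:
            best_weight, best_set = w, nodes
    return best_weight, best_set
```

On the path a-b-c-d with weights 5, -1, 4, 3 and k = 3, the best feasible choice is {a, b, c} with weight 8. The heavier set {a, c, d} (weight 12) is rejected because a is not adjacent to c or d, so the connectivity constraint can force a negative-weight vertex into the solution.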

CellLineNavigator: a workbench for cancer cell line analysis

2012

The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large-scale comparisons across a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome-wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search abili…

Keywords: Genetics; Internet; Interface (Java); Systems biology; Genomics; Articles; Computational biology; Biology; Genome; Gene nomenclature; Annotation; Computing Methodologies: Pattern Recognition; Data access; Cell Line Tumor; Neoplasms; Databases Genetic; Humans; Workbench; Transcriptome
Venue: Nucleic Acids Research

Integrated quantitative proteomic and transcriptomic analysis of lung tumor and control tissue: a lung cancer showcase

2015

Proteomics analysis of paired cancer and control tissue can be applied to investigate pathological processes in tumors. Advancements in data-independent acquisition mass spectrometry allow for highly reproducible quantitative analysis of complex proteomic patterns. Optimized sample preparation workflows enable integrative multi-omics studies from the same tissue specimens. We performed ion-mobility-enhanced, data-independent acquisition MS to characterize the proteome of 21 lung tumor tissues, including adenocarcinoma and squamous cell carcinoma (SCC), in comparison with control lung tissue from the same patient in each case. Transcriptomic data were generated for the same specimens. The quantitative prot…

Keywords: Proteomics; 0301 basic medicine; Pathology; Lung Neoplasms; Proteome; Systems biology; Transcriptome; transcriptomics; 03 medical and health sciences; Biomarkers Tumor; Humans; Lung cancer; Neoplasm Staging; mass spectrometry; Adenocarcinoma; Gene Expression Profiling; proteomics analysis; Prognosis; 030104 developmental biology; Oncology; Case-Control Studies; Carcinoma Squamous Cell; lung tumors; Quantitative analysis (chemistry); Follow-Up Studies; Research Paper
Venue: Oncotarget

Competing salt effects on phase behavior of protein solutions: tailoring of protein interaction by the binding of multivalent ions and charge screeni…

2014

The phase behavior of protein solutions is affected by additives such as crowder molecules or salts. In particular, upon addition of multivalent counterions, a reentrant condensation can occur; i.e., protein solutions are stable at low and high multivalent ion concentrations but aggregate at intermediate salt concentrations. The addition of monovalent ions shifts the phase boundaries to higher multivalent ion concentrations. This effect is found to be reflected in the protein interactions, as assessed via small-angle X-ray scattering. Two simulation schemes (a Monte Carlo sampling of the counterion binding configurations using the detailed protein structure and an analytical coarse-grain…

Keywords: Ions; Condensation; Osmolar Concentration; Surfaces, Coatings and Films; Ion; Protein–protein interaction; Protein structure; X-Ray Diffraction; Ionic strength; Computational chemistry; Phase (matter); Scattering, Small Angle; Materials Chemistry; Molecule; Humans; Salts; Physical and Theoretical Chemistry; Counterion; Serum Albumin
Venue: The journal of physical chemistry. B

Automatic shape detection of ice crystals

2021

Clouds have a crucial impact on the energy balance of the Earth-atmosphere system. They can cool the system by partly reflecting or scattering the incoming solar radiation (albedo effect); moreover, thermal radiation emitted from the Earth's surface can be absorbed and partly re-emitted by clouds, leading to a warming of the atmosphere (greenhouse effect). The effectiveness of both effects crucially depends on the size and the shape of a cloud's particulate constituents, i.e., liquid water droplets or solid ice crystals. For studying cloud microphysics, in situ measurements on board aircraft are commonly used. An important class of measurement techniques comprises optical ar…

Keywords: General Computer Science; Ice crystals; Computer science; Scattering; Lead (sea ice); Cloud computing; Filter (signal processing); Radiation; Theoretical Computer Science; Thermal radiation; Modeling and Simulation; Particle; Biological system
Venue: Journal of Computational Science

Efficient computation of root mean square deviations under rigid transformations

2013

The computation of root mean square deviations (RMSD) is an important step in many bioinformatics applications. If approached naively, each RMSD computation takes time linear in the number of atoms. In addition, a careful implementation is required to achieve numerical stability, which further increases runtimes. In practice, the structural variations under consideration are often induced by rigid transformations of the protein, or are at least dominated by a rigid component. In this work, we show how RMSD values resulting from rigid transformations can be computed in constant time from the protein's covariance matrix, which can be precomputed in linear time. As a typical application scenar…

Keywords: Protein Conformation; Covariance matrix; Computation; Computational Biology; Proteins; Geometry; General Chemistry; Root mean square; Computational Mathematics; Computer Simulation; Statistical physics; Cluster analysis; Constant (mathematics); Time complexity; Rigid transformation; Mathematics; Numerical stability
Venue: Journal of Computational Chemistry
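The abstract states that RMSD values under rigid transformations can be computed in constant time from a covariance matrix that is precomputed in linear time. One way to see why (our own derivation, not necessarily the paper's formulation): take N coordinates x_i with centroid c, centered coordinates y_i = x_i - c and C = sum of y_i y_i^T. Applying a rotation R and translation t moves x_i by (R - I) y_i + d with d = (R - I) c + t. The cross terms vanish because the y_i sum to zero, and (R - I)^T (R - I) = 2I - R - R^T for orthogonal R. This gives MSD = (2 tr(C) - 2 tr(RC)) / N + |d|^2. The plain-Python sketch below uses 3×3 nested lists and includes a naive linear-time version for checking. It assumes R is a rotation matrix, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]
Mat = Sequence[Sequence[float]]


def precompute(coords: Sequence[Vec]) -> Tuple[int, List[float], List[List[float]]]:
    """Linear-time pass: atom count N, centroid c and C = sum of y y^T."""
    n = len(coords)
    c = [sum(p[d] for p in coords) / n for d in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in coords:
        y = [p[d] - c[d] for d in range(3)]
        for r in range(3):
            for s in range(3):
                cov[r][s] += y[r] * y[s]
    return n, c, cov


def rmsd_rigid(n: int, c: List[float], cov: List[List[float]], R: Mat, t: Vec) -> float:
    """Constant-time RMSD between the structure and its image R x + t."""
    tr_c = cov[0][0] + cov[1][1] + cov[2][2]
    tr_rc = sum(R[i][j] * cov[j][i] for i in range(3) for j in range(3))
    # Displacement of the centroid: d = (R - I) c + t.
    d = [sum(R[i][j] * c[j] for j in range(3)) - c[i] + t[i] for i in range(3)]
    msd = (2.0 * tr_c - 2.0 * tr_rc) / n + sum(x * x for x in d)
    return math.sqrt(max(msd, 0.0))  # clamp tiny negative rounding errors


def rmsd_naive(coords: Sequence[Vec], R: Mat, t: Vec) -> float:
    """Reference O(N) computation for checking the constant-time formula."""
    total = 0.0
    for p in coords:
        q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        total += sum((q[i] - p[i]) ** 2 for i in range(3))
    return math.sqrt(total / len(coords))
```

For a pure translation (R = I), the formula reduces to |t|, as expected. Note that subtracting 2 tr(RC) from 2 tr(C) can lose precision when both are large and nearly equal, which is one reason a careful implementation has to pay attention to numerical stability.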

NESSie.jl – Efficient and intuitive finite element and boundary element methods for nonlocal protein electrostatics in the Julia language

2018

The development of scientific software can generally be characterized by an initial phase of rapid prototyping and a subsequent transition to computationally efficient production code. Unfortunately, most programming languages are not well suited to both tasks at the same time, which commonly results in a considerably longer development time. The cross-platform and open-source Julia language aims at closing the gap between prototype and production code by providing a usability comparable to Python or MATLAB alongside high-performance capabilities known from C and C++ in a single programming language. In this paper, we present efficient protein electrostatics computations a…

Keywords: 0301 basic medicine; Rapid prototyping; General Computer Science; Computer science; Computation; Usability; Python (programming language); Finite element method; Theoretical Computer Science; NESSIE; Computational science; 03 medical and health sciences; 030104 developmental biology; Modeling and Simulation; MATLAB; Boundary element method
Venue: Journal of Computational Science

Polish is quantitatively different on quartzite flakes used on different worked materials.

2020

Metrology has been successfully used in the last decade to quantify use-wear on stone tools. Such techniques have mostly been applied to fine-grained rocks (chert), while studies on coarse-grained raw materials have been relatively infrequent. In this study, confocal microscopy was employed to investigate polished surfaces on a coarse-grained lithology, quartzite. Wear originating from contact with five different worked materials was classified in a data-driven approach using machine learning. Two different classifiers, a decision tree and a support-vector machine, were used to assign the different textures to a worked material based on a selected number of parameters (Mean density of furr…

Keywords: Future studies; Confocal Microscopy; Decision Analysis; Lithology; Raw Materials; Antlers; Bone imaging; Plant Science; 01 natural sciences; Diagnostic Radiology; Medicine and Health Sciences; 0601 history and archaeology; Electron Microscopy; Animal Anatomy; Materials; Microscopy; Multidisciplinary; 060102 archaeology; Plant Anatomy; Radiology and Imaging; Q; R; Light Microscopy; 06 humanities and the arts; Quartz; Wood; Process Engineering; Physical Sciences; Medicine; Engineering and Technology; Scanning Electron Microscopy; Anatomy; Management Engineering; Geology; Research Article; 010506 paleontology; Imaging Techniques; Science; Materials Science; Mineralogy; Industrial Processes; Research and Analysis Methods; Diagnostic Medicine; Industrial Engineering; 0105 earth and related environmental sciences; Surface Treatments; Decision Trees; Biology and Life Sciences; Manufacturing Processes; Sample size determination; Zoology
Venue: PloS one

CARE: context-aware sequencing read error correction.

2020

Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates, since they break reads into independent k-mers, or do not scale efficiently to large amounts of sequencing reads and complex genomes. Results: We present CARE, an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections, which enables fast computation of high-quality multiple alignments. Sequencing errors ar…

Keywords: Statistics and Probability; Multiple sequence alignment; Computer science; Sequence assembly; High-Throughput Nucleotide Sequencing; Context (language use); Sequence Analysis DNA; Biochemistry; Genome; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Humans; Human genome; Data mining; Error detection and correction; Molecular Biology; Sequence Alignment; Algorithms; Software
Venue: Bioinformatics (Oxford, England)
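CARE's abstract names minhashing as the tool for fast similarity search between reads. The sketch below shows the generic MinHash idea, not CARE's implementation. Each read is reduced to its set of k-mers, and for each of several hash functions only the minimum hash value over those k-mers is kept. The fraction of positions at which two signatures agree then estimates the Jaccard similarity of the two k-mer sets. The k-mer length, the number of hash functions, the seeded BLAKE2b hashes from Python's `hashlib` and all function names are illustrative assumptions.

```python
import hashlib
from typing import List, Set


def kmers(read: str, k: int) -> Set[str]:
    """All distinct substrings of length k in the read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def _hash(seed: int, kmer: str) -> int:
    """One member of a family of 64-bit hash functions, selected by seed."""
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(read: str, k: int = 4, num_hashes: int = 64) -> List[int]:
    """For each hash function, keep the minimum hash over the read's k-mers."""
    ks = kmers(read, k)
    if not ks:
        raise ValueError("read is shorter than k")
    return [min(_hash(seed, km) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of hash functions whose minima coincide."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Because a signature has a fixed length regardless of read length, signatures (or parts of them) can serve as hash-table keys for finding candidate similar reads without an all-against-all comparison. That indexing step is not shown here, and the number of hash functions trades estimate accuracy against memory.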

NightShift: NMR shift inference by general hybrid model training - a framework for NMR chemical shift prediction

2013

Keywords: 004 Computer science; Biochemistry; Molecular Biology; 004 Data processing; Computer Science Applications
Venue: BMC Bioinformatics

The reverse transcription signature of N-1-methyladenosine in RNA-Seq is sequence dependent

2015

The combination of Reverse Transcription (RT) and high-throughput sequencing has emerged as a powerful approach for detecting modified nucleotides in RNA via analysis of either abortive RT products or the incorporation of mismatched dNTPs into cDNA. Here we simultaneously analyze both parameters in detail with respect to the occurrence of N-1-methyladenosine (m1A) in the template RNA. This naturally occurring modification is associated with structural effects, but it is also known as a mediator of antibiotic resistance in ribosomal RNA. In structural probing experiments with dimethylsulfate, m1A is routinely detected by RT-arrest. A specifically developed RNA-Seq protocol was tailored to …

Keywords: Adenosine; Sequence Analysis RNA; High-Throughput Nucleotide Sequencing; Reverse Transcription; Biomedical and agricultural sciences; 570 Life sciences; Machine Learning; Mice; Sequence Homology Nucleic Acid; RNA; Animals; Humans; [SDV.BBM] Life Sciences [q-bio]/Biochemistry, Molecular Biology; [SDV.MHEP] Life Sciences [q-bio]/Human health and pathology

Evaluating the microscopic effect of brushing stone tools as a cleaning procedure [Python analysis]

2020

This upload includes the following files related to the Python analysis:
- Raw data as an XLSX table (brushing_v2.xlsx), i.e. results from R Script #1 (see https://doi.org/10.5281/zenodo.3632517)
- Python script of the whole analysis (RunEveryParameter.py)
- Convenience script for running RunEveryParameter.py in the background and logging all output (RunSingleParametesBash.sh)
- Log file for output of sampling from the model for each parameter in a loop (logAll.txt)
- Jupyter notebooks of the analysis run on epLsar as an example (Notebook_SingleParameter.inpyb) and of a summary of the whole analysis (Notebook_Overview.ipynb), plus associated HTML output files (*.html)
- For each parameter: Full samples of p…

Polish is quantitatively different on quartzite flakes used on different worked materials [ConfoMap analysis]

2020

Each surface has been processed with two templates: 1) Extract two 50x50 µm sub-areas and extract topography layer from each sub-area. Export sub-areas as SUR files. File names start with "A35" or "VSH4". 2) Process all extracted sub-areas for quantitative analysis. File names start with "processing-quartzite-final". All ConfoMap templates are saved in MNT format (including all original and processed surfaces, as well as results). Each template has also been exported to a PDF file. Instructions to download all files at once are given here: https://doi.org/10.5281/zenodo.4011952 Additionally, the results of the second template are collated into "proce…

Evaluating the microscopic effect of brushing stone tools as a cleaning procedure [R analysis]

2020

This upload includes the following files related to the R analysis:
- Raw data as a CSV table (brushing_v2.csv), i.e. results from the ConfoMap analysis (see https://doi.org/10.5281/zenodo.3632490)
- RStudio project (Brushing_project.Rproj)
- R scripts as R Markdown files (*.Rmd)
- Output from R scripts knitted to HTML files (*.html)
- A text file containing the version of RStudio used (RStudioVersion.txt)

Instructions to download all files at once are given here: https://doi.org/10.5281/zenodo.4011952

Polish is quantitatively different on quartzite flakes used on different worked materials [Python analysis]

2020

This upload includes the following files related to the Python analysis:
1. Raw data as an XLSX table (processing-quartzite-final-2020-04-29.xlsx), which is the output from R Script #1 (see https://doi.org/10.5281/zenodo.3979139), even though the filename is slightly different.
Plus, for each analysis (full and restricted datasets), included in the corresponding ZIP archive:
2. Jupyter notebooks of the analysis (Classification_RandSplitFeature_Revision_VXX.ipynb) rendered to an HTML file (Classification_RandSplitFeature_Revision_VXX.html)
3. Dataframe including the artificially filled datapoints
4. Output of the analysis as PDF:
• Confusion matrices ("CM…

Evaluating the microscopic effect of brushing stone tools as a cleaning procedure [ConfoMap analysis]

2020

ConfoMap templates for each surface in MNT format (including all original and processed surfaces, as well as results). Each template has also been exported to a PDF file. Additionally, results are collated into 'brushing_v2.csv'. Instructions to download all files at once are given here: https://doi.org/10.5281/zenodo.4011952

Polish is quantitatively different on quartzite flakes used on different worked materials [R analysis]

2020

This upload includes the following files related to the R analysis:
- Raw data as a CSV table (processing-quartzite-final.csv), i.e. results from the ConfoMap analysis (see https://doi.org/10.5281/zenodo.3979116)
- RStudio project (Quantification quartzite final.Rproj)
- R scripts as R Markdown files (*.Rmd)
- R scripts knitted to HTML files (*.html)
- An R script (RStudioVersion.R) to write the used version of RStudio to a text file (RStudioVersion.txt)
- Output from script #1: processing-quartzite-final.Rbin and processing-quartzite-final.xlsx
- Output from script #2: processing-quartzite-final_summary-stats.xlsx
- Output from script #3: all plots as PDF files.

Note that for running the s…
