Search results for "algorithm."

showing 10 items of 4617 documents

MetaCache: context-aware classification of metagenomic reads using minhashing.

2017

Abstract. Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our…

0301 basic medicine; Statistics and Probability; Computer science; Sequence analysis; Context (language use); Biochemistry; Genome; 03 medical and health sciences; chemistry.chemical_compound; 0302 clinical medicine; RefSeq; Humans; Molecular Biology; Information retrieval; Shotgun sequencing; High-Throughput Nucleotide Sequencing; Sequence Analysis DNA; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; chemistry; Metagenomics; 030217 neurology & neurosurgery; DNA; Algorithms; Software; Reference genome; Bioinformatics (Oxford, England)
researchProduct

Reactome diagram viewer: data structures and strategies to boost performance

2017

Abstract. Motivation: Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. For web-based pathway visualization, Reactome uses a custom pathway diagram viewer that has evolved over the past years. Here, we present comprehensive enhancements in usability and performance based on extensive usability testing sessions and technology developments, aiming to optimize the viewer towards the needs of the community. Results: The pathway diagram viewer version 3 achieves consistently better performance, loading and rendering 97% of the diagrams in Reactome in less than 1 s. Combining the multi-layer HTML5 canvas strategy with a space partit…

0301 basic medicine; Statistics and Probability; Databases Factual; Computer science; Knowledge Bases; Databases and Ontologies; Biochemistry; World Wide Web; 03 medical and health sciences; 0302 clinical medicine; Humans; Molecular Biology; Internet; Computational Biology; Data structure; Original Papers; Computer Science Applications; Visualization; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; 030220 oncology & carcinogenesis; Scalability; Algorithms; Metabolic Networks and Pathways; Software; Bioinformatics
researchProduct

ParDRe: faster parallel duplicated reads removal tool for sequencing studies

2016

This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record is available online at: https://doi.org/10.1093/bioinformatics/btw038 Abstract. Summary: Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of S…

0301 basic medicine; Statistics and Probability; FASTQ format; DNA strings; Source code; Downstream (software development); Computer science; media_common.quotation_subject; Parallel computing; computer.software_genre; Biochemistry; DNA sequencing; 03 medical and health sciences; chemistry.chemical_compound; 0302 clinical medicine; Hybrid MPI/multithreading; Cluster Analysis; ParDRe; Molecular Biology; Gene; media_common; High-Throughput Nucleotide Sequencing; Sequence Analysis DNA; Parallel tool; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; chemistry; Data mining; computer; Algorithms; 030217 neurology & neurosurgery; DNA; Bioinformatics
researchProduct

L1-Penalized Censored Gaussian Graphical Model

2018

Graphical lasso is one of the most widely used estimators for inferring genetic networks. Despite its widespread use, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reactions and flow cytometers. The combination of censoring and high-dimensionality makes inference of the underlying genetic networks from these data very challenging. In this article, we propose an $\ell_1$-penalized Gaussian graphical model for censored data and derive two EM-like algorithm…

0301 basic medicine; Statistics and Probability; FOS: Computer and information sciences; graphical lasso; Computer science; Gaussian; Normal Distribution; Inference; Multivariate normal distribution; 01 natural sciences; Methodology (stat.ME); 010104 statistics & probability; 03 medical and health sciences; symbols.namesake; Graphical Lasso; Expectation–maximization algorithm; Humans; Computer Simulation; Gene Regulatory Networks; Graphical model; 0101 mathematics; Statistics - Methodology; Estimation theory; Reverse Transcriptase Polymerase Chain Reaction; Estimator; expectation-maximization algorithm; General Medicine; Censoring (statistics); High-dimensional data; high-dimensional data; Gaussian graphical model; 030104 developmental biology; symbols; censored data; Censored data; Expectation-Maximization algorithm; Statistics Probability and Uncertainty; Settore SECS-S/01 - Statistica; Algorithm; Algorithms
researchProduct

Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks.

2016

Abstract. Factorial Gaussian graphical models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order – some entries of the precision matrix are a priori zeros – or equal dependency strengths across time lags – some entries of the precision matrix are assumed to be equal. The precision matrix is then estimated by $\ell_1$-penalized maximum likelihood, imposing a further constraint on the absolute value…

0301 basic medicine; Statistics and Probability; Factorial; Dependency (UML); Computer science; Gaussian; Normal Distribution; penalized inference; sparse networks; computer.software_genre; Machine learning; 01 natural sciences; Normal distribution; 010104 statistics & probability; 03 medical and health sciences; symbols.namesake; Sparse networks; Genetics; Computer Simulation; Gene Regulatory Networks; Graphical model; 0101 mathematics; gene-regulatory system; Molecular Biology; Probability; Markov chain; Models Genetic; Penalized inference; business.industry; Model selection; graphical model; Gene-regulatory systems; Computational Mathematics; 030104 developmental biology; symbols; A priori and a posteriori; Data mining; Artificial intelligence; Graphical models; Settore SECS-S/01 - Statistica; business; computer; Neisseria; Algorithms; Statistical applications in genetics and molecular biology
researchProduct

A generalization of Kingman's model of selection and mutation and the Lenski experiment.

2017

Kingman’s model of selection and mutation studies the limit type value distribution in an asexual population of discrete generations and infinite size undergoing selection and mutation. This paper generalizes the model to analyze the long-term evolution of Escherichia coli in the Lenski experiment. Weak assumptions for fitness functions are proposed and the mutation mechanism is the same as in Kingman’s model. General macroscopic epistasis can be designed through the fitness functions. Convergence to the unique limit type distribution is obtained.

0301 basic medicine; Statistics and Probability; Generalization; Population; Biology; 01 natural sciences; Models Biological; General Biochemistry Genetics and Molecular Biology; 010104 statistics & probability; 03 medical and health sciences; Statistics; Escherichia coli; Applied mathematics; Quantitative Biology::Populations and Evolution; Limit (mathematics); 0101 mathematics; Selection Genetic; education; Selection (genetic algorithm); education.field_of_study; Fitness function; General Immunology and Microbiology; Applied Mathematics; General Medicine; Quantitative Biology::Genomics; Biological Evolution; 030104 developmental biology; Distribution (mathematics); Modeling and Simulation; Mutation (genetic algorithm); Mutation; Epistasis; General Agricultural and Biological Sciences; Mathematical biosciences
researchProduct

The latent geometry of the human protein interaction network

2017

Abstract. Motivation: A series of recently introduced algorithms and models advocates for the existence of a hyperbolic geometry underlying the network representation of complex systems. Since the human protein interaction network (hPIN) has a complex architecture, we hypothesized that uncovering its latent geometry could ease challenging problems in systems biology, translating them into measuring distances between proteins. Results: We embedded the hPIN into hyperbolic space and found that the inferred coordinates of nodes capture biologically relevant features, like protein age, function and cellular localization. This means that the representation of the hPIN in the two-dimensional hyperboli…

0301 basic medicine; Statistics and Probability; Geometric analysis; Computer science; Hyperbolic geometry; Systems biology; Complex system; Context (language use); Geometry; Biochemistry; Protein–protein interaction; 03 medical and health sciences; Interaction network; Humans; Protein Interaction Maps; Representation (mathematics); Cluster analysis; Molecular Biology; Systems Biology; Hyperbolic space; Proteins; Function (mathematics); Original Papers; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Embedding; Signal transduction; Algorithms; Signal Transduction
researchProduct

Variance component analysis to assess protein quantification in biomarker discovery. Application to MALDI-TOF mass spectrometry.

2017

Controlling the technological variability on an analytical chain is critical for biomarker discovery. The sources of technological variability should be modeled, which calls for specific experimental design, signal processing, and statistical analysis. Furthermore, with unbalanced data, the various components of variability cannot be estimated with the sequential or adjusted sums of squares of usual software programs. We propose a novel approach to variance component analysis with application to the matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) technology and use this approach for protein quantification by a classical signal processing algori…

0301 basic medicine; Statistics and Probability; MALDI-TOF; experimental design; Biometry; protein quantification; Quantitative proteomics; Variance component analysis; [ CHIM ] Chemical Sciences; 01 natural sciences; Signal; technological variability; 010104 statistics & probability; 03 medical and health sciences; statistical analysis; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; [CHIM.ANAL]Chemical Sciences/Analytical chemistry; Component (UML); [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; biomarker discovery; sum of squares type; 0101 mathematics; Biomarker discovery; signal processing; Mathematics; Signal processing; Analysis of Variance; [ PHYS ] Physics [physics]; Noise (signal processing); Proteins; General Medicine; Variance (accounting); [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; 030104 developmental biology; Spectrometry Mass Matrix-Assisted Laser Desorption-Ionization; Linear Models; variance components; [ CHIM.ANAL ] Chemical Sciences/Analytical chemistry; Statistics Probability and Uncertainty; Biological system; Algorithms; Biomarkers; Biometrical journal. Biometrische Zeitschrift
researchProduct

A graphical model selection tool for mixed models

2017

Model selection can be defined as the task of estimating the performance of different models in order to choose the most parsimonious one among a potentially very large set of candidate statistical models. We propose a graphical representation to be considered as an extension to the class of mixed models of the deviance plot proposed in the literature within the framework of classical and generalized linear models. Once a reduced number of models has been selected, this graphical representation allows important covariates to be identified by focusing only on the fixed effects component, assuming the random part is properly specified. Nevertheless, we also suggest a standalone figure representing th…

0301 basic medicine; Statistics and Probability; Mixed model; Model selection; Feature selection; 01 natural sciences; Task (project management); Deviance plot Penalized Weighted Residual Sum of Squares Variable selection; 010104 statistics & probability; 03 medical and health sciences; 030104 developmental biology; Modeling and Simulation; Statistics; Graphical model; 0101 mathematics; Selection (genetic algorithm); Mathematics
researchProduct

LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs

2019

Abstract. Novel 3D protein descriptors based on bilinear, quadratic and linear algebraic maps in $\mathbb{R}^n$ are proposed. The latter employs the kth 2-tuple (dis)similarity matrix to codify information related to covalent and non-covalent interactions in these biopolymers. The calculation of the inter-amino acid distances is generalized by using several dissimilarity coefficients, where normalization procedures based on the simple stochastic and mutual probability schemes are applied. A new local-fragment approach based on amino acid-types and amino acid-groups is proposed to characterize regions of interest in proteins. Topological and geometric macromolecular cutoffs are defined using local and…

0301 basic medicine; Statistics and Probability; Normalization (statistics); Generalization; Quantitative Structure-Activity Relationship; General Biochemistry Genetics and Molecular Biology; 03 medical and health sciences; 0302 clinical medicine; Linear regression; Amino Acids; Mathematics; General Immunology and Microbiology; Applied Mathematics; Statistical parameter; Proteins; General Medicine; Collinearity; Structural Classification of Proteins database; Support vector machine; 030104 developmental biology; Modeling and Simulation; Test set; Linear Models; General Agricultural and Biological Sciences; Algorithm; Software; 030217 neurology & neurosurgery; Journal of Theoretical Biology
researchProduct