Search results for "Computational Mathematic"

showing 10 items of 987 documents

ParDRe: faster parallel duplicated reads removal tool for sequencing studies

2016

This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record [insert complete citation information here] is available online at: https://doi.org/10.1093/bioinformatics/btw038 [Abstract] Summary: Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe , a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of S…

0301 basic medicineStatistics and ProbabilityFASTQ formatDNA stringsSource codeDownstream (software development)Computer sciencemedia_common.quotation_subjectParallel computingcomputer.software_genreBiochemistryDNA sequencing03 medical and health scienceschemistry.chemical_compound0302 clinical medicineHybrid MPI/multithreadingCluster AnalysisParDReMolecular BiologyGenemedia_commonHigh-Throughput Nucleotide SequencingSequence Analysis DNAParallel toolComputer Science ApplicationsComputational Mathematics030104 developmental biologyComputational Theory and MathematicschemistryData miningcomputerAlgorithms030217 neurology & neurosurgeryDNABioinformatics

researchProduct

Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks.

2016

Abstract Factorial Gaussian graphical Models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order – some entries of the precision matrix are a priori zeros – or equal dependency strengths across time lags – some entries of the precision matrix are assumed to be equal. The precision matrix is then estimated by l 1-penalized maximum likelihood, imposing a further constraint on the absolute value…

0301 basic medicineStatistics and ProbabilityFactorialDependency (UML)Computer scienceGaussianNormal Distributionpenalized inferencesparse networkscomputer.software_genreMachine learning01 natural sciencesNormal distribution010104 statistics & probability03 medical and health sciencessymbols.namesakeSparse networksGeneticsComputer SimulationGene Regulatory NetworksGraphical model0101 mathematicsgene-regulatory systemMolecular BiologyProbabilityMarkov chainModels GeneticPenalized inferencebusiness.industryModel selectiongraphical modelGene-regulatory systemsComputational Mathematics030104 developmental biologysymbolsA priori and a posterioriData miningArtificial intelligenceGraphical modelsSettore SECS-S/01 - StatisticabusinesscomputerNeisseriaAlgorithmsStatistical applications in genetics and molecular biology

researchProduct

Identification and visualization of differential isoform expression in RNA-seq time series

2018

Abstract Motivation As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Results Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with a differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as those genes with a shift in major …

0301 basic medicineStatistics and ProbabilityGene isoformIdentificationComputer scienceSequence analysisGene ExpressionRNA-SeqComputational biologyBiochemistryBioconductorTranscriptomeMice03 medical and health sciences0302 clinical medicineEstadística e Investigación OperativaRNA IsoformsAnimalsMolecular BiologyGeneVisualizationRegulation of gene expressionB-LymphocytesSequence Analysis RNAGene Expression ProfilingCell DifferentiationApplications NotesComputer Science ApplicationsVisualizationComputational Mathematics030104 developmental biologyGene Expression RegulationComputational Theory and MathematicsRNA-seq time seriesSoftware030217 neurology & neurosurgeryIsoform expression

researchProduct

The latent geometry of the human protein interaction network

2017

Abstract Motivation A series of recently introduced algorithms and models advocates for the existence of a hyperbolic geometry underlying the network representation of complex systems. Since the human protein interaction network (hPIN) has a complex architecture, we hypothesized that uncovering its latent geometry could ease challenging problems in systems biology, translating them into measuring distances between proteins. Results We embedded the hPIN to hyperbolic space and found that the inferred coordinates of nodes capture biologically relevant features, like protein age, function and cellular localization. This means that the representation of the hPIN in the two-dimensional hyperboli…

0301 basic medicineStatistics and ProbabilityGeometric analysisComputer scienceHyperbolic geometrySystems biologyComplex systemContext (language use)GeometryBiochemistryProtein–protein interaction03 medical and health sciencesInteraction networkHumansProtein Interaction MapsRepresentation (mathematics)Cluster analysisMolecular BiologySystems BiologyHyperbolic spaceProteinsFunction (mathematics)Original PapersComputer Science ApplicationsComputational Mathematics030104 developmental biologyComputational Theory and MathematicsEmbeddingSignal transductionAlgorithmsSignal Transduction

researchProduct

panISa: ab initio detection of insertion sequences in bacterial genomes from short read sequence data.

2018

Abstract Motivation The advent of next-generation sequencing has boosted the analysis of bacterial genome evolution. Insertion sequence (IS) elements play a key role in prokaryotic genome organization and evolution, but their repetitions in genomes complicate their detection from short-read data. Results PanISa is a software pipeline that identifies IS insertions ab initio in bacterial genomes from short-read data. It is a highly sensitive and precise tool based on the detection of read-mapping patterns at the insertion site. PanISa performs better than existing IS detection systems as it is based on a database-free approach. We applied it to a high-risk clone lineage of the pathogenic spec…

0301 basic medicineStatistics and ProbabilityLineage (genetic)Computer scienceAb initioComputational biologyBacterial genome size[INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]BiochemistryGenome[INFO.INFO-IU]Computer Science [cs]/Ubiquitous Computing03 medical and health sciences[INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR][SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]Insertion sequenceMolecular BiologyGenomic organizationHigh-Throughput Nucleotide SequencingSequence Analysis DNA[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM][SDV.MP.BAC]Life Sciences [q-bio]/Microbiology and Parasitology/BacteriologyPipeline (software)[INFO.INFO-MO]Computer Science [cs]/Modeling and SimulationComputer Science ApplicationsComputational Mathematics030104 developmental biologyComputational Theory and Mathematics[INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA]DNA Transposable Elements[INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET][INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]Genome BacterialSoftwareBioinformatics (Oxford, England)

researchProduct

A heuristic, iterative algorithm for change-point detection in abrupt change models

2017

Change-point detection in abrupt change models is a very challenging research topic in many fields of both methodological and applied Statistics. Due to strong irregularities, discontinuity and non-smootheness, likelihood based procedures are awkward; for instance, usual optimization methods do not work, and grid search algorithms represent the most used approach for estimation. In this paper a heuristic, iterative algorithm for approximate maximum likelihood estimation is introduced for change-point detection in piecewise constant regression models. The algorithm is based on iterative fitting of simple linear models, and appears to extend easily to more general frameworks, such as models i…

0301 basic medicineStatistics and ProbabilityMathematical optimizationIterative methodHeuristic (computer science)Linear model01 natural sciencesPiecewise constant model Approximate maximum likelihood Model linearization Grid search limitations010104 statistics & probability03 medical and health sciencesComputational MathematicsDiscontinuity (linguistics)030104 developmental biologyHyperparameter optimizationCovariatePiecewise0101 mathematicsStatistics Probability and UncertaintySettore SECS-S/01 - StatisticaChange detectionMathematics

researchProduct

The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of…

2015

Abstract Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy to compute, closed-form mathematical formulas that make no assumptions of the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. Results: We contribute to close this important methodological gap …

0301 basic medicineStatistics and ProbabilityNucleosome organizationComputational biologyBiologyType (model theory)BiochemistryGenomeDNA sequencing03 medical and health sciencesComputational Theory and MathematicNucleosomeMolecular BiologySequence (medicine)GeneticsGenomeSettore INF/01 - InformaticaEukaryotaComputer Science Applications1707 Computer Vision and Pattern RecognitionStatistical modelDNAChromatinNucleosomesComputer Science ApplicationsChromatinSettore BIO/18 - GeneticaComputational Mathematics030104 developmental biologyComputational Theory and MathematicsComputational MathematicBioinformatics

researchProduct

Reference genome assessment from a population scale perspective: an accurate profile of variability and noise.

2017

Abstract Motivation Current plant and animal genomic studies are often based on newly assembled genomes that have not been properly consolidated. In this scenario, misassembled regions can easily lead to false-positive findings. Despite quality control scores are included within genotyping protocols, they are usually employed to evaluate individual sample quality rather than reference sequence reliability. We propose a statistical model that combines quality control scores across samples in order to detect incongruent patterns at every genomic region. Our model is inherently robust since common artifact signals are expected to be shared between independent samples over misassembled regions …

0301 basic medicineStatistics and ProbabilityQuality ControlGenotypeComputer sciencemedia_common.quotation_subjectPopulationGenomicsBioinformaticscomputer.software_genreBiochemistryGenome03 medical and health sciencesGenetic variationAnimalsHumansQuality (business)AlleleeducationMolecular BiologyGenotypingReliability (statistics)media_commonProtocol (science)education.field_of_studyGenomeModels StatisticalGenetic VariationReproducibility of ResultsGenomicsGenome AnalysisOriginal PapersComputer Science ApplicationsComputational Mathematics030104 developmental biologyComputational Theory and MathematicsData miningcomputerSoftwareReference genome

researchProduct

dAPE: a web server to detect homorepeats and follow their evolution.

2016

Abstract Summary Homorepeats are low complexity regions consisting of repetitions of a single amino acid residue. There is no current consensus on the minimum number of residues needed to define a functional homorepeat, nor even if mismatches are allowed. Here we present dAPE, a web server that helps following the evolution of homorepeats based on orthology information, using a sensitive but tunable cutoff to help in the identification of emerging homorepeats. Availability and Implementation dAPE can be accessed from http://cbdm-01.zdv.uni-mainz.de/∼munoz/polyx. Supplementary information Supplementary data are available at Bioinformatics online.

0301 basic medicineStatistics and ProbabilityRepetitive Sequences Amino AcidWeb serverInternetComputer sciencecomputer.software_genreBiochemistryApplications NotesComputer Science ApplicationsWorld Wide WebEvolution Molecular03 medical and health sciencesComputational Mathematics030104 developmental biologyComputational Theory and MathematicsAnimalsHumansData miningMolecular BiologycomputerSequence AlignmentSequence AnalysisSoftwareBioinformatics (Oxford, England)

researchProduct

AFS: identification and quantification of species composition by metagenomic sequencing

2017

Abstract Summary DNA-based methods to detect and quantify taxon composition in biological materials are often based on species-specific polymerase chain reaction, limited to detecting species targeted by the assay. Next-generation sequencing overcomes this drawback by untargeted shotgun sequencing of whole metagenomes at affordable cost. Here we present AFS, a software pipeline for quantification of species composition in food. AFS uses metagenomic shotgun sequencing and sequence read counting to infer species proportions. Using Illumina data from a reference sausage comprising four species, we reveal that AFS is independent of the sequencing assay and library preparation protocol. Cost-sav…

0301 basic medicineStatistics and ProbabilitySequence analysisLibrary preparationComputational biologyBiologyBioinformaticsBiochemistrylaw.invention03 medical and health sciences0404 agricultural biotechnologylawMolecular BiologyPolymerase chain reactionShotgun sequencingHigh-Throughput Nucleotide SequencingSequence Analysis DNA04 agricultural and veterinary sciencesAccession number (bioinformatics)040401 food scienceBiological materialsComputer Science ApplicationsComputational Mathematics030104 developmental biologyComputational Theory and MathematicsMetagenomicsFood MicrobiologyIdentification (biology)MetagenomicsSoftwareBioinformatics

researchProduct