Search results for "Computer"

showing 10 items of 6910 documents

Assessing statistical significance in multivariable genome wide association analysis

2016

Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family wise error rate (FWER). Thus, our method tests whe…

0301 basic medicine | Statistics and Probability | 1303 Biochemistry | Genotype | Operations research | Library science | Polymorphism Single Nucleotide | Biochemistry | German | 03 medical and health sciences | 10007 Department of Economics | Political science | Genome-Wide Association Analysis | 1312 Molecular Biology | 1706 Computer Science Applications | Cluster Analysis | Humans | Computer Simulation | 2613 Statistics and Probability | Molecular Biology | European research | Genetics and Population Analysis | Computational Biology | Reproducibility of Results | Original Papers | language.human_language | Computer Science Applications | 330 Economics | Computational Mathematics | Phenotype | 030104 developmental biology | Computational Theory and Mathematics | Linear Models | language | 2605 Computational Mathematics | Genome-Wide Association Study | 1703 Computational Theory and Mathematics

Gene-based and semantic structure of the Gene Ontology as a complex network

2012

The last decade has seen the advent and consolidation of ontology-based tools for the identification and biological interpretation of classes of genes, such as the Gene Ontology. The information accumulated over time and included in the GO is encoded in the definition of terms and in the setting up of semantic relations amongst terms. This approach might be usefully complemented by a bottom-up approach based on the knowledge of relationships amongst genes. To this end, we investigate the Gene Ontology from a complex network perspective. We consider the semantic network of terms naturally associated with the semantic relationships provided by the Gene Ontology consortium and a gene-based …

0301 basic medicine | Statistics and Probability | FOS: Computer and information sciences | Physics - Physics and Society | Complex system | Computer science | Molecular Networks (q-bio.MN) | Complex system | FOS: Physical sciences | Network | Condensed Matter Physic | Physics and Society (physics.soc-ph) | computer.software_genre | Quantitative Biology - Quantitative Methods | Statistics - Applications | Gene | Semantic network | 03 medical and health sciences | Semantic similarity | Quantitative Biology - Molecular Networks | Applications (stat.AP) | Gene | Quantitative Methods (q-bio.QM) | Community detection | Gene ontology | business.industry | Ontology | Ontology-based data integration | Complex network | Condensed Matter Physics | Bipartite system | 030104 developmental biology | Bipartite system; Community detection; Complex systems; Genes; Networks; Ontology; Condensed Matter Physics; Statistics and Probability | FOS: Biological sciences | Ontology | Weighted network | Data mining | Artificial intelligence | ComputingMethodologies_GENERAL | business | computer | Natural language processing

L1-Penalized Censored Gaussian Graphical Model

2018

Graphical lasso is one of the most used estimators for inferring genetic networks. Despite its diffusion, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reactions and flow cytometers. The combination of censoring and high dimensionality makes inference of the underlying genetic networks from these data very challenging. In this article, we propose an $\ell_1$-penalized Gaussian graphical model for censored data and derive two EM-like algorithm…

0301 basic medicine | Statistics and Probability | FOS: Computer and information sciences | graphical lasso | Computer science | Gaussian | Normal Distribution | Inference | Multivariate normal distribution | 01 natural sciences | Methodology (stat.ME) | 010104 statistics & probability | 03 medical and health sciences | symbols.namesake | Graphical Lasso | Expectation–maximization algorithm | Humans | Computer Simulation | Gene Regulatory Networks | Graphical model | 0101 mathematics | Statistics - Methodology | Estimation theory | Reverse Transcriptase Polymerase Chain Reaction | Estimator | expectation-maximization algorithm | General Medicine | Censoring (statistics) | High-dimensional data | high-dimensional data | Gaussian graphical model | 030104 developmental biology | symbols | censored data | Censored data | Expectation-Maximization algorithm | Statistics Probability and Uncertainty | Settore SECS-S/01 - Statistica | Algorithm | Algorithms

The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of…

2015

Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy to compute, closed-form mathematical formulas that make no assumptions about the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. Results: We contribute to closing this important methodological gap …

0301 basic medicine | Statistics and Probability | Nucleosome organization | Computational biology | Biology | Type (model theory) | Biochemistry | Genome | DNA sequencing | 03 medical and health sciences | Computational Theory and Mathematic | Nucleosome | Molecular Biology | Sequence (medicine) | Genetics | Genome | Settore INF/01 - Informatica | Eukaryota | Computer Science Applications | 1707 Computer Vision and Pattern Recognition | Statistical model | DNA | Chromatin | Nucleosomes | Computer Science Applications | Chromatin | Settore BIO/18 - Genetica | Computational Mathematics | 030104 developmental biology | Computational Theory and Mathematics | Computational Mathematic | Bioinformatics

Simulation-based estimation of branching models for LTR retrotransposons

2017

Motivation: LTR retrotransposons are mobile elements that are able, like retroviruses, to copy and move inside eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model allows us to take into account both the positions and the degradation level of LTR retrotransposon copies. In our model, the duplication rate is also allowed to vary with the degradation level. Results: Various functions have been implemented in order to simulate their spread, and visualization tools are proposed. Based on these simulation tools, we have developed a first method to evaluate the parameters of this propagation …

0301 basic medicine | Statistics and Probability | Source code | Theoretical computer science | Retroelements | media_common.quotation_subject | Retrotransposon | [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE] | Biology | Biochemistry | Genome | Chromosomes | Branching (linguistics) | [INFO.INFO-IU]Computer Science [cs]/Ubiquitous Computing | 03 medical and health sciences | [INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR] | Software | Animals | Computer Simulation | Molecular Biology | ComputingMilieux_MISCELLANEOUS | media_common | computer.programming_language | Genetics | Genome | Models Genetic | business.industry | [SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE] | Python (programming language) | [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] | [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation | Computer Science Applications | Visualization | Computational Mathematics | 030104 developmental biology | Drosophila melanogaster | Computational Theory and Mathematics | [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA] | Programming Languages | [INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET] | Mobile genetic elements | [INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC] | business | computer | Software

An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop

2016

Alignment-free methods are one of the mainstays of biological sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, a fundamental and routine task in computational biology and bioinformatics. They have gained popularity since, even on standard desktop machines, they are faster than methods based on alignments. However, with the advent of Next-Generation Sequencing Technologies, datasets whose size, i.e., number of sequences and their total length, is a challenge to the execution of alignment-free methods on those standard machines are quite common. Here, we propose the first paradigm for the computation of k-mer-based alignment-free methods for…

0301 basic medicine | Theoretical computer science | 030102 biochemistry & molecular biology | Settore INF/01 - Informatica | Computer science | Computation | Extension (predicate logic) | Information System | Hash table | Distributed computing | Task (project management) | Theoretical Computer Science | 03 medical and health sciences | 030104 developmental biology | Alignment-free sequence comparison and analysis | Hadoop | Hardware and Architecture | alignment-free sequence comparison and analysis; distributed computing; Hadoop; MapReduce; software; theoretical computer science; information systems; hardware and architecture | Sequence comparison | MapReduce | Alignment-free sequence comparison and analysi | Alignment-free sequence comparison and analysis; Distributed computing; Hadoop; MapReduce; Theoretical Computer Science; Software; Information Systems; Hardware and Architecture | Software | Information Systems

Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data

2016

Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach to constructing the BWT of single genome sequences in linear space complexity, but with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After obtaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has b…

0301 basic medicine | Theoretical computer science | Burrows–Wheeler transform | Computer science | Genomics | Data_CODINGANDINFORMATIONTHEORY | Parallel computing | Genome | law.invention | 03 medical and health sciences | law | Genetics | Humans | Ensembl | Multi-core processor | Applied Mathematics | Linear space | Suffix array | Chromosome Mapping | High-Throughput Nucleotide Sequencing | Genomics | Sequence Analysis DNA | 030104 developmental biology | Algorithms | Biotechnology | Reference genome | IEEE/ACM Transactions on Computational Biology and Bioinformatics

QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computati…

2017

Background: In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors. These MDs codify molecular information based on the bilinear, quadratic and linear algebraic forms and the graph-theoretical electronic-density and edge-adjacency matrices in order to consider atom- and bond-based relations, respectively. These MDs have been successfully applied in the screening of chemical compounds of different therapeutic applications ranging from antimalarials…

0301 basic medicine | Theoretical computer science | Computer science | Bilinear interpolation | Library and Information Sciences | Topology | Linear | 01 natural sciences | lcsh:Chemistry | ToMoCoMD-CARDD | Double stochastic | 03 medical and health sciences | Matrix (mathematics) | Software | Quadratic equation | Molecular descriptor | Atom/bond-based molecular descriptor | Physical and Theoretical Chemistry | Algebraic number | Simple stochastic | Free and open source software | lcsh:T58.5-58.64 | lcsh:Information technology | business.industry | QSAR | Mutual probability matrices | Computer Graphics and Computer-Aided Design | Rotation formalisms in three dimensions | 0104 chemical sciences | Computer Science Applications | 010404 medicinal & biomolecular chemistry | 030104 developmental biology | lcsh:QD1-999 | Cheminformatics | Bilinear and quadratic indices | business | Non-stochastic | Software | QuBiLS-MAS | Journal of cheminformatics

Identification of control targets in Boolean molecular network models via computational algebra

2015

Motivation: Many problems in biomedicine and other areas of the life sciences can be characterized as control problems, with the goal of finding strategies to change a disease or otherwise undesirable state of a biological system into another, more desirable, state through an intervention, such as a drug or other therapeutic treatment. The identification of such strategies is typically based on a mathematical model of the process to be altered through targeted control inputs. This paper focuses on processes at the molecular level that determine the state of an individual cell, involving signaling or gene regulation. The mathematical model type considered is that of Boolean networks. The pot…

0301 basic medicine | Theoretical computer science | Computer science | Process (engineering) | Molecular Networks (q-bio.MN) | Systems biology | System of polynomial equations | ENCODE | Boolean networks | Set (abstract data type) | 03 medical and health sciences | 0302 clinical medicine | Structural Biology | Modelling and Simulation | Quantitative Biology - Molecular Networks | Molecular Biology | Edge deletions | Applied Mathematics | Computer Science Applications | Network control | Identification (information) | 030104 developmental biology | Boolean network | Blocking transitions | FOS: Biological sciences | Modeling and Simulation | Algebraic control | State (computer science) | 030217 neurology & neurosurgery | Research Article | BMC Systems Biology

Ultra-Fast Detection of Higher-Order Epistatic Interactions on GPUs

2017

Detecting higher-order epistatic interactions in Genome-Wide Association Studies (GWAS) remains a challenging task in the fields of genetic epidemiology and computer science. A number of algorithms have recently been proposed for epistasis discovery. However, they suffer from a high computational cost since statistical measures have to be evaluated for each possible combination of markers. Hence, many algorithms use additional filtering stages discarding potentially non-interacting markers in order to reduce the overall number of combinations to be examined. Among others, Mutual Information Clustering (MIC) is a common pre-processing filter for grouping markers into partitions using K-Means…

0301 basic medicine | Theoretical computer science | Computer science | business.industry | Contrast (statistics) | Genome-wide association study | 02 engineering and technology | Mutual information | Machine learning | computer.software_genre | Reduction (complexity) | 03 medical and health sciences | 030104 developmental biology | Genetic epidemiology | 0202 electrical engineering electronic engineering information engineering | Epistasis | 020201 artificial intelligence & image processing | Artificial intelligence | Cluster analysis | business | computer | Genetic association