Search results for " data"

showing 10 items of 7516 documents

Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences

2018

Several recent works have shown that the k-mer representation of a DNA sequence can be used for the classification or identification of nucleosome positioning related sequences. This representation becomes computationally expensive as k grows, because the dimension of the feature space increases exponentially. This issue significantly affects the classification task computed by a general machine learning algorithm used for sequence classification. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method to select the most informative k-mers associated with a set of DNA sequences, for the final purpose of nucleosome/linker classifi…

0301 basic medicine, Sequence, Settore INF/01 - Informatica, Epigenomic, 030102 biochemistry & molecular biology, business.industry, Computer science, Deep learning, Pattern recognition, Feature selection, DNA sequences, Nucleosomes, Ranking (information retrieval), Set (abstract data type), 03 medical and health sciences, Variable (computer science), 030104 developmental biology, Dimension (vector space), Feature selection, Deep learning models, Artificial intelligence, Deep learning models Feature selection DNA sequences Epigenomic Nucleosomes, Representation (mathematics), business
researchProduct

Block Sorting-Based Transformations on Words: Beyond the Magic BWT

2018

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for data compression, and later results have helped make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way but uses an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…

0301 basic medicine, Settore INF/01 - Informatica, Computer science, Data_CODINGANDINFORMATIONTHEORY, 0102 computer and information sciences, Block sorting, Data structure, Lexicographical order, 01 natural sciences, Upper and lower bounds, 03 medical and health sciences, Combinatorics on words, 030104 developmental biology, 010201 computation theory & mathematics, Arithmetic, Compressed Data Structures Block Sorting Combinatorics on Words Algorithms, Data compression
researchProduct

Combining multiple hypothesis testing with machine learning increases the statistical power of genome-wide association studies

2016

Mieth, Bettina et al.

0301 basic medicine, Statistical methods, Computer science, Genome-wide association study, Machine learning, computer.software_genre, Genome-wide association studies, Statistical power, Article, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Set (abstract data type), 03 medical and health sciences, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], 10007 Department of Economics, Statistical significance, Replication (statistics), genome, Statistical hypothesis testing, Genetic association, 1000 Multidisciplinary, Multidisciplinary, business.industry, Computational science, Institut für Mathematik, 330 Economics, Support vector machine, 030104 developmental biology, Multiple comparisons problem, wide association studies, statistical methods, Artificial intelligence, business, computer
researchProduct

SpaceScanner: COPASI wrapper for automated management of global stochastic optimization experiments

2017

Motivation: Due to their universal applicability, global stochastic optimization methods are popular for designing improvements of biochemical networks. The drawbacks of global stochastic optimization methods are: (i) no guarantee of finding global optima, (ii) no clear optimization run termination criteria and (iii) no criteria to detect stagnation of an optimization run. The impact of these drawbacks can be partly compensated by manual work that becomes inefficient when the solution space is large due to combinatorial explosion of adjustable parameters or for other reasons. Results: SpaceScanner uses parallel optimization runs for automatic termination of optimization tasks in case…

0301 basic medicine, Statistics and Probability, Computer science, 0206 medical engineering, Computational Biology, 02 engineering and technology, computer.software_genre, Models Biological, Biochemistry, Computer Science Applications, Set (abstract data type), 03 medical and health sciences, Computational Mathematics, 030104 developmental biology, Computational Theory and Mathematics, Stochastic optimization, Data mining, Molecular Biology, computer, Software, 020602 bioinformatics, Combinatorial explosion, Bioinformatics
researchProduct

Gene-based and semantic structure of the Gene Ontology as a complex network

2012

The last decade has seen the advent and consolidation of ontology-based tools for the identification and biological interpretation of classes of genes, such as the Gene Ontology. The information accumulated over time and included in the GO is encoded in the definition of terms and in the setting up of semantic relations amongst terms. This approach might be usefully complemented by a bottom-up approach based on the knowledge of relationships amongst genes. To this end, we investigate the Gene Ontology from a complex network perspective. We consider the semantic network of terms naturally associated with the semantic relationships provided by the Gene Ontology consortium and a gene-based …

0301 basic medicine, Statistics and Probability, FOS: Computer and information sciences, Physics - Physics and Society, Complex system, Computer science, Molecular Networks (q-bio.MN), Complex system, FOS: Physical sciences, Network, Condensed Matter Physic, Physics and Society (physics.soc-ph), computer.software_genre, Quantitative Biology - Quantitative Methods, Statistics - Applications, Gene, Semantic network, 03 medical and health sciences, Semantic similarity, Quantitative Biology - Molecular Networks, Applications (stat.AP), Gene, Quantitative Methods (q-bio.QM), Community detection, Gene ontology, business.industry, Ontology, Ontology-based data integration, Complex network, Condensed Matter Physics, Bipartite system, 030104 developmental biology, Bipartite system; Community detection; Complex systems; Genes; Networks; Ontology; Condensed Matter Physics; Statistics and Probability, FOS: Biological sciences, Ontology, Weighted network, Data mining, Artificial intelligence, ComputingMethodologies_GENERAL, business, computer, Natural language processing
researchProduct

L1-Penalized Censored Gaussian Graphical Model

2018

Graphical lasso is one of the most used estimators for inferring genetic networks. Despite its widespread use, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reactions and flow cytometry. The combination of censoring and high dimensionality makes inference of the underlying genetic networks from these data very challenging. In this article, we propose an $\ell_1$-penalized Gaussian graphical model for censored data and derive two EM-like algorithm…

0301 basic medicine, Statistics and Probability, FOS: Computer and information sciences, graphical lasso, Computer science, Gaussian, Normal Distribution, Inference, Multivariate normal distribution, 01 natural sciences, Methodology (stat.ME), 010104 statistics & probability, 03 medical and health sciences, symbols.namesake, Graphical Lasso, Expectation–maximization algorithm, Humans, Computer Simulation, Gene Regulatory Networks, Graphical model, 0101 mathematics, Statistics - Methodology, Estimation theory, Reverse Transcriptase Polymerase Chain Reaction, Estimator, expectation-maximization algorithm, General Medicine, Censoring (statistics), High-dimensional data, high-dimensional data, Gaussian graphical model, 030104 developmental biology, symbols, censored data, Censored data, Expectation-Maximization algorithm, Statistics Probability and Uncertainty, Settore SECS-S/01 - Statistica, Algorithm, Algorithms
researchProduct

LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs

2019

Novel 3D protein descriptors based on bilinear, quadratic and linear algebraic maps in $\mathbb{R}^n$ are proposed. The latter employs the kth 2-tuple (dis)similarity matrix to codify information related to covalent and non-covalent interactions in these biopolymers. The calculation of the inter-amino acid distances is generalized by using several dissimilarity coefficients, where normalization procedures based on the simple stochastic and mutual probability schemes are applied. A new local-fragment approach based on amino acid-types and amino acid-groups is proposed to characterize regions of interest in proteins. Topological and geometric macromolecular cutoffs are defined using local and…

0301 basic medicine, Statistics and Probability, Normalization (statistics), Generalization, Quantitative Structure-Activity Relationship, General Biochemistry Genetics and Molecular Biology, 03 medical and health sciences, 0302 clinical medicine, Linear regression, Amino Acids, Mathematics, General Immunology and Microbiology, Applied Mathematics, Statistical parameter, Proteins, General Medicine, Collinearity, Structural Classification of Proteins database, Support vector machine, 030104 developmental biology, Modeling and Simulation, Test set, Linear Models, General Agricultural and Biological Sciences, Algorithm, Software, 030217 neurology & neurosurgery, Journal of Theoretical Biology
researchProduct

REGGAE : a novel approach for the identification of key transcriptional regulators

2019

Motivation: Transcriptional regulators play a major role in most biological processes. Alterations in their activities are associated with a variety of diseases and in particular with tumor development and progression. Hence, it is important to assess the effects of deregulated regulators on pathological processes. Results: Here, we present REGulator-Gene Association Enrichment (REGGAE), a novel method for the identification of key transcriptional regulators that have a significant effect on the expression of a given set of genes, e.g. genes that are differentially expressed between two sample groups. REGGAE uses a Kolmogorov–Smirnov-like test statistic that implicitly combines assoc…

0301 basic medicine, Statistics and Probability, Transcription Genetic, 610, Computational biology, Biology, Biochemistry, 03 medical and health sciences, Neoplasms, Humans, Two sample, Molecular Biology, Gene, Probability, Supplementary data, Regulation of gene expression, Systems Biology, 500, Original Papers, Computer Science Applications, 004, Computational Mathematics, 030104 developmental biology, Computational Theory and Mathematics, Gene Expression Regulation, Key (cryptography), Identification (biology), Female, Software
researchProduct

Screening of potent phytochemical inhibitors against SARS-CoV-2 protease and its two Asian mutants

2021

Background: COVID-19, declared a pandemic in March 2020 by the World Health Organization, is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The virus has already killed more than 2.3 million people worldwide. Objective: The principal intent of this work was to investigate lead compounds by screening a natural product library (NPASS) for possible treatment of COVID-19. Methods: Pharmacophore features were used to screen a large database to get a small dataset for structure-based virtual screening of natural product compounds. In the structure-based screening, molecular docking was performed to find a potent inhibitor molecule against the main protease (Mpro) of SARS-…

0301 basic medicine, Stereochemistry, medicine.medical_treatment, Phytochemicals, Protein Data Bank (RCSB PDB), Health Informatics, medicine.disease_cause, Molecular Docking Simulation, Antiviral Agents, Article, Docking, 03 medical and health sciences, chemistry.chemical_compound, 0302 clinical medicine, medicine, Humans, Protease Inhibitors, Coronavirus, Virtual screening, Natural products, Protease, Chemistry, SARS-CoV-2, COVID-19, Computer Science Applications, Protease, Coronavirus, Molecular Docking Simulation, 030104 developmental biology, Docking (molecular), Pharmacophore, Lead compound, 030217 neurology & neurosurgery, Mpro, Peptide Hydrolases, Computers in Biology and Medicine
researchProduct

Identification of control targets in Boolean molecular network models via computational algebra

2015

Motivation: Many problems in biomedicine and other areas of the life sciences can be characterized as control problems, with the goal of finding strategies to change a disease or otherwise undesirable state of a biological system into another, more desirable, state through an intervention, such as a drug or other therapeutic treatment. The identification of such strategies is typically based on a mathematical model of the process to be altered through targeted control inputs. This paper focuses on processes at the molecular level that determine the state of an individual cell, involving signaling or gene regulation. The mathematical model type considered is that of Boolean networks. The pot…

0301 basic medicine, Theoretical computer science, Computer science, Process (engineering), Molecular Networks (q-bio.MN), Systems biology, System of polynomial equations, ENCODE, Boolean networks, Set (abstract data type), 03 medical and health sciences, 0302 clinical medicine, Structural Biology, Modelling and Simulation, Quantitative Biology - Molecular Networks, Molecular Biology, Edge deletions, Applied Mathematics, Computer Science Applications, Network control, Identification (information), 030104 developmental biology, Boolean network, Blocking transitions, FOS: Biological sciences, Modeling and Simulation, Algebraic control, State (computer science), 030217 neurology & neurosurgery, Research Article, BMC Systems Biology
researchProduct