Search results for "DATA MINING"

Showing 10 of 907 documents

EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers

2016

8 pages, 3 figures, 2 tables.

0301 basic medicine; Biology; Computer science; lcsh:Evolution; Binary number; Gappiness; 03 medical and health sciences; lcsh:QH359-425; Genetics; Quality (business); Relevance (information retrieval); Ecology Evolution Behavior and Systematics; Original Research; outlier sequence; Sequence; Multiple sequence alignment; Data science; Computer Science Applications; Identification (information); 030104 developmental biology; Outlier; Data mining; Perl; computer; Computer programs; Evolutionary Bioinformatics

Discriminating graph pattern mining from gene expression data

2016

We consider the problem of mining gene expression data in order to single out interesting features that characterize healthy/unhealthy samples of an input dataset. We present an approach based on a network model of the input gene expression data, in which there is a labelled graph for each sample. To the best of our knowledge, this is the first attempt to build a separate graph for each sample and then to represent a sample set as a database of graphs. Our main goal is to single out interesting differences between healthy and unhealthy samples by extracting "discriminating patterns" among graphs belonging to the two sample sets. Differently from the …
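As a much-simplified illustration of what a "discriminating pattern" can mean, the sketch below treats each sample graph as a set of undirected edges between gene labels and reports single edges whose relative support differs strongly between the healthy and unhealthy sets. This is an assumption-laden toy, not the authors' method: real graph pattern mining considers larger subgraphs, and the names `support`, `discriminating_edges` and the `min_diff` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def support(graphs: List[Set[Edge]]) -> Counter:
    """Count, for each undirected edge, how many graphs contain it."""
    counts: Counter = Counter()
    for g in graphs:
        # Normalize orientation so (a, b) and (b, a) count as the same edge.
        counts.update({tuple(sorted(e)) for e in g})
    return counts


def discriminating_edges(
    healthy: List[Set[Edge]], unhealthy: List[Set[Edge]], min_diff: float = 0.5
) -> Dict[Edge, Tuple[float, float]]:
    """Return edges whose support fraction differs by at least min_diff."""
    sh, su = support(healthy), support(unhealthy)
    result = {}
    for edge in set(sh) | set(su):
        fh = sh[edge] / len(healthy)    # Counter returns 0 for missing edges
        fu = su[edge] / len(unhealthy)
        if abs(fh - fu) >= min_diff:
            result[edge] = (fh, fu)
    return result
```

With two healthy graphs that both contain the edge A–B and two unhealthy graphs that both contain C–D, those two edges are reported, while an edge present in half of each set is not.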

0301 basic medicine; Computer science; 0206 medical engineering; Ocean Engineering; 02 engineering and technology; Graph; 03 medical and health sciences; 030104 developmental biology; Data mining; computer; 020602 bioinformatics; Biological network; Network model; ACM SIGAPP Applied Computing Review

Application of Graph Clustering and Visualisation Methods to Analysis of Biomolecular Data

2018

In this paper we present an approach based on the integrated use of graph clustering and visualisation methods for semi-supervised discovery of biologically significant features in biomolecular data sets. We describe several clustering algorithms that have been custom-designed for the analysis of biomolecular data and that feature an iterated two-step approach: initial computation of thresholds and other parameters used in the clustering algorithms, followed by identification of connected graph components and, if needed, adjustment of the clustering parameters for processing individual subgraphs.
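The core of the two-step approach described above (apply a threshold, then identify connected graph components) can be sketched with a union-find structure. This is a minimal sketch assuming the data arrive as pairwise similarity scores and that the threshold is given; the paper's algorithms compute thresholds from the data and iteratively refine individual subgraphs, which is not reproduced here. The function name `threshold_components` is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def threshold_components(
    pairs: List[Tuple[Hashable, Hashable, float]], threshold: float
) -> List[set]:
    """Keep edges with similarity >= threshold and return connected components."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x):
        # Register unseen nodes, then follow parents with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b, score in pairs:
        # Register both nodes so weakly linked nodes still form singleton components.
        find(a)
        find(b)
        if score >= threshold:
            union(a, b)

    groups: Dict[Hashable, set] = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

Raising the threshold and re-running on the nodes of a single large component is one simple way to mimic the "adjust parameters for individual subgraphs" step.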

0301 basic medicine; Computer science; Computation; Visualization; 03 medical and health sciences; Identification (information); ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; 0302 clinical medicine; Graph drawing; Feature (machine learning); Data mining; Cluster analysis; computer; 030217 neurology & neurosurgery; Connectivity; Clustering coefficient

EFMviz

2020

Elementary Flux Modes (EFMs) are a tool for constraint-based modeling and metabolic network analysis. However, systematic and automated visualization of EFMs that integrates various data types is still a challenge. In this study, we developed EFMviz, an extension of the widely adopted COBRA Toolbox, for analysis and graphical visualization of EFMs as networks of reactions, metabolites and genes. The analysis workflow offers a platform for EFM visualization that improves EFM interpretability by connecting the COBRA Toolbox with the network analysis and visualization software Cytoscape. The biological applicability of EFMviz is demonstrated in two use cases on medium (Escherichia coli, iAF1…
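To make "an EFM as a network of reactions and metabolites" concrete, the sketch below turns the reactions of one EFM into lines of Cytoscape's plain-text SIF format (`source<TAB>interaction<TAB>target`). It is not EFMviz's API (EFMviz is a MATLAB extension of the COBRA Toolbox); the stoichiometry dictionary, the reaction IDs and the interaction labels `substrate_of`/`produces` are hypothetical, and gene-reaction associations are omitted.

```python
from typing import Dict, Iterable, List


def efm_to_sif(
    stoichiometry: Dict[str, Dict[str, float]], efm_reactions: Iterable[str]
) -> List[str]:
    """Return tab-separated SIF lines for the reactions that make up one EFM."""
    lines = []
    for rxn in efm_reactions:
        for met, coef in sorted(stoichiometry[rxn].items()):
            if coef < 0:
                # A consumed metabolite points into the reaction node.
                lines.append(f"{met}\tsubstrate_of\t{rxn}")
            elif coef > 0:
                # The reaction node points to each metabolite it produces.
                lines.append(f"{rxn}\tproduces\t{met}")
    return lines
```

Writing the returned lines to a `.sif` file gives a bipartite reaction/metabolite graph that Cytoscape can import; only reactions in the EFM's support are included.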

0301 basic medicine; Computer science; Endocrinology Diabetes and Metabolism; genome-scale metabolic models; lcsh:QR1-502; Biochemistry; Data type; lcsh:Microbiology; SBML; 03 medical and health sciences; 0302 clinical medicine; Data visualization; Graph drawing; Protocol; ACETATE; CELL; CYTOSCAPE; Molecular Biology; GENE-EXPRESSION; Software visualization; PATHWAY ANALYSIS; network visualization; elementary flux modes; Toolbox; Visualization; 030104 developmental biology; Workflow; DEFINITION; ESCHERICHIA-COLI; GROWTH; Data mining; business; computer; SET; 030217 neurology & neurosurgery; Metabolites

Data mining approaches to identify biomineralization related sequences.

2015

Proteomics is an efficient high-throughput technique for identifying proteins from a crude extract using sequence homology. Advances in Next Generation Sequencing (NGS) have increased knowledge of several non-model species. In the field of calcium carbonate biomineralization, the paucity of available sequences (such as those of mollusc shells) is still a bottleneck in most proteomic studies. Indeed, this technique needs protein databases to find homology. The aim of this study was to apply different data mining approaches to identify novel shell proteins. To this end, we used several public databases of non-model molluscs. Previously identified molluscan she…

0301 basic medicine; Computer science; Mechanical Engineering; Proteomics; [SDV.IB.BIO] Life Sciences [q-bio]/Bioengineering/Biomaterials; Bottleneck; DNA sequencing; 03 medical and health sciences; Annotation; 030104 developmental biology; Sequence homology; Mechanics of Materials; [SDV.BBM.GTP] Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Shell matrix; General Materials Science; Data mining; KEGG; computer; ComputingMilieux_MISCELLANEOUS; Biomineralization

A new parallel pipeline for DNA methylation analysis of long reads datasets

2017

Background: DNA methylation is an important mechanism of epigenetic regulation in development and disease. Next-generation sequencers allow genome-wide measurement of methylation status by reading short stretches of the DNA sequence (Methyl-seq). Several software tools for methylation analysis have been proposed in recent years. However, current sequencers, and those expected in the near future, yield sequences of increasing length, making these tools inefficient and obsolete. Results: In this paper, we propose new software based on a strategy for methylation analysis of Methyl-seq sequencing data that requires much shorter execution times while…

0301 basic medicine; Computer science; Parallel pipeline; DNA; 02 engineering and technology; Biochemistry; Sensitivity and Specificity; DNA sequencing; Epigenesis Genetic; 03 medical and health sciences; Structural Biology; RNA analysis; Informatics; Databases Genetic; 0202 electrical engineering electronic engineering information engineering; Humans; Epigenetics; Molecular Biology; 020203 distributed computing; DNA methylation; Genome Human; Applied Mathematics; Methylation; Sequence Analysis DNA; Supercomputer; Computer Science Applications; Genomics; 030104 developmental biology; chemistry; Gene Expression Regulation; Mutation; Data mining; High performance computing; DNA microarray; computer; Sequence Alignment; Software

Rocker: Open source, easy-to-use tool for AUC and enrichment calculations and ROC visualization

2016

A receiver operating characteristic (ROC) curve, together with the area under the curve (AUC), is a useful tool for evaluating the performance of methods applied to biomedical and chemoinformatics data. For example, in virtual drug screening, ROC curves are very often used to visualize how efficiently an application separates active ligands from inactive molecules. Unfortunately, most of the available tools for ROC analysis are implemented in commercial software packages or as plugins in statistical software, which are not always easy to use. Here, we present Rocker, a simple ROC curve visualization tool that can be used to generate publication-quality images. Rocker also…
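The AUC mentioned above equals the probability that a randomly chosen active scores higher than a randomly chosen inactive, with ties counting one half (the Mann-Whitney formulation). The sketch below computes it directly from that definition in O(n·m) time; it is not Rocker's implementation, and the convention that a higher score means "more likely active" is an assumption.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], inactive_scores: Sequence[float]) -> float:
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    if not active_scores or not inactive_scores:
        raise ValueError("need at least one active and one inactive score")
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))
```

For example, actives scored 0.9, 0.8 and 0.4 against inactives scored 0.7 and 0.3 give 5 correctly ordered pairs out of 6, so the AUC is 5/6. For large screening libraries a sort-based rank computation is preferable to the nested loops.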

0301 basic medicine; Computer science; automatic calculation; Library and Information Sciences; 01 natural sciences; 03 medical and health sciences; Software; Area under curve; Plug-in; Physical and Theoretical Chemistry; Virtual screening; Receiver operating characteristic; Computer Graphics and Computer-Aided Design; 0104 chemical sciences; Computer Science Applications; Visualization; receiver operating characteristics; 010404 medicinal & biomolecular chemistry; Identification (information); ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; area under curves; Rocker; Cheminformatics; Data mining; business; computer; softwares; Journal of Cheminformatics

Reactome pathway analysis: a high-performance in-memory approach

2016

Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; from the point of view of analysis performance, one of the main problems is the constantly increasing size of the data samples. Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the over-representation analysis method is divided into four steps and, for each of them, specific data st…
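Over-representation analysis is commonly formulated as a one-sided hypergeometric test: with N annotated identifiers in total, K of them in a pathway, and a submitted list of n identifiers of which k fall in that pathway, the p-value is P(X ≥ k). The sketch below computes that tail exactly with `math.comb` (Python 3.8+); it illustrates the statistic only and says nothing about Reactome's in-memory data structures. The function name `hypergeom_sf` is hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, pathway size K, sample size n)."""
    total = comb(N, n)
    upper = min(K, n)
    # Exact integer arithmetic; comb() returns 0 when the lower index exceeds the upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

In a full analysis this p-value is computed once per pathway and then corrected for multiple testing; the per-pathway counts (K and k) are exactly the quantities that a fast in-memory implementation needs to look up efficiently.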

0301 basic medicine; Data structures; Databases Factual; Pathway analysis; Computer science; Interface (Java); Systems biology; Genome; Biochemistry; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; Nucleic Acids; Humans; Molecular Biology; Applied Mathematics; Computational Biology; Proteins; Computer Science Applications; Tree (data structure); 030104 developmental biology; 030220 oncology & carcinogenesis; Graph (abstract data type); Data mining; Over-representation analysis; computer; Algorithms; Software; BMC Bioinformatics

Applications of Chemoinformatics in Predictive Toxicology for Regulatory Purposes, Especially in the Context of the EU REACH Legislation

2018

Chemoinformatics methodologies such as QSAR/QSPR have been used for decades in drug discovery projects, especially for finding new compounds with therapeutic properties and optimizing the ADME properties of chemical series. The application of computational techniques in predictive toxicology is much more recent, and it is attracting increasing interest because of new legal requirements imposed by national and international regulations. In the pharmaceutical field, the US Food and Drug Administration (FDA) supports the use of predictive models for regulatory decision-making when assessing the genotoxic and carcinogenic potential of drug impurities. In Europe, the REA…

0301 basic medicine; Engineering; Management science; Legislation; Context (language use); Predictive toxicology; 010501 environmental sciences; 01 natural sciences; 03 medical and health sciences; 030104 developmental biology; Cheminformatics; Data mining; business; computer; 0105 earth and related environmental sciences; International Journal of Quantitative Structure-Property Relationships

Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus

2016

Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution; it requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, and they have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal …
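The reason random barcodes allow selective duplicate removal is that reads mapping to the same position and strand but carrying different barcodes come from distinct molecules, whereas identical (position, strand, barcode) combinations are likely PCR duplicates. A minimal sketch, assuming reads are already mapped and represented by a hypothetical `MappedRead` tuple; Q-nexus itself works on real alignment data and performs further preprocessing not shown here.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class MappedRead(NamedTuple):
    chrom: str
    pos: int      # 5' end of the alignment
    strand: str   # "+" or "-"
    barcode: str  # random barcode taken from the read before mapping


def remove_pcr_duplicates(reads: Iterable[MappedRead]) -> List[MappedRead]:
    """Keep the first read for each (chrom, pos, strand, barcode) combination."""
    seen: Set[Tuple[str, int, str, str]] = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

A position-only filter would collapse the two "+" strand reads with barcodes ACGT and TTAG at the same site into one, discarding a genuine independent molecule; including the barcode in the key keeps both.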

0301 basic medicine; FOS: Computer and information sciences; Duplication rates; Chromatin Immunoprecipitation; Bioinformatics; Pipeline (computing); 610; Biology; 600 Technology, medicine, applied sciences::610 Medicine and health; 03 medical and health sciences; Software; ChIP-nexus; Genetics; Preprocessor; Nucleotide Motifs; Library complexity; ChIP-exo; Protocol (science); Binding Sites; fungi; Computational Biology; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Chip; Data mapping; DNA-Binding Proteins; Algorithm; 030104 developmental biology; Data mining; business; Peak calling; computer; Algorithms; Protein Binding; Transcription Factors; Research Article; Biotechnology; BMC Genomics