Search results for "ComputingMethodologies_PATTERNRECOGNITION"

showing 10 items of 296 documents

Disease–Genes Must Guide Data Source Integration in the Gene Prioritization Process

2019

One of the main issues in detecting the genes involved in the etiology of genetic human diseases is the integration of different types of available functional relationships between genes. Numerous approaches exploited the complementary evidence coded in heterogeneous sources of data to prioritize disease-genes, such as functional profiles or expression quantitative trait loci, but none of them to our knowledge posed the scarcity of known disease-genes as a feature of their integration methodology. Nevertheless, in contexts where data are unbalanced, that is, where one class is largely under-represented, imbalance-unaware approaches may suffer a strong decrease in performance. We claim that …

0301 basic medicine; Class (computer programming); Boosting (machine learning); Computer science; Process (engineering); media_common.quotation_subject; Computational biology; Scarcity; 03 medical and health sciences; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Expression quantitative trait loci; Key (cryptography); Feature (machine learning); Gene prioritization; media_common

researchProduct

Application of Graph Clustering and Visualisation Methods to Analysis of Biomolecular Data

2018

In this paper we present an approach based on the integrated use of graph clustering and visualisation methods for semi-supervised discovery of biologically significant features in biomolecular data sets. We describe several clustering algorithms that have been custom-designed for the analysis of biomolecular data. They share an iterated two-step approach: thresholds and other clustering parameters are computed first, then connected graph components are identified and, if needed, the clustering parameters are adjusted to process individual subgraphs.

0301 basic medicine; Computer science; Computation; computer.software_genre; Visualization; 03 medical and health sciences; Identification (information); ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; 0302 clinical medicine; Graph drawing; Feature (machine learning); Data mining; Cluster analysis; computer; 030217 neurology & neurosurgery; Connectivity; Clustering coefficient

researchProduct
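
The two-step scheme described in the abstract above (threshold the graph, find connected components, then adjust parameters for individual subgraphs) can be sketched roughly as follows. This is a minimal illustration and not the authors' algorithms: the function names, the `max_size` and `step` parameters, and the assumption that edge weights are similarities in [0, 1] are all hypothetical.

```python
from collections import defaultdict


def components(nodes, edges, threshold):
    """Connected components after keeping only edges with weight >= threshold."""
    adjacency = defaultdict(set)
    for u, v, weight in edges:
        if weight >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(component)
    return result


def cluster(nodes, edges, threshold, max_size, step=0.1):
    """Hypothetical refinement loop: oversized components are re-split
    with a stricter threshold (assumes weights lie in [0, 1])."""
    clusters = []
    for component in components(nodes, edges, threshold):
        if len(component) <= max_size or threshold + step > 1.0:
            clusters.append(component)
            continue
        inner = [(u, v, w) for u, v, w in edges if u in component and v in component]
        clusters.extend(cluster(sorted(component), inner, threshold + step, max_size, step))
    return clusters


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 0.9), ("b", "c", 0.55), ("c", "d", 0.9)]
result = cluster(nodes, edges, threshold=0.5, max_size=2)
```

In this toy input the weak `b`-`c` edge survives the first threshold, so the four nodes form one oversized component; the second pass with a stricter threshold splits it in two, while the isolated node `e` stays a singleton.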

SpCLUST: Towards a fast and reliable clustering for potentially divergent biological sequences

2019

This paper presents SpCLUST, a new C++ package that takes a list of sequences as input, aligns them with MUSCLE, computes their similarity matrix in parallel and then performs the clustering. SpCLUST extends previously released software by integrating additional scoring matrices, which enables it to cover the clustering of amino-acid sequences. The similarity matrix is now computed in parallel according to the master/slave distributed architecture, using MPI. Performance analysis, carried out on two real datasets of 100 nucleotide sequences and 1049 amino-acid sequences, shows that the resulting library substantially outperforms the original Python package. The proposed pac…

researchProduct
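
As a rough illustration of the row-wise work distribution mentioned in the abstract above, the sketch below hands each row of a pairwise similarity matrix to a worker and lets the caller assemble the results. It is not SpCLUST's code: a Python thread pool stands in for MPI, the sequences are assumed to be already aligned, and the similarity measure (fraction of identical non-gap columns) is a simplifying assumption in place of the scoring matrices the package uses. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def identity(a, b):
    """Fraction of identical columns, skipping columns where either row has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def similarity_row(i, aligned):
    """Worker task: upper-triangle entries of row i (columns i..n-1)."""
    return i, [identity(aligned[i], aligned[j]) for j in range(i, len(aligned))]


def similarity_matrix(aligned, workers=4):
    """Coordinator: dispatch one row per task, then mirror results into a full matrix."""
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i, row in pool.map(lambda k: similarity_row(k, aligned), range(n)):
            for offset, value in enumerate(row):
                j = i + offset
                matrix[i][j] = matrix[j][i] = value
    return matrix


aligned = ["ACGT", "ACGA", "A-GT"]
matrix = similarity_matrix(aligned)
```

Threads only illustrate the division of labour here; for CPU-bound scoring in Python, real speed-ups would need processes or an MPI binding.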

Rocker: Open source, easy-to-use tool for AUC and enrichment calculations and ROC visualization

2016

The receiver operating characteristics (ROC) curve, together with the calculation of the area under the curve (AUC), is a useful tool for evaluating performance on biomedical and chemoinformatics data. For example, in virtual drug screening, ROC curves are very often used to visualize how efficiently an application separates active ligands from inactive molecules. Unfortunately, most of the available tools for ROC analysis are implemented in commercially available software packages, or are plugins in statistical software, which are not always the easiest to use. Here, we present Rocker, a simple ROC curve visualization tool that can be used for the generation of publication quality images. Rocker also…

0301 basic medicine; Computer science; automatic calculation; Library and Information Sciences; computer.software_genre; 01 natural sciences; 03 medical and health sciences; Software; Area under curve; Plug-in; Physical and Theoretical Chemistry; Virtual screening; Receiver operating characteristic; business.industry; Computer Graphics and Computer-Aided Design; 0104 chemical sciences; Computer Science Applications; Visualization; receiver operating characteristics; 010404 medicinal & biomolecular chemistry; Identification (information); ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; area under curves; Rocker; Cheminformatics; Data mining; business; computer; Software; softwares; Journal of Cheminformatics

researchProduct
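
For readers unfamiliar with the ROC and AUC calculation mentioned in the abstract above, a minimal sketch follows. It is a generic textbook computation, not Rocker's implementation: the function names are hypothetical, higher scores are assumed to indicate actives, and the toy scores and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate), sweeping the
    score threshold from high to low. Labels: 1 = active, 0 = inactive."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one active and one inactive molecule")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        # Consume tied scores together so ties yield one diagonal segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                true_pos += 1
            else:
                false_pos += 1
            j += 1
        points.append((false_pos / negatives, true_pos / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]   # made-up docking-style scores
labels = [1, 1, 0, 1, 0, 0]               # 1 = active ligand, 0 = inactive
area = auc(roc_points(scores, labels))
```

For this toy ranking, 8 of the 9 active/inactive pairs are ordered correctly, so the area comes out as 8/9; a perfect ranking gives 1.0 and a random one about 0.5.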

HIPPIE v2.0: Enhancing meaningfulness and reliability of protein-protein interaction networks

2016

The increasing number of experimentally detected interactions between proteins makes it difficult for researchers to extract the interactions relevant for specific biological processes or diseases. This makes it necessary to accompany the large-scale detection of protein-protein interactions (PPIs) with strategies and tools to generate meaningful PPI subnetworks. To this end, we generated the Human Integrated Protein-Protein Interaction rEference or HIPPIE (http://cbdm.uni-mainz.de/hippie/). HIPPIE is a one-stop resource for the generation and interpretation of PPI networks relevant to a specific research question. We provide means to generate highly reliable, context-specific PPI networks …

0301 basic medicine; Hippie; Reliability (computer networking); Biology; Web Browser; Bioinformatics; Protein protein interaction network; Computational biology; 03 medical and health sciences; 0302 clinical medicine; Resource (project management); Genetics; Humans; Database Issue; Graph algorithms; Protein Interaction Maps; Databases Protein; Research question; Graphical user interface; business.industry; Reproducibility of Results; Data science; 030104 developmental biology; ComputingMethodologies_PATTERNRECOGNITION; Protein interaction mapping; business; 030217 neurology & neurosurgery; Protein Interaction Map; Software

researchProduct

A multicenter study benchmarks software tools for label-free proteome quantification

2016

The consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from SWATH-MS (sequential window acquisition of all theoretical fragment ion spectra), a method that uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test datasets from hybrid proteome samples of defined quantitative composition acquired on two different MS instrument…

0301 basic medicine; Internationality; Proteome; Computer science; media_common.quotation_subject; Software tool; Quantitative proteomics; Biomedical Engineering; Bioengineering; computer.software_genre; Bioinformatics; Sensitivity and Specificity; Applied Microbiology and Biotechnology; Article; Mass Spectrometry; 03 medical and health sciences; Software; Quality (business); media_common; Label free; Staining and Labeling; 030102 biochemistry & molecular biology; business.industry; Reproducibility of Results; Benchmarking; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Multicenter study; Proteome; Benchmark (computing); Molecular Medicine; Data mining; business; computer; Algorithms; Software; Biotechnology; Nature Biotechnology

researchProduct

A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.

2018

In this article, a new Python package for nucleotide sequence clustering is proposed. This package, freely available online, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Although we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and, as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clust…

researchProduct

Automated selection of homologs to track the evolutionary history of proteins

2018

Background: The selection of distant homologs of a query protein under study is a common and useful application of protein sequence databases. Such sets of homologs are often applied to investigate the function of a protein and the degree to which experimental results can be transferred from one organism to another. In particular, a variety of databases facilitate static browsing for orthologs. However, these resources have limited power when identifying orthologs between taxonomically distant species. In addition, in some situations, for a given query protein, it is advantageous to compare the sets of orthologs from different specific organisms: this recursive step-wise search might give …

0301 basic medicine; Proteome; Computer science; Computational biology; Web tool; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Homology (biology); Evolution Molecular; 03 medical and health sciences; 0302 clinical medicine; Protein sequencing; Structural Biology; Homologous chromosome; Humans; Databases Protein; Molecular Biology; lcsh:QH301-705.5; Organism; Protein function; Methodology Article; Applied Mathematics; Proteins; A protein; Computer Science Applications; Homology; Evolutionary path; 030104 developmental biology; ComputingMethodologies_PATTERNRECOGNITION; lcsh:Biology (General); Proteome; lcsh:R858-859.7; DNA microarray; Software; 030217 neurology & neurosurgery; BMC Bioinformatics

researchProduct

Discovering discriminative graph patterns from gene expression data

2016

We consider the problem of mining gene expression data in order to single out interesting features characterizing healthy/unhealthy samples of an input dataset. We present an approach based on a network model of the input gene expression data, where there is a labelled graph for each sample. To the best of our knowledge, this is the first attempt to build a different graph for each sample and, then, to have a database of graphs for representing a sample set. Our main goal is that of singling out interesting differences between healthy and unhealthy samples, through the extraction of "discriminative patterns" among graphs belonging to the two different sample sets. Differently from the other…

0301 basic medicine; Settore INF/01 - Informatica; business.industry; Computer science; 0206 medical engineering; pattern discovery subgraph extraction biological networks; Pattern recognition; 02 engineering and technology; Graph; 03 medical and health sciences; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Discriminative model; Graph patterns; Artificial intelligence; business; 020602 bioinformatics; Biological network; Network model; Proceedings of the 31st Annual ACM Symposium on Applied Computing

researchProduct

Partitioned learning of deep Boltzmann machines for SNP data.

2016

Motivation: Learning the joint distributions of measurements, and in particular identification of an appropriate low-dimensional manifold, has been found to be a powerful ingredient of deep learning approaches. Yet, such approaches have hardly been applied to single nucleotide polymorphism (SNP) data, probably due to the high number of features typically exceeding the number of studied individuals. Results: After a brief overview of how deep Boltzmann machines (DBMs), a deep learning approach, can be adapted to SNP data in principle, we specifically present a way to alleviate the dimensionality problem by partitioned learning. We propose a sparse regression approach to coarsely screen…

0301 basic medicine; Statistics and Probability; Computer science; Machine learning; computer.software_genre; 01 natural sciences; Biochemistry; Polymorphism Single Nucleotide; Machine Learning; 010104 statistics & probability; 03 medical and health sciences; symbols.namesake; Joint probability distribution; Humans; 0101 mathematics; Molecular Biology; Statistical hypothesis testing; Artificial neural network; business.industry; Gene Expression Regulation Leukemic; Deep learning; Univariate; Computational Biology; Manifold; Computer Science Applications; Data set; Computational Mathematics; 030104 developmental biology; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Leukemia Myeloid; Boltzmann constant; symbols; Data mining; Artificial intelligence; business; computer; Software; Curse of dimensionality; Bioinformatics (Oxford, England)

researchProduct