Author: Davide Scaturro

showing 6 related works from this author

Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistic…

2008

Abstract. Background: Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data. Results: We consider five such measures: Clest, Consensus (Consensus Clustering), FOM (Figure of Merit), Gap (Gap Statistics) and ME (Model Explorer), in addition to the classic WCSS (Within Cluster…

clustering microarray data; Microarray; Computer science; Statistics as Topic; computer.software_genre; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Structural Biology; Databases Genetic; Consensus clustering; Statistics; Cluster (physics); Animals; Cluster Analysis; Humans; Cluster analysis; lcsh:QH301-705.5; Molecular Biology; Oligonucleotide Array Sequence Analysis; Structure (mathematical logic); Microarray analysis techniques; Applied Mathematics; Computational Biology; Computer Science Applications; Benchmarking; ComputingMethodologies_PATTERNRECOGNITION; lcsh:Biology (General); Gene chip analysis; lcsh:R858-859.7; Data mining; DNA microarray; computer; Algorithms; Software; Research Article; BMC Bioinformatics

ValWorkBench: an open source Java library for cluster validation, with applications to microarray data analysis.

2015

Background: Cluster analysis is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. It is central to the life sciences due to the advent of high throughput technologies, e.g., classification of tumors. In particular, in cluster analysis, it is of relevance to assess cluster quality and to predict the number of clusters in a dataset, if any. This latter task is usually performed via internal validation measures. Despite their potentially important role, both the use of classic internal validation measures and the design of new ones, specific for microarray data, do not seem to have grea…

Software documentation; Information retrieval; Settore INF/01 - Informatica; business.industry; Computer science; Software development; Algorithm engineering; Health Informatics; Pattern discovery in bioinformatics and biomedicine; computer.software_genre; Data science; Software metric; Computer Science Applications; Software framework; Microarray cluster analysis; Software; Bioinformatics software; Software construction; Component-based software engineering; Cluster Analysis; Programming Languages; business; computer; Software; Algorithms; Computer methods and programs in biomedicine

Textual data compression in computational biology: Algorithmic techniques

2012

Abstract. In a recent review [R. Giancarlo, D. Scaturro, F. Utro, Textual data compression in computational biology: a synopsis, Bioinformatics 25 (2009) 1575–1586] the first systematic organization and presentation of the impact of textual data compression for the analysis of biological data has been given. Its main focus was on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used together with a technical presentation of how well-known notions from information theory have been adapted to successfully work on biological data. Rather surprisingly, the use of data compression is pervasive in computational biology. Starting from…

Biological data; Data Compression Theory and Practice; Alignment-free sequence comparison; Entropy; Huffman coding; Hidden Markov Models; Kolmogorov complexity; Lempel–Ziv compressors; Minimum Description Length principle; Pattern discovery in bioinformatics; Reverse engineering of biological networks; Sequence alignment; Settore INF/01 - Informatica; General Computer Science; Kolmogorov complexity; Computer science; Search engine indexing; Computational biology; Information theory; Information science; Theoretical Computer Science; Technical Presentation; Entropy (information theory); Data compression; Computer Science Review

Textual data compression in computational biology: a synopsis.

2009

Abstract. Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. Results: The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been use…

Statistics and Probability; Databases Factual; business.industry; Computer science; media_common.quotation_subject; Search engine indexing; compression data; Computational Biology; Information Storage and Retrieval; Computational biology; Biochemistry; Data science; Computer Science Applications; Computational Mathematics; Presentation; Software; Computational Theory and Mathematics; Benchmark (computing); business; Molecular Biology; Biological network; Software; Data compression; media_common; Bioinformatics (Oxford, England)

A Tutorial on Computational Cluster Analysis with Applications to Pattern Discovery in Microarray Data

2008

Background: Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data. Results: We consider five such measures: Clest, Consensus (Consensus Clustering), FOM (Figure of Merit), Gap (Gap Statistics) and ME (Model Explorer), in addition to the classic WCSS (Within Cluster Sum-of-S…

Microarray analysis techniques; Computer science; Applied Mathematics; computer.software_genre; Disease cluster; Clustering; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Gene chip analysis; Microarray databases; Data mining; DNA microarray; Cluster analysis; computer; Mathematics in Computer Science

GenClust: A genetic algorithm for clustering gene expression data

2005

Abstract. Background: Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, …

Clustering high-dimensional data; DNA Complementary; Computer science; Rand index; Correlation clustering; Oligonucleotides; Evolutionary algorithm; lcsh:Computer applications to medicine. Medical informatics; computer.software_genre; Biochemistry; Pattern Recognition Automated; Biclustering; Open Reading Frames; Structural Biology; CURE data clustering algorithm; Consensus clustering; Genetic algorithm; Cluster Analysis; Cluster analysis; lcsh:QH301-705.5; Molecular Biology; Gene expression data; Clustering; Evolutionary algorithms; Oligonucleotide Array Sequence Analysis; Models Statistical; Brown clustering; Heuristic; Gene Expression Profiling; Applied Mathematics; Computational Biology; Computer Science Applications; lcsh:Biology (General); Gene Expression Regulation; Mutation; lcsh:R858-859.7; Data mining; Sequence Alignment; computer; Software; Algorithms; BMC Bioinformatics
