Search results for "DATA MINING"

showing 10 items of 907 documents

A web application for the unspecific detection of differentially expressed DNA regions in strand-specific expression data

2015

Abstract Genomic technologies allow laboratories to produce large-scale data sets, either through the use of next-generation sequencing or microarray platforms. To explore these data sets and obtain maximum value from the data, researchers view their results alongside all the known features of a given reference genome. To study transcriptional changes that occur under a given condition, researchers search for regions of the genome that are differentially expressed between different experimental conditions. In order to identify these regions several algorithms have been developed over the years, along with some bioinformatic platforms that enable their use. However, currently available appli…

Statistics and Probability, Sequence analysis, ADN, Genomics, Computational biology, Biology, computer.software_genre, Biochemistry, Genome, Computer Graphics, Expressió genètica, Web application, Humans, Molecular Biology, Gene, Internet, Microarray analysis techniques, business.industry, Genome Human, Gene Expression Profiling, Computational Biology, High-Throughput Nucleotide Sequencing, DNA, Genomics, Sequence Analysis DNA, Computer Science Applications, Gene expression profiling, Computational Mathematics, Genòmica, ComputingMethodologies_PATTERNRECOGNITION, Computational Theory and Mathematics, Data mining, business, computer, Algorithms, Genètica, Reference genome

researchProduct

The Power of Word-Frequency Based Alignment-Free Functions: a Comprehensive Large-Scale Experimental Analysis

2021

Abstract Motivation: Alignment-free (AF) distance/similarity functions are a key tool for sequence analysis. Experimental studies on real datasets abound and, to some extent, there are also studies regarding their control of false positive rate (Type I error). However, assessment of their power, i.e. their ability to identify true similarity, has been limited to some members of the D2 family. The corresponding experimental studies have concentrated on short sequences, a scenario no longer adequate for current applications, where sequence lengths may vary considerably. Such a State of the Art is methodologically problematic, since information regarding a key feature such as power is either mi…

Statistics and Probability, Sequence, Similarity (geometry), Settore INF/01 - Informatica, sequence analysis, Computer science, power statistics, Alignment-Free Genomic Analysis Big Data Software Platforms Bioinformatics Algorithms, Scale (descriptive set theory), Function (mathematics), computer.software_genre, Biochemistry, Computer Science Applications, Set (abstract data type), Computational Mathematics, Range (mathematics), Computational Theory and Mathematics, sequence analysis; power statistics; alignment-free functions, alignment-free functions, Data mining, Completeness (statistics), Molecular Biology, computer, Type I and type II errors

Overlap and diversity in antimicrobial peptide databases: Compiling a non-redundant set of sequences

2015

Abstract Motivation: The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. Results: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are inc…

Statistics and Probability, Similarity (geometry), Computer science, Sequence analysis, Antimicrobial peptides, Peptaibol, Peptide, computer.software_genre, Procedures, Biochemistry, Set (abstract data type), chemistry.chemical_compound, Protein methods, Sequence Analysis Protein, Redundancy (engineering), Humans, Databases Protein, Molecular Biology, Antimicrobial cationic peptides, chemistry.chemical_classification, Sequence, Antimicrobial cationic peptide, Database, Sequence database, Sequence analysis, Computer Science Applications, Algorithm, Computational Mathematics, Chemistry, Protein database, Computational Theory and Mathematics, chemistry, Data mining, Nucleic acid database, Databases Nucleic Acid, computer, Software, Algorithms, Human

ArtiFuse—computational validation of fusion gene detection tools without relying on simulated reads

2019

Abstract Motivation: Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Often simulated reads are used, but these cannot account for all technical biases in RNA-seq data generated from real samples. Results: Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification to the genomic reference, and therefore, can be applied to any RNA-seq dataset wit…

Statistics and Probability, Source code, Sequence analysis, Computer science, media_common.quotation_subject, Value (computer science), Genomics, computer.software_genre, Biochemistry, Fusion gene, 03 medical and health sciences, 0302 clinical medicine, Software, Molecular Biology, Gene, 030304 developmental biology, media_common, 0303 health sciences, Sequence Analysis RNA, business.industry, High-Throughput Nucleotide Sequencing, RNA, Genomics, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, 030220 oncology & carcinogenesis, Benchmark (computing), RNA, Data mining, Gene Fusion, business, computer, Software, Bioinformatics

Fully Bayesian Approach to Image Restoration with an Application in Biogeography

1994

SUMMARY A common method of studying biogeographical ranges is an atlas survey, in which the research area is divided into a square grid and the data consist of the squares where observations occur. Often the observations form only an incomplete map of the true range, and a method is required to decide whether the blank squares indicate true absence or merely a lack of study there. This is essentially an image restoration problem, but it has properties that make the common empirical Bayesian procedures inadequate. Most notably, the observed image is heavily degraded, causing difficulties in the estimation of spatial interaction, and the assessment of reliability of the restoration is emphasi…

Statistics and Probability, Square tiling, Atlas (topology), Spatial interaction, Bayesian probability, Common method, computer.software_genre, Blank, Geography, Data mining, Statistics Probability and Uncertainty, Spatial analysis, computer, Image restoration, Applied Statistics

Testing for local structure in spatiotemporal point pattern data

2017

The detection of clustering structure in a point pattern is one of the main focuses of attention in spatiotemporal data mining. Indeed, statistical tools for clustering detection and identification of individual events belonging to clusters are welcome in epidemiology and seismology. Local second-order characteristics provide information on how an event relates to nearby events. In this work, we extend local indicators of spatial association (known as LISA functions) to the spatiotemporal context (which will be then called LISTA functions). These functions are then used to build local tests of clustering to analyse differences in local spatiotemporal structures. We present a simulation stud…

Statistics and Probability, Structure (mathematical logic), 010504 meteorology & atmospheric sciences, Event (computing), Ecological Modeling, Association (object-oriented programming), Context (language use), computer.software_genre, 01 natural sciences, 010104 statistics & probability, Identification (information), Point (geometry), Data mining, 0101 mathematics, Cluster analysis, computer, 0105 earth and related environmental sciences, Statistical hypothesis testing, Mathematics, Environmetrics

RNA-Seq Atlas—a reference database for gene expression profiling in normal tissue by next-generation sequencing

2012

Abstract Motivation: Next-generation sequencing technology enables an entirely new perspective for clinical research and will speed up personalized medicine. In contrast to microarray-based approaches, RNA-Seq analysis provides a much more comprehensive and unbiased view of gene expression. Although the perspective is clear and the long-term success of this new technology obvious, bioinformatics resources making these data easily available especially to the biomedical research community are still evolving. Results: We have generated RNA-Seq Atlas, a web-based repository of RNA-Seq gene expression profiles and query tools. The website offers open and easy access to RNA-Seq gene expression pr…

Statistics and Probability, Systems biology, RNA-Seq, Computational biology, Biology, computer.software_genre, Biochemistry, Neoplasms, Gene expression, Humans, Microarray databases, Molecular Biology, Gene, Oligonucleotide Array Sequence Analysis, Internet, Sequence Analysis RNA, business.industry, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, Computer Science Applications, Gene expression profiling, Computational Mathematics, Computational Theory and Mathematics, Gene chip analysis, Data mining, Personalized medicine, Databases Nucleic Acid, business, computer, Software, Bioinformatics

Outlier detection with automatic modelling: TRAMO/SEATS versus X-12-ARIMA

2012

Statistics and Probability, business.industry, Computer science, Applied Mathematics, Modeling and Simulation, Pattern recognition, Anomaly detection, Data mining, Artificial intelligence, Autoregressive integrated moving average, computer.software_genre, business, computer, Model Assisted Statistics and Applications

Efficient change point detection in genomic sequences of continuous measurements

2010

Abstract Motivation: Knowing the exact locations of multiple change points in genomic sequences serves several biological needs, for instance when data represent aCGH profiles and it is of interest to identify possibly damaged genes involved in cancer and other diseases. Only a few of the currently available methods deal explicitly with estimation of the number and location of change points, and moreover these methods may be somewhat vulnerable to deviations of model assumptions usually employed. Results: We present a computationally efficient method to obtain estimates of the number and location of the change points. The method is based on a simple transformation of data and it provides re…

Statistics and Probability, model selection, Breast Neoplasms, computer.software_genre, Biochemistry, Cell Line, Simple (abstract algebra), Cell Line Tumor, Humans, Computer Simulation, piecewise constant model, Molecular Biology, Mathematics, Oligonucleotide Array Sequence Analysis, Supplementary data, Comparative Genomic Hybridization, Models Statistical, Series (mathematics), Model selection, Genomics, Computer Science Applications, Computational Mathematics, R package, Transformation (function), Computational Theory and Mathematics, Change points, Changepoint, aCGH analysis, Female, Data mining, Settore SECS-S/01 - Statistica, computer, Change detection

Systematic handling of missing data in complex study designs: experiences from the Health 2000 and 2011 Surveys

2016

We present a systematic approach to the practical and comprehensive handling of missing data motivated by our experiences of analyzing longitudinal survey data. We consider the Health 2000 and 2011 Surveys (BRIF8901) where increased non-response and non-participation from 2000 to 2011 was a major issue. The model assumptions involved in the complex sampling design, repeated measurements design, non-participation mechanisms and associations are presented graphically using methodology previously defined as a causal model with design, i.e. a functional causal model extended with the study design. This tool forces the statistician to make the study design and the missing-data mechanism explicit…

Statistics and Probability, multiple imputation, Computer science, computer.software_genre, 01 natural sciences, 010104 statistics & probability, 03 medical and health sciences, 0302 clinical medicine, non-response, Sampling design, 030212 general & internal medicine, 0101 mathematics, Causal model, ta112, Clinical study design, Inverse probability weighting, Sampling (statistics), non-participation, Missing data, Data science, doubly robust methods, Survey data collection, Data mining, Statistics Probability and Uncertainty, computer, inverse probability weighting, Statistician, causal model with design, Journal of Applied Statistics
