Search results for "Data mining"
showing 10 items of 907 documents
No-Reference 3D Mesh Quality Assessment Based on Dihedral Angles Model and Support Vector Regression
2016
3D meshes are subject to various visual distortions during their transmission and geometrical processing. Several works have tried to evaluate the visual quality using either full reference or reduced reference approaches. However, these approaches require the presence of the reference mesh which is not available in such practical situations. In this paper, the main contribution lies in the design of a computational method to automatically predict the perceived mesh quality without reference and without knowing beforehand the distortion type. Following the no-reference (NR) quality assessment principle, the proposed method focuses only on the distorted mesh. Specific…
Identification and visualisation of differential isoform expression in RNA-seq time series
2017
As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with a differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as those genes with a shift in major expressed isoform. T…
Data Analytics in Healthcare: A Tertiary Study
2022
The field of healthcare has seen a rapid increase in the applications of data analytics during the last decades. By utilizing different data analytic solutions, healthcare areas such as medical image analysis, disease recognition, outbreak monitoring, and clinical decision support have been automated to various degrees. Consequently, the intersection of healthcare and data analytics has received scientific attention to the point of numerous secondary studies. We analyze studies on healthcare data analytics, and provide a wide overview of the subject. This is a tertiary study, i.e., a systematic review of systematic reviews. We identified 45 systematic secondary studies on data analy…
Exploring Multiobjective Optimization for Multiview Clustering
2018
We present a new multiview clustering approach based on multiobjective optimization. In contrast to existing clustering algorithms based on multiobjective optimization, it is generally applicable to data represented by two or more views and does not require specifying the number of clusters a priori. The approach builds upon the search capability of a multiobjective simulated annealing based technique, AMOSA, as the underlying optimization technique. In the first version of the proposed approach, an internal cluster validity index is used to assess the quality of different partitionings obtained using different views. A new way of checking the compatibility of these different partitioning…
Content quality assessment and acceptance testing in location‐based services
2006
In this paper, we develop and evaluate an approach to assessing the content quality in a location‐based service (LBS). The proposed approach, instead of assessing the quality in absolute terms such as completeness or accuracy, measures the effect that the imperfection of the content is having on the reliability of that specific LBS. We apply the basic ideas from Software Reliability Engineering (SRE), but develop a modification of SRE, 2‐Branch, in order to separate content quality from other factors, such as positioning imprecision, and to reduce the measurement error. In our experimental study, we first compare 2‐Branch to the standard SRE, after which we experimentally analyze some prope…
HyperLabelMe: A Web Platform for Benchmarking Remote-Sensing Image Classifiers
2017
HyperLabelMe is a web platform that allows the automatic benchmarking of remote-sensing image classifiers. To demonstrate this platform's attributes, we collected and harmonized a large data set of labeled multispectral and hyperspectral images with different numbers of classes, dimensionality, noise sources, and levels. The registered user can download training data pairs (spectra and land cover/use labels) and submit the predictions for unseen testing spectra. The system then evaluates the accuracy and robustness of the classifier, and it reports different scores as well as a ranked list of the best methods and users. The system is modular, scalable, and ever-growing in data sets and clas…
Adapted Transfer of Distance Measures for Quantitative Structure-Activity Relationships and Data-Driven Selection of Source Datasets
2012
Quantitative structure–activity relationships are regression models relating chemical structure to biological activity. Such models allow predictions to be made for toxicologically relevant endpoints, which constitute the target outcomes of experiments. The task is often tackled by instance-based methods, which are all based on the notion of chemical (dis-)similarity. Our starting point is the observation by Raymond and Willett that the two families of chemical distance measures, fingerprint-based and maximum common subgraph-based measures, provide orthogonal information about chemical similarity. This paper presents a novel method for finding suitable combinations of them, called adapted tran…
2014
This paper investigates the proficiency of the support vector machine (SVM) for fault detection, using datasets generated by the Tennessee Eastman process simulation. Due to its excellent performance in generalization, the classification performance of SVM is satisfactory. The SVM algorithm combined with a kernel function has the nonlinear attribute and can better handle the case where samples and attributes are massive. In addition, by optimizing the parameters beforehand using the cross-validation technique, SVM can produce high accuracy in fault detection. Therefore, there is no need to deal with original data or refer to other algorithms, making the classification problem simple to handle. In order to…
MetaCache-GPU: Ultra-Fast Metagenomic Classification
2021
The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their $k$-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growi…
Analysing the presence of school-shooting related communities at social media sites
2010
Surprisingly cruel mass murders and attacks have been witnessed in the educational institutions of the Western world since the 1970s. These are often referred to as 'school shootings'. There have been over 300 known incidents around the world and the number is growing. Social network sites (SNSs) have enabled the perpetrators to express their views and intentions. Our result is that since about 2005, all major school shooters have had a presence in SNSs and some have left traces that would have made it possible to evaluate their intentions to carry out a rampage. A further hypothesis is that future school shooters will behave in a similar manner and would thus be traceable in the digital sphere…