Search results for "Data mining"
showing 10 items of 907 documents
Photonic non-contact estimation of blood lactate level
2015
The ability to measure the blood lactate level in a non-invasive, non-contact manner is very appealing to the sports industry as well as the home care field, mainly because this level is an important parameter when developing a personal workout program. Moreover, the blood lactate level is also a pivotal means of estimating muscle performance capability. In this manuscript we propose an optical non-contact approach to estimate the concentration level of this parameter. Firstly, we introduce the connection between the physiological muscle tremor and the blood lactate level. Secondly, we suggest a photonic optical method to estimate the physiological tremo…
Mass Spectrometry in Food Quality and Safety
2015
In recent years, mass spectrometry has gained wide recognition as a selective and fast technique for the analysis and assessment of a wide range of food products. The state of the art in the determination of safety and quality of food is presented to illustrate the capability of this technique for classification and grading, defect and disease detection, distribution and visualization of chemical attributes, and evaluations of overall quality of meat, fish, fruits, vegetables, and other food products. The features of mass spectrometry for each category are summarized in terms of the investigated quality and safety attributes, the systems used (triple quadrupole, quadrupole…
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML)
2017
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the pred…
A methodology to assess the intrinsic discriminative ability of a distance function and its interplay with clustering algorithms for microarray data …
2013
Background: Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. Following Handl et al., it can be summarized as a three step process: (1) choice of a distance function; (2) choice of a clustering algorithm; (3) choice of a validation method. Although such a purist approach to clustering is hardly seen in many areas of science, genomic data require that level of attention, if inferences made from cluster analysis have to be of some relevance to biomedical research. Results: A procedure is proposed for the assessment of the discriminative ability of a distance functi…
Indexing a sequence for mapping reads with a single mismatch
2014
Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…
Automated Uncertainty Quantification Through Information Fusion in Manufacturing Processes
2017
Evaluation of key performance indicators (KPIs) such as energy consumption is essential for decision-making during the design and operation of smart manufacturing systems. The measurements of KPIs are strongly affected by several uncertainty sources such as input material uncertainty, the inherent variability in the manufacturing process, model uncertainty, and the uncertainty in the sensor measurements of operational data. A comprehensive understanding of the uncertainty sources and their effect on the KPIs is required to make the manufacturing processes more efficient. Towards this objective, this paper proposed an automated methodology to generate a hierarchical B…
Mesh Visual Quality Assessment Metrics: A Comparison Study
2017
3D graphics technologies have progressed considerably in recent years, and several processing operations can be applied to 3D meshes, such as watermarking, compression, simplification and so forth. Mesh visual quality assessment has become an important issue for evaluating the visual appearance of the 3D shape after specific modifications. Several metrics have been proposed in this context, from the classical distance-based metrics to the perceptual-based metrics which include perceptual information about the human visual system. In this paper, we propose to study the performance of several mesh visual quality metrics. First, the comparison is conducted regardless of the distortion types neither…
Diversity in random subspacing ensembles
2004
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It was shown experimentally and theoretically that in order for an ensemble to be effective, it should consist of classifiers having diversity in their predictions. A number of ways are known to quantify diversity in ensembles, but little research has been done about their appropriateness. In this paper, we compare eight measures of the ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…
Missing values in deduplication of electronic patient data
2011
Data deduplication refers to the process in which records referring to the same real-world entities are detected in datasets such that duplicated records can be eliminated. The denotation ‘record linkage’ is used here for the same problem [1]. A typical application is the deduplication of medical registry data [2, 3]. Medical registries are institutions that collect medical and personal data in a standardized and comprehensive way. The primary aims are the creation of a pool of patients eligible for clinical or epidemiological studies and the computation of certain indices such as the incidence in order to oversee the development of diseases. The latter task in particular requires a database in wh…
A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR.
2013
(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities in order to accept the (Q)SAR model, and to approve its use in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare a k-fold cross-validation with external test set validation. To this end we introduce a workflow that allows us to realistically simulate t…