Search results for "Data mining"
showing 10 items of 907 documents
Grapes: a method and a SAS program for graphical representations of assessor performances
1994
GRAPES computes individual and global analyses of variance for sensory profiling data consisting of several sessions in which all the panelists scored all the products on a number of attributes. The fitted model takes the session effect into account. GRAPES summarizes the results in graphical assessor scatterplots that make it possible to check and compare panelist performance, such as use of the scale, reliability, discrimination power, and agreement with the panel. In addition, GRAPES detects the outliers for each of these criteria. The usefulness of GRAPES for the panel leader will be demonstrated using texture and flavor profiling of 4 restructured steaks …
Low-cost scalable discretization, prediction and feature selection for complex systems
2019
The introduced data-driven tool allows simultaneous feature selection and model inference, with marked cost and quality gains.
Unlock ways to share data on peer review
2020
Peer review is the defining feature of scholarly communication. In a 2018 survey of more than 11,000 researchers, 98% said that they considered peer review important or extremely important for ensuring the quality and integrity of scholarly communication.
SORT-CC: A procedure for the statistical treatment of free sorting data
2008
A statistical approach for the analysis of free sorting data is discussed. In a first stage, the sorting data from each subject are arranged into a dataset of indicator variables that reflect the membership of the stimuli in the groups formed by that subject. Thereafter, an appropriate standardization is applied to these data, and a three-way statistical method, namely Common Components and Specific Weights Analysis, is performed on the datasets thus obtained. This makes it possible to take account of the individual differences among the subjects and to depict graphical displays showing the relationships among the stimuli on the one han…
Hyperion
2019
Indexes are essential in data management systems to speed up data retrieval. Prefix tries are widely used data structures for providing fast and memory-efficient indexes. Implementations like Judy, ART, or HOT optimize their internal alignments for cache and vector unit efficiency. While these measures usually improve performance substantially, they can have a negative impact on memory efficiency. In this paper we present Hyperion, a trie-based main-memory key-value store achieving extreme space efficiency. In contrast to other data structures, Hyperion does not depend on CPU vector units, but scans the data structure linearly. Combined with a custom memory allocator, Hyperion…
Cell state prediction through distributed estimation of transmit power
2019
Determining the state of each cell (for instance, whether it is in outage) in a densely deployed cellular network is a difficult problem. Several prior studies have used minimization of drive test (MDT) reports to detect cell outages. In this paper, we propose a two-step process. First, using the MDT reports, we estimate the serving base station's transmit power for each user. Second, we learn summary statistics of the estimated transmit power for various network states and use these to classify the network state on test data. Our approach is able to achieve an accuracy of 96% on an NS-3 simulation dataset. Decision tree, random forest and SVM classifiers were able to achieve a classification accuracy of…
Reverse-safe data structures for text indexing
2021
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…
An Internet-based program for depression using activity and physiological sensors: efficacy, expectations, satisfaction, and ease of use
2016
Authors: Cristina Botella (1,2), Adriana Mira (1), Inés Moragrega (2,3), Azucena García-Palacios (1,2), Juana Bretón-López (1,2), Diana Castilla (1,2), Antonio Riera López del Amo (1), Carla Soler (1), Guadalupe Molinari (1), Soledad Quero (1,2), Verónica Guillén-Botella (2,3), Ignacio Miralles (1,2), Sara Nebot (1), Berenice Serrano (1,2), Dennis Majoe (4), Mariano Alcañiz (2,5), Rosa María Baños (2,3). Affiliations: (1) Department of Basic, Clinical Psychology and Psychobiology, Universitat Jaume, Castellón, Spain; (2) CIBER Physiopathology of Obesity and Nutrition, CIBERobn, Instituto de Salud Carlos III, Santiago de Compostela, Spain; (3) Department o…
Deliberation favours social efficiency by making people disregard their relative shares: evidence from USA and India
2017
Groups make decisions on both the production and the distribution of resources. These decisions typically involve a tension between increasing the total level of group resources (i.e. social efficiency) and distributing these resources among group members (i.e. individuals' relative shares). This is the case because the redistribution process may destroy part of the resources, thus resulting in socially inefficient allocations. Here we apply a dual-process approach to understand the cognitive underpinnings of this fundamental tension. We conducted a set of experiments to examine the extent to which different allocation decisions respond to intuition or deliberation. In a newly developed app…
Interpretability of Recurrent Neural Networks in Remote Sensing
2020
In this work we propose the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks for multivariate time series of satellite data for crop yield estimation. Recurrent nets allow exploiting the temporal dimension efficiently, but interpretability is hampered by the typically overparameterized models. The focus of the study is to understand LSTM models by looking at the distribution of the hidden units, the impact of increasing network complexity, and the relative importance of the input covariates. We extracted time series of three variables describing the soil-vegetation status in agroecosystems (soil moisture, VOD, and EVI) from optical and microwave satellites, as well as available in si…