Search results for "Data mining"
showing 10 items of 907 documents
Aggregation in Input–Output Tables: How to Select the Best Cluster Linkage
1991
In this paper we propose a solution to the aggregation problem that arises when working with Input–Output tables. First, we verify the degree of similarity among the production functions of the industries aggregated in each sector. Second, once we have established the aggregation using different cluster analyses, we set a number of conditions required to choose the proper linkage method that allows us to characterize the process of aggregation (weighted or unweighted) of the input–output table.
Data mining-based statistical analysis of biological data uncovers hidden significance: clustering Hashimoto’s thyroiditis patients based on the resp…
2014
The pathogenesis of Hashimoto's thyroiditis includes autoimmunity involving thyroid antigens, autoantibodies, and possibly cytokines. It is unclear what role Hsp60 plays, but our recent data indicate that it may contribute to pathogenesis as an autoantigen. Its role in the induction of cytokine production, pro- or anti-inflammatory, was not elucidated, except that we found that peripheral blood mononucleated cells (PBMC) from patients or from healthy controls did not respond with cytokine production upon stimulation by Hsp60 in vitro with patterns that would differentiate patients from controls with statistical significance. This "negative" outcome appeared when the data were pooled and ana…
A data aggregation strategy based on wavelet for the internet of things
2017
The advent of emerging information and communication technologies, such as RFID, small size sensors and sensor networks, has made accessible a huge amount of information that requires sophisticated and efficient search algorithms to support queries on that data. In this paper we focus on the problem of aggregating data collected from these devices to efficiently support queries, inferences or statistics on them. In general, data aggregation techniques are necessary to efficiently collect information in a compact and cost-effective way. Some current solutions try to meet the above criteria, by exploiting different data aggregation techniques, for instance BitVector or Q_Digest. In this manus…
Operational cloud screening service for Sentinel-2 image time series
2015
This paper deals with the development and implementation of a cloud screening algorithm for image time series, with the focus on the forthcoming Sentinel-2 satellites to be launched under the ESA Copernicus Programme. The proposed methodology is based on kernel ridge regression and exploits the temporal information to detect anomalous changes that correspond to cloud covers. The huge data volumes to be processed when dealing with high temporal, spatial, and spectral resolution datasets motivate the implementation of the algorithm within distributed computer resources. In consequence, an operational cloud screening service has been specifically designed and implemented in the frame of the Se…
Does relevance matter to data mining research?
2008
Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it. We review several existing frameworks for DM research that originate from different paradigms. These DM frameworks mainly address various DM algorithms for the different steps of the DM process. Recent research has shown that many real-world problems require integration of several DM algorithms from different paradigms in order to produce a better solution, which elevates the importance of practice-oriented aspects in DM research as well. In this chapter we strongly emphasize that DM research should also take into account the relevance of research, not only the rigor of it. Und…
On the use of information systems research methods in data mining
2006
Information systems are powerful instruments for organizational problem solving through formal information processing (Lyytinen, 1987). Data mining (DM) and knowledge discovery are intelligent tools that help to accumulate and process data and make use of it (Fayyad, 1996). Data mining bridges many technical areas, including databases, statistics, machine learning, and human-computer interaction. The set of data mining processes used to extract and verify patterns in data is the core of the knowledge discovery process. Numerous data mining techniques have recently been developed to extract knowledge from large databases. The area of data mining is historically more related to AI (Artificial…
FCA-based knowledge representation and local generalized linear models to address relevance and diversity in diverse social images
2019
In social image retrieval, the main goal is to offer a relevant but also diverse result set of images to the user. To address relevance and diversity at the same time, we propose a multi-modal procedure. This approach deals with the diversification problem using a two-step procedure based on the application of Formal Concept Analysis (FCA) to organize the text content of the images, followed by a Hierarchical Agglomerative Clustering (HAC) step to find the topics addressed by the images. FCA detects the latent concepts covered by the images in the result set, organizing them according to these concepts. In the second step, clustering is carried out to group together the ones with a…
Handling Context-Sensitive Temporal Knowledge from Multiple Differently Ranked Sources
1999
In this paper we develop one way to represent and reason with temporal relations in the context of multiple experts. Every relation between temporal intervals consists of four endpoint relations. We assume that the known context is the value of each expert’s competence concerning each endpoint relation. Thus the context for an interval temporal relation is a kind of compound expert rank, which has four components, one for each relation between interval endpoints. The context is updated each time a new opinion is added to the previous opinions about a given temporal relation. The context of a temporal relation collects all support given by different experts to all compon…
Feature Ranking of Large, Robust, and Weighted Clustering Result
2017
A clustering result needs to be interpreted and evaluated for knowledge discovery. When clustered data represents a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance the automatic knowledge discovery from a robust clustering result on the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from sample to population level with two different approaches. Application of these enlargements to both the input variables used in clustering and to metadata provides a…
A Scheme for Continuous Input to the Tsetlin Machine with Applications to Forecasting Disease Outbreaks
2019
In this paper, we apply a new promising tool for pattern classification, namely, the Tsetlin Machine (TM), to the field of disease forecasting. The TM is interpretable because it is based on manipulating expressions in propositional logic, leveraging a large team of Tsetlin Automata (TA). Apart from being interpretable, this approach is attractive due to its low computational cost and its capacity to handle noise. To attack the problem of forecasting, we introduce a preprocessing method that extends the TM so that it can handle continuous input. Briefly stated, we convert continuous input into a binary representation based on thresholding. The resulting extended TM is evaluated and analyzed…
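The thresholding idea mentioned in the last abstract above (converting continuous input into a binary representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `binarize` and `binarize_column`, the use of the unique observed values as thresholds, and the `value <= threshold` rule are all assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


def binarize(value: float, thresholds: Sequence[float]) -> List[int]:
    """Encode one continuous value as one bit per threshold.

    Assumed rule (hypothetical): a bit is 1 when value <= threshold.
    """
    return [1 if value <= t else 0 for t in thresholds]


def binarize_column(
    values: Sequence[float],
    thresholds: Optional[Sequence[float]] = None,
) -> List[List[int]]:
    """Encode a whole feature column.

    If no thresholds are given, the sorted unique values of the column
    are used as thresholds (an assumption, not taken from the abstract).
    """
    if thresholds is None:
        thresholds = sorted(set(values))
    return [binarize(v, thresholds) for v in values]


if __name__ == "__main__":
    column = [5.0, 1.0, 3.0]
    for raw, bits in zip(column, binarize_column(column)):
        print(raw, bits)
```

Under this encoding, larger values set fewer bits, so the ordering of the original values is preserved in the binary features that a propositional-logic learner then receives.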