Search results for "Data mining"
showing 10 items of 907 documents
Enforcing Conceptual Modeling to improve the understanding of human genome
2010
It is widely accepted that the use of Conceptual Modeling techniques in modern Software Engineering leads to a more accurate description of the problem domain. Applying these techniques to challenging domains such as the human genome is a fascinating task. The relevant biological concepts should be properly addressed through the creation of the corresponding conceptual schema. This schema will improve the description of the global process followed from a DNA sequence to a fully functional protein. Once the conceptual model is established, the corresponding database is created. The database is intended to act as a unified repository of integrated information that will all…
Three-dimensional Fuzzy Kernel Regression framework for registration of medical volume data
2013
In this work a general framework for non-rigid 3D medical image registration is presented. It relies on two pattern recognition techniques: kernel regression and fuzzy c-means clustering. The paper provides a theoretical explanation, details the framework, and illustrates its application to implement three registration algorithms for CT/MR volumes as well as single 2D slices. The first two algorithms are landmark-based approaches, while the third one is an area-based technique. The last approach is based on iterative hierarchical volume subdivision and maximization of mutual information. Moreover, a high-performance Nvidia CUDA-based implementation of the algorithm is presented. The f…
On the Identification of Structural Analogies in Data Models (Zur Identifikation von Strukturanalogien in Datenmodellen)
2005
On the one hand, data models decrease the complexity of information system development. On the other hand, data models cause additional complexity. Recently, structural analogies have been discussed as instruments for reducing the complexity of data models. This piece of research presents a procedure to identify structural analogies in data models and demonstrates its performance by analyzing Scheer’s reference model for industrial enterprises (Y-CIM-model). The proposed procedure is based on formalizing data models within set theory and uses a quantitative similarity measure. The obtained results show both identical and very similar information structures within the Y-CIM-model. Furthermore, ways of…
A Windowing strategy for Distributed Data Mining optimized through GPUs
2017
This paper introduces an optimized Windowing-based strategy for inducing decision trees in Distributed Data Mining scenarios. Windowing consists of selecting a sample of the available training examples (the window) to induce a decision tree with a standard algorithm, e.g., J48; finding instances not covered by this tree (counter examples) in the remaining training examples and adding them to the window to induce a new tree; and repeating until a termination criterion is met. In this way, the number of training examples required to induce the tree is reduced considerably while maintaining the expected accuracy levels, at the cost of time performance. Our proposed enhancements…
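The Windowing loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a tiny ID3-style learner on categorical features stands in for J48, the initial window is simply the first `initial_size` examples, and the names `build_tree`, `predict`, `windowing`, `initial_size` and `max_rounds` are hypothetical. The distributed and GPU aspects are not modeled.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Tiny ID3-style learner on categorical features (assumed stand-in for J48).

    Returns either a class label (leaf) or a tuple
    (feature_index, {value: subtree}, majority_label).
    Class labels must not be tuples, since tuples mark internal nodes.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def remainder(f):
        # Weighted entropy of the labels after splitting on feature f.
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(features, key=remainder)
    rest = [f for f in features if f != best]
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in subset], [y for _, y in subset], rest)
    return (best, branches, majority)


def predict(tree, row):
    """Follow branches until a leaf; unseen values fall back to the node majority."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        tree = branches.get(row[feature], majority)
    return tree


def windowing(rows, labels, initial_size, max_rounds=20):
    """Induce a tree from a growing window of training examples.

    Terminates when no counter examples remain or after max_rounds
    (both termination criteria are assumptions of this sketch).
    """
    features = list(range(len(rows[0])))
    window = list(range(initial_size))                 # indices inside the window
    remaining = list(range(initial_size, len(rows)))   # indices outside the window
    tree = None
    for _ in range(max_rounds):
        tree = build_tree([rows[i] for i in window], [labels[i] for i in window], features)
        # Counter examples: remaining instances the current tree misclassifies.
        counter = [i for i in remaining if predict(tree, rows[i]) != labels[i]]
        if not counter:
            break
        window.extend(counter)
        added = set(counter)
        remaining = [i for i in remaining if i not in added]
    return tree, window
```

Calling `windowing(rows, labels, initial_size=2)` returns the final tree together with the indices of the examples that ended up in the window, which makes it easy to compare the window size with the full training set.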
Sectors on sectors (SonS): A new hierarchical clustering visualization tool
2011
Clustering techniques have been widely applied to extract information from high-dimensional data structures in the last few years. Graphs are especially relevant for clustering, but many graphs associated with hierarchical clustering do not give any information about the values of the centroids' attributes and the relationships among them. In this paper, we propose a new visualization approach for hierarchical cluster analysis in which the above-mentioned information is available. The method is based on pie charts. The pie charts are divided into several pie segments or sectors corresponding to each cluster. The radius of each pie segment is proportional to the number of patterns included i…
sar: Automatic generation of statistical reports using Stata and Microsoft Word for Windows
2013
The output provided by most Stata commands is plain text that is not suitable for presentation or publication. After the numerical and graphical outputs are obtained, the user has to copy them into a word processor to complete the editing process. Some Stata commands help you to obtain well-formatted output, especially tabulated results in LaTeX or other formats, but they are neither a complete solution nor user-friendly tools. Stata automatic report (Sar) is an easy-to-use macro for Microsoft Word for Windows that allows a powerful integration between Stata and Word. With Sar, the user can retrieve numerical results and graphs from Stata and automatically insert them into a well-formatted Word docum…
Clustering categorical data: A stability analysis framework
2011
Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but k-means is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which creates the difficulty of selecting the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect inherent in the calculation …
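As a minimal sketch of the k-modes idea mentioned in this abstract (not the stability framework the paper proposes), the code below uses simple matching (Hamming) distance and per-attribute modes as prototypes. The function names and the choice to pass `initial_modes` explicitly are assumptions made for illustration; passing the initial prototypes by hand makes the sensitivity to initialization easy to observe.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(points):
    """Prototype of a cluster: the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*points))


def assign(data, modes):
    """Index of the nearest mode for every point (ties go to the lowest index)."""
    return [min(range(len(modes)), key=lambda j: hamming(p, modes[j])) for p in data]


def k_modes(data, initial_modes, max_iter=100):
    """Alternate assignment and mode update until the modes stop changing."""
    modes = list(initial_modes)
    for _ in range(max_iter):
        labels = assign(data, modes)
        new_modes = []
        for j, old in enumerate(modes):
            members = [p for p, l in zip(data, labels) if l == j]
            # Keep the old prototype if a cluster becomes empty.
            new_modes.append(column_modes(members) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return modes, assign(data, modes)
```

Running `k_modes` several times with different `initial_modes` on the same data and comparing the returned labels is one simple way to see how much the solution depends on the starting prototypes.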
Adding Synthetic Detail to Natural Terrain Using a Wavelet Approach
2002
Terrain representation is a basic topic in the field of interactive graphics. The amount of data required for good-quality terrain representation poses an important challenge to developers of such systems. For users of these applications, the accuracy of geographical data is less important than its natural visual appearance. This makes it possible to maintain a limited geographical database for the system and to extend it by generating synthetic data. In this paper we combine fractal and wavelet theories to provide extra data which keeps the natural essence of the actual information available. The new levels of detail (LOD) for the terrain are obtained by applying an inverse Wavelet Transform (WT) to…
Dynamic integration of multiple data mining techniques in a knowledge discovery management system
1999
One of the most important directions in the improvement of data mining and knowledge discovery is the integration of multiple classification techniques in an ensemble of classifiers. An integration technique should be able to estimate and select the most appropriate component classifiers from the ensemble. We present two variations of an advanced dynamic integration technique with two distance metrics. The technique is one variation of the stacked generalization method, with the assumption that each of the component classifiers is the best one inside a certain subarea of the entire domain area. Our technique includes two phases: the learning phase and the application phase. During the learnin…
Multivariate statistical technique over QoS variables to analyze video quality metrics on IEEE 802.11ac networks
2017
We present the results from a measurement-based performance evaluation of wireless networks based on the IEEE 802.11ac standard in an indoor environment, with the aim of analyzing their performance under high-definition streaming video applications. We focus our study on analyzing the highest performance of these standards using off-the-shelf equipment, as well as the behavior of Quality of Service variables and how they affect video quality. Thus, we have analyzed and measured these variables, applied a multivariate statistical technique called Factor Analysis, and finally discuss their behavior.