Search results for "DATA MINING"
showing 10 items of 907 documents
Urban monitoring using multi-temporal SAR and multi-spectral data
2006
In some key operational domains, the joint use of synthetic aperture radar (SAR) and multi-spectral sensors has proven to be a powerful tool for Earth observation. In this paper, we analyze the potential of combining interferometric SAR and multi-spectral data for urban area characterization and monitoring. This study is carried out following a standard multi-source processing chain. First, a pre-processing stage is performed taking into account the underlying physics, geometry, and statistical models for the data from each sensor. Second, two different methodologies, one for supervised and another for unsupervised approaches, are followed to obtain features that optimize the urban rela…
A practical methodology to perform global sensitivity analysis for 2D hydrodynamic computationally intensive simulations
2021
Sensitivity analysis is a commonly used technique in hydrological modeling for different purposes, including identifying the influential parameters and ranking them. This paper proposes a simplified sensitivity analysis approach by applying the Taguchi design and the ANOVA technique to 2D hydrodynamic flood simulations, which are computationally intensive. This approach offers an effective and practical way to rank the influencing parameters, quantify the contribution of each parameter to the variability of the outputs, and investigate the possible interaction between the input parameters. A number of 2D flood simulations have been carried out using the combinations proposed by Taguchi (L27…
A probabilistic condensed representation of data for stream mining
2014
Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available at once or consists of billions of instances, these algorithms easily become infeasible with respect to memory and run-time concerns. As a solution to this problem, we propose a framework, called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which are able to represent billions of instances in a compact form and can be updated when new instances arrive. As an example for an algorithm that operates on density estimates, we consider t…
Aspects Concerning SVM Method’s Scalability
2008
In recent years, the quantity of text documents has increased continually, and automatic document classification has become an important challenge. In text document classification, the training step is essential for obtaining a good classifier. The quality of learning depends on the dimension of the training data. When working with huge training data sets, the training time increases exponentially. In this paper we present a method that allows working with huge data sets in the training step without increasing the training time exponentially and without significantly decreasing the classification accuracy.
The heterogeneity of inter-domain Internet application flows: entropic analysis and flow graph modelling
2013
The growing popularity of the Internet has triggered the proliferation of various applications, which possess diverse communication patterns and user behaviour. In this paper, the heterogeneous characteristics of Internet applications and traffic are investigated from a complex network and entropic perspective. On the basis of real-life flow data collected from a public network provided by an Internet service provider, flow graphs are constructed for five types of applications as follows: Web, P2P Download, P2P Stream, Video Stream and Instant Messaging. Three types of entropy measures are introduced to the flow graphs, and the heterogeneity of applications within a 24-h period is analysed …
Fragtique: Applying an OO Database Distribution Strategy to Data Warehouse
2001
We propose a strategy for distribution of a relational data warehouse organized according to a star schema. We adapt fragmentation and allocation strategies that were developed for OO databases. We split the most-often-accessed dimension table into fragments by using primary horizontal fragmentation. The derived fragmentation then divides the fact table into fragments. Other dimension tables are not fragmented since they are presumed to be sufficiently small. Allocation of fragments encompasses duplication of non-fragmented dimension tables that we call a closure.
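The two fragmentation steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tables, the `region` attribute, and both function names are hypothetical, and the fragmentation predicate is assumed to be simple equality on one dimension attribute.

```python
from collections import defaultdict

# Hypothetical toy star schema: one dimension table and one fact table.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
    {"customer_id": 3, "region": "EU"},
]
sales = [
    {"sale_id": 10, "customer_id": 1, "amount": 5.0},
    {"sale_id": 11, "customer_id": 2, "amount": 7.5},
    {"sale_id": 12, "customer_id": 3, "amount": 2.0},
    {"sale_id": 13, "customer_id": 2, "amount": 1.0},
]


def primary_horizontal_fragmentation(dimension_rows, attribute):
    """Split a dimension table into fragments, one per value of `attribute`."""
    fragments = defaultdict(list)
    for row in dimension_rows:
        fragments[row[attribute]].append(row)
    return dict(fragments)


def derived_fragmentation(fact_rows, dimension_fragments, key):
    """Split the fact table so each fact row follows the dimension row it references."""
    fact_fragments = {}
    for name, dim_rows in dimension_fragments.items():
        keys = {row[key] for row in dim_rows}
        fact_fragments[name] = [row for row in fact_rows if row[key] in keys]
    return fact_fragments


dim_fragments = primary_horizontal_fragmentation(customers, "region")
fact_fragments = derived_fragmentation(sales, dim_fragments, "customer_id")
```

Each fact fragment here is a semi-join of the fact table with one dimension fragment, so a query restricted to one region only needs the matching pair of fragments.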
A Logical Key Hierarchy Based approach to preserve content privacy in Decentralized Online Social Networks
2020
Distributed Online Social Networks (DOSNs) have been proposed to shift the control over user data from a unique entity, the online social network provider, to the users of the DOSN themselves. In this paper we focus on the problem of preserving the privacy of the content shared with large groups of users. In general, content privacy is enforced by encrypting the content, so that only authorized parties are able to decrypt it. When efficiency has to be taken into account, new solutions have to be devised that: i) minimize the re-encryption of the contents published in a group when the composition of the group changes; and, ii) enable a fast distribution of the cryptographic keys to all the m…
GPU-accelerated exhaustive search for third-order epistatic interactions in case–control studies
2015
This is a post-peer-review, pre-copyedit version of an article published in Journal of Computational Science. The final authenticated version is available online at: https://doi.org/10.1016/j.jocs.2015.04.001 [Abstract] Interest in discovering combinations of genetic markers from case–control studies, such as Genome Wide Association Studies (GWAS), that are strongly associated with diseases has increased in recent years. Detecting epistasis, i.e. interactions among k markers (k ≥ 2), is an important but time-consuming operation since statistical computations have to be performed for each k-tuple of measured markers. Efficient exhaustive methods have been proposed for k = 2, but exhaustive thi…
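To give a sense of why an exhaustive search over k-tuples is costly, the following sketch counts the tuples and runs a plain CPU brute-force search. It is only an illustration under stated assumptions: `count_k_tuples`, `exhaustive_search`, and the `score` callback are hypothetical names, and the scoring function stands in for whatever statistical test a real study would use. It does not reflect the GPU method of the paper.

```python
from itertools import combinations
from math import comb


def count_k_tuples(num_markers, k=3):
    """Number of distinct unordered k-tuples among num_markers markers."""
    return comb(num_markers, k)


def exhaustive_search(markers, score, k=3):
    """Return the k-tuple with the highest score by testing every combination.

    `score` is a placeholder for an association statistic (an assumption here).
    """
    return max(combinations(markers, k), key=score)
```

For example, `count_k_tuples(1000)` evaluates to 166167000, so even a modest panel of 1,000 markers requires over 166 million third-order tests, and the count grows cubically with the number of markers.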
Online Induction of Probabilistic Real Time Automata
2012
Probabilistic real time automata (PRTAs) are a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. These two steps can be very time intensive when a PRTA is to be induced for massive or even unbounded data sets. The latter step can be processed efficiently, as there exist scalable online clustering algorithms. However, the creation of the PTA can still be very time-consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing…
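The first of the two steps mentioned in this abstract, building a prefix tree acceptor, can be sketched as follows. This is a simplified illustration, not the paper's online approach: it ignores the timing information that a real time automaton carries, it omits the state-merging step entirely, and the names `PTANode`, `build_pta`, and `transition_probability` are hypothetical.

```python
class PTANode:
    """One state of the prefix tree: a visit count plus outgoing transitions."""

    def __init__(self):
        self.count = 0       # number of sequences that passed through this state
        self.children = {}   # symbol -> PTANode


def build_pta(sequences):
    """Insert every sequence into the tree, sharing states for common prefixes."""
    root = PTANode()
    for sequence in sequences:
        node = root
        node.count += 1
        for symbol in sequence:
            node = node.children.setdefault(symbol, PTANode())
            node.count += 1
    return root


def transition_probability(node, symbol):
    """Relative frequency of taking `symbol` from `node` (0.0 if never seen)."""
    child = node.children.get(symbol)
    if child is None or node.count == 0:
        return 0.0
    return child.count / node.count


# Hypothetical event sequences, each string being a sequence of symbols.
pta = build_pta(["ab", "ac", "b"])
```

The tree grows with the number of distinct prefixes in the data, which is one way to see why PTA construction becomes expensive for massive or unbounded inputs.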
Relations frequency hypermatrices in mutual, conditional and joint entropy-based information indices
2012
Graph-theoretic matrix representations constitute the most popular and significant source of topological molecular descriptors (MDs). Recently, we have introduced a novel matrix representation, named the duplex relations frequency matrix, F, derived from the generalization of an incidence matrix whose row entries are connected subgraphs of a given molecular graph G. Using this matrix, a series of information indices (IFIs) were proposed. In this report, an extension of F is presented, introducing for the first time the concept of a hypermatrix in graph-theoretic chemistry. The hypermatrix representation explores the n-tuple participation frequencies of vertices in a set of connected subgrap…