Search results for "DATA MINING"

showing 10 items of 907 documents

Urban monitoring using multi-temporal SAR and multi-spectral data

2006

In some key operational domains, the joint use of synthetic aperture radar (SAR) and multi-spectral sensors has been shown to be a powerful tool for Earth observation. In this paper, we analyze the potential of combining interferometric SAR and multi-spectral data for urban area characterization and monitoring. This study is carried out following a standard multi-source processing chain. First, a pre-processing stage is performed taking into account the underlying physics, geometry, and statistical models for the data from each sensor. Second, two different methodologies, one for supervised and another for unsupervised approaches, are followed to obtain features that optimize the urban rela…

Synthetic aperture radar; Earth observation; Feature selection; Statistical model; Data set; Data acquisition; Artificial Intelligence; Signal Processing; Standard algorithms; Computer Vision and Pattern Recognition; Data mining; Software; Multi-source; Pattern Recognition Letters

researchProduct

A practical methodology to perform global sensitivity analysis for 2D hydrodynamic computationally intensive simulations

2021

Sensitivity analysis is a commonly used technique in hydrological modeling for different purposes, including identifying the influential parameters and ranking them. This paper proposes a simplified sensitivity analysis approach by applying the Taguchi design and the ANOVA technique to 2D hydrodynamic flood simulations, which are computationally intensive. This approach offers an effective and practical way to rank the influencing parameters, quantify the contribution of each parameter to the variability of the outputs, and investigate the possible interaction between the input parameters. A number of 2D flood simulations have been carried out using the proposed combinations by Taguchi (L27…

TC401-506; Physical geography; Computer science; Taguchi design; GB3-5030; River lake and water-supply engineering (General); VDP::Teknologi: 500; Global sensitivity analysis; Data mining; ANOVA; 2D hydrodynamic flood modeling; Water Science and Technology; Hydrology Research

researchProduct

A probabilistic condensed representation of data for stream mining

2014

Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available at once or consists of billions of instances, these algorithms easily become infeasible with respect to memory and run-time concerns. As a solution to this problem, we propose a framework, called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which are able to represent billions of instances in a compact form and can be updated when new instances arrive. As an example for an algorithm that operates on density estimates, we consider t…

Task (computing); Association rule learning; Data stream mining; Simple (abstract algebra); Computer science; Probabilistic logic; Probabilistic analysis of algorithms; Algorithm design; Data mining; Representation (mathematics); 2014 International Conference on Data Science and Advanced Analytics (DSAA)

researchProduct
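The idea in the MiDEO abstract above, a compact density estimate that is updated as each instance arrives, can be illustrated with a minimal Python sketch. The excerpt does not say which estimator MiDEO uses, so a fixed-width histogram is swapped in here purely as an assumption. The class name `OnlineHistogram` and its parameters are hypothetical.

```python
class OnlineHistogram:
    """Fixed-width histogram density estimate, updated one instance at a time.

    Illustrative stand-in only; not the estimator used by MiDEO.
    """

    def __init__(self, low, high, bins):
        self.low = low
        self.high = high
        self.width = (high - low) / bins
        self.counts = [0] * bins
        self.total = 0

    def _bin(self, x):
        # Clamp out-of-range values into the first or last bin.
        index = int((x - self.low) / self.width)
        return min(max(index, 0), len(self.counts) - 1)

    def update(self, x):
        # Memory stays constant no matter how many instances are seen.
        self.counts[self._bin(x)] += 1
        self.total += 1

    def density(self, x):
        # Piecewise-constant density: bin count / (n * bin width).
        if self.total == 0:
            return 0.0
        return self.counts[self._bin(x)] / (self.total * self.width)


estimate = OnlineHistogram(low=0.0, high=10.0, bins=5)
for value in [1.0, 1.5, 2.5, 7.0]:
    estimate.update(value)
```

An algorithm downstream would then query `estimate.density(x)` instead of scanning the raw stream, which is the general pattern the abstract describes.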

Aspects Concerning SVM Method’s Scalability

2008

In recent years the quantity of text documents has increased continually, and automatic document classification has become an important challenge. In text document classification, the training step is essential to obtaining a good classifier. The quality of learning depends on the dimension of the training data. When working with huge training data sets, the training time increases exponentially. In this paper we present a method that allows working with huge data sets in the training step without increasing the training time exponentially and without significantly decreasing the classification accuracy.

Text document classification; Structured support vector machine; Computer science; Document classification; Support vector machine; Text mining; Scalability; Data mining; Cluster analysis; Classifier (UML)

researchProduct

The heterogeneity of inter-domain Internet application flows: entropic analysis and flow graph modelling

2013

The growing popularity of the Internet has triggered the proliferation of various applications, which possess diverse communication patterns and user behaviour. In this paper, the heterogeneous characteristics of Internet applications and traffic are investigated from a complex network and entropic perspective. On the basis of real-life flow data collected from a public network provided by an Internet service provider, flow graphs are constructed for five types of applications as follows: Web, P2P Download, P2P Stream, Video Stream and Instant Messaging. Three types of entropy measures are introduced to the flow graphs, and the heterogeneity of applications within a 24-h period is analysed …

Theoretical computer science; Computer science; Inter-domain; Traffic identification; Complex network; Degree distribution; Internet service provider; Entropy (information theory); Control flow graph; The Internet; Data mining; Electrical and Electronic Engineering; Transactions on Emerging Telecommunications Technologies

researchProduct

Fragtique: Applying an OO Database Distribution Strategy to Data Warehouse

2001

We propose a strategy for distribution of a relational data warehouse organized according to a star schema. We adapt fragmentation and allocation strategies that were developed for OO databases. We split the most-often-accessed dimension table into fragments by using primary horizontal fragmentation. The derived fragmentation then divides the fact table into fragments. Other dimension tables are not fragmented since they are presumed to be sufficiently small. Allocation of fragments encompasses duplication of non-fragmented dimension tables that we call a closure.

Theoretical computer science; Database; Computer science; Relational database; Fragmentation (computing); Dimension table; A* search algorithm; Fact table; Data warehouse; Data cube; Schema (psychology); Data mining

researchProduct
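The three steps named in the Fragtique abstract above (primary horizontal fragmentation of a dimension table, derived fragmentation of the fact table, and duplication of the non-fragmented dimension tables) can be sketched in a few lines of Python. The tables, the column names (`store_id`, `region`, `product_id`), and the choice of `region` as the fragmentation attribute are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical star schema: "stores" is the fragmented dimension table,
# "products" is a small dimension table that is duplicated, "sales" is the fact table.
stores = [
    {"store_id": 1, "region": "north"},
    {"store_id": 2, "region": "south"},
    {"store_id": 3, "region": "north"},
]
products = [{"product_id": 10, "name": "widget"}]
sales = [
    {"store_id": 1, "product_id": 10, "amount": 5},
    {"store_id": 2, "product_id": 10, "amount": 7},
    {"store_id": 3, "product_id": 10, "amount": 2},
]


def primary_horizontal_fragmentation(rows, key):
    """Split a dimension table into fragments by the value of one attribute."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


def derived_fragmentation(fact_rows, dim_fragments, join_key):
    """Place each fact row in the fragment of the dimension row it joins to."""
    owner = {
        row[join_key]: name
        for name, rows in dim_fragments.items()
        for row in rows
    }
    fragments = defaultdict(list)
    for fact in fact_rows:
        fragments[owner[fact[join_key]]].append(fact)
    return dict(fragments)


def allocate(dim_fragments, fact_fragments, replicated_tables):
    """Bundle each fragment pair with a copy of every non-fragmented table."""
    return {
        name: {
            "dimension": dim_fragments[name],
            "facts": fact_fragments.get(name, []),
            "replicated": [list(table) for table in replicated_tables],
        }
        for name in dim_fragments
    }


store_fragments = primary_horizontal_fragmentation(stores, "region")
sales_fragments = derived_fragmentation(sales, store_fragments, "store_id")
sites = allocate(store_fragments, sales_fragments, [products])
```

Each entry of `sites` then holds everything a query restricted to one region needs, so the join between facts and dimensions can run locally on that site.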

A Logical Key Hierarchy Based approach to preserve content privacy in Decentralized Online Social Networks

2020

Distributed Online Social Networks (DOSNs) have been proposed to shift the control over user data from a single entity, the online social network provider, to the users of the DOSN themselves. In this paper we focus on the problem of preserving the privacy of content shared with large groups of users. In general, content privacy is enforced by encrypting the content so that only authorized parties are able to decrypt it. When efficiency has to be taken into account, new solutions have to be devised that: i) minimize the re-encryption of the contents published in a group when the composition of the group changes; and, ii) enable a fast distribution of the cryptographic keys to all the m…

Theoretical computer science; Facebook; Computer science; Information privacy; Cyber Security; Group communication; Joins; Encryption; Key management; Set (abstract data type); Peer-to-peer computing; Electrical and Electronic Engineering; Focus (computing); Vegetation; Social network; Settore INF/01 - Informatica; Group (mathematics); Composition (combinatorics); Decentralized Online Social Networks; Privacy; Content (measure theory); Data mining; Data privacy

researchProduct

GPU-accelerated exhaustive search for third-order epistatic interactions in case–control studies

2015

This is a post-peer-review, pre-copyedit version of an article published in Journal of Computational Science. The final authenticated version is available online at: https://doi.org/10.1016/j.jocs.2015.04.001 [Abstract] Interest in discovering combinations of genetic markers from case–control studies, such as Genome Wide Association Studies (GWAS), that are strongly associated with diseases has increased in recent years. Detecting epistasis, i.e. interactions among k markers (k ≥ 2), is an important but time-consuming operation since statistical computations have to be performed for each k-tuple of measured markers. Efficient exhaustive methods have been proposed for k = 2, but exhaustive thi…

Theoretical computer science; Source code; General Computer Science; Computer science; Computation; GPU; Brute-force search; CUDA; Mutual information; Modeling and Simulation; Epistasis; GWAS; Node (circuits); Data mining; Tuple; Heuristics; Journal of Computational Science

researchProduct

Online Induction of Probabilistic Real Time Automata

2012

Probabilistic real time automata (PRTAs) are a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. These two steps can be very time-intensive when a PRTA is to be induced for massive or even unbounded data sets. The latter step can be processed efficiently, as there exist scalable online clustering algorithms. However, the creation of the PTA can still be very time-consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing…

Theoretical computer science; Computer science; Probabilistic logic; Automaton; Data set; Trie; Automata theory; The Internet; Data mining; Cluster analysis; 2012 IEEE 12th International Conference on Data Mining

researchProduct

Relations frequency hypermatrices in mutual, conditional and joint entropy-based information indices.

2012

Graph-theoretic matrix representations constitute the most popular and significant source of topological molecular descriptors (MDs). Recently, we have introduced a novel matrix representation, named the duplex relations frequency matrix, F, derived from the generalization of an incidence matrix whose row entries are connected subgraphs of a given molecular graph G. Using this matrix, a series of information indices (IFIs) were proposed. In this report, an extension of F is presented, introducing for the first time the concept of a hypermatrix in graph-theoretic chemistry. The hypermatrix representation explores the n-tuple participation frequencies of vertices in a set of connected subgrap…

Thermodynamic state; Entropy; Matrix representation; Statistical parameter; Incidence matrix; General Chemistry; Ethylenes; Joint entropy; Combinatorics; Computational Mathematics; Matrix (mathematics); Models Chemical; Entropy (information theory); Data Mining; Molecular graph; Computer Simulation; Mathematics; Journal of computational chemistry

researchProduct