Search results for "Mining"
showing 10 items of 1730 documents
Identification of differential risk hotspots for collision and vehicle type in a directed linear network
2019
Traffic accidents can take place in very different ways and involve a substantially distinct number and types of vehicles. Thus, it is of interest to know which parts of a road structure present an overrepresentation of a specific type of traffic accident, especially for some typologies of collisions and vehicles that tend to trigger more severe consequences for the users involved. In this study, a spatial approach is followed to estimate the risk that different types of collisions and vehicles present in the central area of Valencia (Spain), considering the accidents observed in this city during the period 2014-2017. A directed spatial linear network representing the non-pedestrian ro…
Adaptive Population Importance Samplers: A General Perspective
2016
Importance sampling (IS) is a well-known Monte Carlo method, widely used to approximate a distribution of interest using a random measure composed of a set of weighted samples generated from another proposal density. Since the performance of the algorithm depends on the mismatch between the target and the proposal densities, a set of proposals is often iteratively adapted in order to reduce the variance of the resulting estimator. In this paper, we review several well-known adaptive population importance samplers, providing a unified common framework and classifying them according to the nature of their estimation and adaptive procedures. Furthermore, we interpret the underlying motivation …
Streamlining Assessment using a Knowledge Metric
2016
Estimating Accuracy of Mobile-Masquerader Detection Using Worst-Case and Best-Case Scenario
2006
In order to resist an unauthorized use of the resources accessible through mobile terminals, masquerader detection means can be employed. In this paper, the problem of mobile-masquerader detection is approached as a classification problem, and the detection is performed by an ensemble of one-class classifiers. Each classifier compares a measure describing user behavior or environment with the profile accumulating the information about past behavior and environment. The accuracy of classification is empirically estimated by experimenting with a dataset describing the behavior and environment of two groups of mobile users, where the users within groups are affiliated with each other. It is as…
Measuring the agreement between brain connectivity networks
2016
Investigating the level of similarity between two brain networks, resulting from measures of effective connectivity in the brain, can be of interest in many respects. In this study, we propose and test the idea of borrowing measures of association used in machine learning to provide a measure of similarity between the structure of (unweighted) brain connectivity networks. The measures explored here are the accuracy, Cohen's Kappa (K) and Area Under Curve (AUC). We implemented two simulation studies, reproducing two contexts of application that can be particularly interesting for practical applications, namely: i) in methodological studies, performed on surrogate data, aiming at comparing th…
Additive noise and multiplicative bias as disclosure limitation techniques for continuous microdata: A simulation study
2004
This paper focuses on a combination of two disclosure limitation techniques, additive noise and multiplicative bias, and studies their efficacy in protecting the confidentiality of continuous microdata. A Bayesian intruder model is extensively simulated in order to assess the performance of these disclosure limitation techniques as a function of key parameters such as the variability amongst profiles in the original data, the amount of users' prior information, and the amount of bias and noise introduced in the data. The results of the simulation offer insight into the degree of vulnerability of data on continuous random variables and suggest some guidelines for effective protection measures.
Fusion of experimental data
1997
The integration of information from various sensory systems is one of the most difficult challenges in understanding both perception and cognition. For example, the problem of auditory-visual integration is a correspondence problem between perceived auditory and visual scenes. Two main questions arise when designing data analysis systems: what is the useful information to be integrated, and what are the integration rules? The problem of integrating information becomes relevant whenever: (a) the same kind of data are detected by spatially distributed sensors; (b) heterogeneous data are detected by different sensors; (c) heterogeneous distributed data are involved. General problems …
Set similarity joins on MapReduce
2018
Set similarity joins, which compute pairs of similar sets, constitute an important operator primitive in a variety of applications, including applications that must process large amounts of data. To handle these data volumes, several distributed set similarity join algorithms have been proposed. Unfortunately, little is known about the relative performance, strengths and weaknesses of these techniques. Previous comparisons are limited to a small subset of relevant algorithms, and the large differences in the various test setups make it hard to draw overall conclusions. In this paper we survey ten recent, distributed set similarity join algorithms, all based on the MapReduce paradigm. We emp…
Refining a Reference Architecture for Model-Driven Business Apps
2016
A naive relevance feedback model for content-based image retrieval using multiple similarity measures
2010
This paper presents a novel probabilistic framework to process multiple sample queries in content-based image retrieval (CBIR). This framework is independent of the underlying distance or (dis)similarity measures which support the retrieval system, and only assumes mutual independence among their outcomes. The proposed framework gives rise to a relevance feedback mechanism in which positive and negative data are combined in order to optimally retrieve images according to the available information. A particular setting in which users interactively supply feedback and iteratively retrieve images is set up both to model the system and to carry out some objective performance measurements. Several repo…