Search results for "Data stream"
Showing 10 of 50 documents
Machine learning information fusion in Earth observation: A comprehensive review of methods, applications and data sources
2020
This paper reviews the most important information fusion data-driven algorithms based on Machine Learning (ML) techniques for problems in Earth observation. Nowadays we observe and model the Earth with a wealth of observations, from a plethora of different sensors, measuring states, fluxes, processes and variables, at unprecedented spatial and temporal resolutions. Earth observation is well equipped with remote sensing systems, mounted on satellites and airborne platforms, but it also involves in-situ observations, numerical models and social media data streams, among other data sources. Data-driven approaches, and ML techniques in particular, are the natural choice to extract significant i…
A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection
2014
Published version of a chapter in the book: Transactions on Computational Collective Intelligence XIV. Also available from the publisher at: http://dx.doi.org/10.1007/978-3-662-44509-9_1
In this paper we address the above problem by posing frequent itemset mining as a collection of interrelated two-armed bandit problems. We seek to find itemsets that frequently appear as subsets in a stream of itemsets, with the frequency being constrained to support granularity requirements. Starting from a randomly or manually selected exemplar itemset, a collective of Tsetlin automata based two-armed bandit players - one automaton for each item in the exemplar - learns which items should be included in …
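The excerpt does not give the feedback rule the collective uses, so the following is only a minimal sketch of a single Tsetlin automaton acting as a two-armed bandit player, assuming a toy environment in which each action ("exclude" / "include") is rewarded with a fixed probability. The class, the `play` helper and its parameters are hypothetical; in the paper's setup one such automaton per exemplar item would decide whether that item belongs to the itemset.

```python
import random


class TsetlinAutomaton:
    """Two-action Tsetlin automaton with 2 * n memory states.

    States 1..n select action 0 ("exclude the item");
    states n+1..2n select action 1 ("include the item").
    """

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n = n
        # Start at the boundary between the two halves, on a random side.
        self.state = rng.choice([n, n + 1])

    def action(self) -> int:
        return 0 if self.state <= self.n else 1

    def reward(self) -> None:
        # Reinforce the current action: move one state deeper into its half.
        if self.action() == 0:
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.n, self.state + 1)

    def penalize(self) -> None:
        # Weaken the current action: move one state toward (or across) the boundary.
        self.state += 1 if self.action() == 0 else -1


def play(p_include: float, p_exclude: float, steps: int = 5000, seed: int = 1) -> float:
    """Hypothetical bandit environment with fixed reward probabilities per action.
    Returns the fraction of steps on which the automaton chose "include"."""
    rng = random.Random(seed)
    automaton = TsetlinAutomaton(n=5, rng=rng)
    chosen_include = 0
    for _ in range(steps):
        a = automaton.action()
        if rng.random() < (p_include if a == 1 else p_exclude):
            automaton.reward()
        else:
            automaton.penalize()
        chosen_include += a
    return chosen_include / steps
```

With `play(0.8, 0.2)` the automaton settles almost entirely on "include"; swapping the probabilities drives it to "exclude". More states per action (`n`) make the choice more stable at the cost of slower adaptation.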
DeCyMo: Decentralized Cyber-physical System for Monitoring and Controlling Industries and Homes
2018
The recent revolution of the Internet of Things has given birth to a series of new technologies and cyber-physical systems to be used in industrial and home scenarios. Cyber-physical systems include physical and software components for providing smart monitoring and control with flexibility and adaptability to the operating context. The IoT paradigm enables the intertwined use of physical and software components through the interconnection of devices that exchange data with each other without direct human interaction in several fields, especially in industrial and home environments. We propose DeCyMo, a decentralized architecture that aims at solving common IoT issues and vulnerabiliti…
Scalable Clustering by Iterative Partitioning and Point Attractor Representation
2016
Clustering very large datasets while preserving cluster quality remains a challenging data-mining task. In this paper, we propose an effective scalable clustering algorithm for large datasets that builds upon the concept of synchronization. The proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), handles very large datasets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the…
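The partition, represent, and recombine idea can be sketched in one dimension as below. This is a simplified stand-in, not CIPA itself: the update rule (each point moves by the average sine of its offsets to neighbours within `eps`) follows the general Kuramoto-style synchronization-clustering idea, while the round-robin partitioning, the final re-synchronization of the pooled attractors, and all function names are assumptions, and outlier handling is omitted.

```python
import math
from typing import List


def sync_attractors(points: List[float], eps: float,
                    max_iters: int = 100, tol: float = 1e-9) -> List[float]:
    """Simplified 1-D synchronization clustering.

    Every point repeatedly moves by the average sine of its offsets to all
    points within `eps` (itself included) until the largest move drops below
    `tol`. Positions that end up (almost) identical are merged into a single
    point attractor.
    """
    xs = list(points)
    for _ in range(max_iters):
        moved = []
        for x in xs:
            nbrs = [y for y in xs if abs(y - x) <= eps]
            moved.append(x + sum(math.sin(y - x) for y in nbrs) / len(nbrs))
        shift = max(abs(a - b) for a, b in zip(moved, xs))
        xs = moved
        if shift < tol:
            break
    # Collapse synchronized points into one attractor per group.
    attractors: List[float] = []
    for x in sorted(xs):
        if not attractors or x - attractors[-1] > eps / 10:
            attractors.append(x)
    return attractors


def partitioned_sync(points: List[float], eps: float, n_parts: int) -> List[float]:
    """Cluster each partition separately, then synchronize the pooled attractors.
    Assumption: round-robin partitioning and this final merge step are illustrative only."""
    representatives: List[float] = []
    for i in range(n_parts):
        part = points[i::n_parts]
        if part:
            representatives.extend(sync_attractors(part, eps))
    return sync_attractors(representatives, eps)
```

Each `sync_attractors` call does an O(n²) neighbour search per iteration, so running it on small subsets and then only on their attractors is what keeps the sketch cheap as the dataset grows.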
A Novel Clustering Algorithm based on a Non-parametric "Anti-Bayesian" Paradigm
2015
The problem of clustering, or unsupervised classification, has been tackled by a myriad of techniques, all of which depend, either directly or implicitly, on the Bayesian principle of optimal classification. To be more specific, within a Bayesian paradigm, if one is to compare the testing sample with only a single point in the feature space from each class, the optimal Bayesian strategy would be to achieve this based on the distance from the corresponding means or central points in the respective distributions. When this principle is applied in clustering, one would assign an unassigned sample into the cluster whose mean is the closest, and this can be done in either a bottom-up or a top-dow…
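The Bayesian nearest-mean rule described above (assign an unassigned sample to the cluster whose mean is closest) can be illustrated with a short sketch. The data and the function name are hypothetical, and the paper's "anti-Bayesian" alternative is not shown because the excerpt is cut off before describing it.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def nearest_mean_cluster(sample: Sequence[float],
                         clusters: Dict[str, List[Sequence[float]]]) -> str:
    """Return the key of the cluster whose mean (centroid) is closest to `sample`
    in squared Euclidean distance."""

    def centroid(points: List[Sequence[float]]) -> List[float]:
        return [fmean(coord) for coord in zip(*points)]

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(clusters, key=lambda k: sq_dist(sample, centroid(clusters[k])))


clusters = {"A": [(0.0, 0.0), (1.0, 1.0)], "B": [(5.0, 5.0), (6.0, 6.0)]}
label = nearest_mean_cluster((2.0, 2.0), clusters)
clusters[label].append((2.0, 2.0))  # assigning the sample shifts that cluster's mean
```

Here `(2.0, 2.0)` joins cluster "A". Because the mean is recomputed after every assignment, the order in which samples are processed can change the final clustering.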
A critical review on the implementation of static data sampling techniques to detect network attacks
2021
Given that Internet traffic speed and volume are growing at a rapid pace, real-time network monitoring has raised several issues in terms of computing and storage capabilities. Fast processing of traffic data and early warnings of detected attacks are required while maintaining a single pass over the traffic measurements. To alleviate these problems, one can reduce the amount of traffic to be processed by using a sampling technique and detect attacks based on the sampled traffic. Different parameters affect the efficiency of this process, mainly the applied sampling policy and the sampling ratio. In this paper, we investigate th…
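Two common static sampling policies can be sketched as single-pass generators parameterized by a sampling ratio. The excerpt does not list which policies the paper compares, so count-based systematic sampling and uniform random sampling are shown only as typical examples; the function names and the `ratio` parameter are hypothetical.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def systematic_sample(stream: Iterable[T], ratio: float) -> Iterator[T]:
    """Count-based systematic sampling: keep every k-th item, k = round(1 / ratio)."""
    k = max(1, round(1 / ratio))
    for i, item in enumerate(stream):
        if i % k == 0:
            yield item


def random_sample(stream: Iterable[T], ratio: float,
                  seed: Optional[int] = None) -> Iterator[T]:
    """Uniform random sampling: keep each item independently with probability `ratio`."""
    rng = random.Random(seed)
    for item in stream:
        if rng.random() < ratio:
            yield item
```

Both generators touch each packet once and keep no history, matching the single-pass constraint. Systematic sampling keeps exactly one item in k, while random sampling only keeps about `ratio` of the items on average.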
Higher-Fidelity Frugal and Accurate Quantile Estimation Using a Novel Incremental Discretized Paradigm
2018
Traditional pattern classification works with the moments of the distributions of the features and involves the estimation of the means and variances. As opposed to this, more recently, research has indicated the power of using the quantiles of the distributions because they are more robust and applicable for non-parametric methods. The estimation of the quantiles is even more pertinent when one is mining data streams. However, the complexity of quantile estimation is much higher than the corresponding estimation of the mean and variance, and this increased complexity is more relevant as the size of the data increases. Clearly, in the context of infinite data streams, a computational and sp…
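To show why incremental quantile estimation can be so cheap on a stream, here is a sketch of a well-known frugal update rule ("Frugal-1U" from the frugal streaming literature). It is a swapped-in illustration, not the discretized estimator proposed in this paper, and it assumes integer-valued data so that the estimate can move in unit steps.

```python
import random
from typing import Iterable


def frugal_1u(stream: Iterable[int], q: float, seed: int = 0) -> int:
    """Frugal-1U style estimate of the q-quantile using a single integer of memory.

    When an item lies above the estimate, step up by 1 with probability q;
    when it lies below, step down by 1 with probability 1 - q.
    The estimate settles where P(x < m) / P(x > m) = q / (1 - q).
    """
    rng = random.Random(seed)
    m = 0
    for x in stream:
        if x > m and rng.random() < q:
            m += 1
        elif x < m and rng.random() < 1 - q:
            m -= 1
    return m


data_rng = random.Random(42)
data = [data_rng.randrange(1000) for _ in range(200_000)]
median_estimate = frugal_1u(data, 0.5)  # should land near 500 for this uniform data
```

The memory cost is constant regardless of stream length, which is the property that matters for infinite streams. The price is a slow start from the initial value and a random fluctuation around the true quantile.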
EUDAQ - A Data Acquisition Software Framework for Common Beam Telescopes
2019
EUDAQ is a generic data acquisition software developed for use in conjunction with common beam telescopes at charged particle beam lines. Providing high-precision reference tracks for performance studies of new sensors, beam telescopes are essential for the research and development towards future detectors for high-energy physics. As beam time is a highly limited resource, EUDAQ has been designed with reliability and ease-of-use in mind. It enables flexible integration of different independent devices under test via their specific data acquisition systems into a top-level framework. EUDAQ controls all components globally, handles the data flow centrally and synchronises and records the data…
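Central synchronization of data from independent devices can be pictured as merging per-device records that share a common key. The sketch below is a generic illustration, not EUDAQ's API: it assumes each device tags its records with a shared trigger number, and the function name and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_by_trigger(
    streams: Dict[str, List[Tuple[int, bytes]]],
) -> List[Tuple[int, Dict[str, bytes]]]:
    """Group (trigger_id, payload) records from several devices into one record
    per trigger ID. Assumption: only triggers seen by every device are kept."""
    by_trigger: Dict[int, Dict[str, bytes]] = defaultdict(dict)
    for device, records in streams.items():
        for trigger_id, payload in records:
            by_trigger[trigger_id][device] = payload
    complete = [(t, parts) for t, parts in by_trigger.items() if len(parts) == len(streams)]
    return sorted(complete, key=lambda item: item[0])


merged = merge_by_trigger({
    "telescope": [(1, b"a"), (2, b"b"), (3, b"c")],
    "dut": [(1, b"x"), (3, b"z")],  # this device missed trigger 2
})
```

Dropping incomplete triggers is only one possible policy; a real system might instead flag them so that device inefficiencies remain visible in the recorded data.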
Application of dictionary learning to denoise LIGO’s blip noise transients
2020
Data streams of gravitational-wave detectors are polluted by transient noise features, or "glitches," of instrumental and environmental origin. In this work we investigate the use of total variation methods and learned dictionaries to mitigate the effect of those transients in the data. We focus on a specific type of transient, "blip" glitches, as this is the most common type of glitch present in the LIGO detectors and their waveforms are easy to identify. We randomly select 100 blip glitches scattered in the data from Advanced LIGO's O1 run, as provided by the citizen-science project Gravity Spy. Our results show that dictionary-learning methods are a valid approach to model and subtrac…
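The "model and subtract" step can be sketched once a dictionary is available. This illustration assumes the dictionary has already been learned (the learning step itself is not shown) and uses plain matching pursuit as a swapped-in sparse-coding method; the toy blip shape, the spike atoms and all function names are hypothetical, and total variation methods are not covered.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit(signal: Sequence[float], atoms: List[List[float]],
                     n_atoms: int) -> Tuple[List[float], List[float]]:
    """Greedy sparse approximation of `signal` with unit-norm `atoms`.
    Returns (model, residual) with model + residual == signal."""
    residual = list(signal)
    model = [0.0] * len(signal)
    for _ in range(n_atoms):
        best = max(atoms, key=lambda atom: abs(dot(residual, atom)))
        coeff = dot(residual, best)
        for i, a in enumerate(best):
            model[i] += coeff * a
            residual[i] -= coeff * a
    return model, residual


blip = normalize([0.0, 1.0, -2.0, 1.0, 0.0])  # hypothetical glitch-shaped atom
spikes = [[1.0 if i == j else 0.0 for i in range(5)] for j in range(5)]
data = [3.0 * b for b in blip]  # a pure glitch, for illustration
model, cleaned = matching_pursuit(data, [blip] + spikes, n_atoms=1)
```

The residual `cleaned` is the data with the modelled glitch removed. With real detector data, the number of atoms used would need to be limited so that the dictionary does not also absorb the astrophysical signal.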
Grain—A Java data analysis system for Total Data Readout
2008
Grain is a data analysis system developed to be used with the novel Total Data Readout data acquisition system. In Total Data Readout all the electronics channels are read out asynchronously in singles mode and each data item is timestamped. Event building and analysis have to be done entirely in software that post-processes the data stream. A flexible and efficient event parser and the accompanying software system have been written entirely in Java. The design and implementation of the software are discussed along with experiences gained in running real-life experiments.
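Software event building from timestamped singles can be sketched as grouping hits that fall within a coincidence window. The fixed window measured from the first hit of each event, the `Hit` record and the function name are all hypothetical choices for illustration; Grain's actual event parser is not described in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    channel: int
    timestamp: int  # assumed unit: clock ticks
    value: int


def build_events(hits: Iterable[Hit], window: int) -> List[List[Hit]]:
    """Sort hits by timestamp and start a new event whenever a hit is more than
    `window` ticks after the first hit of the current event."""
    events: List[List[Hit]] = []
    current: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.timestamp):
        if current and hit.timestamp - current[0].timestamp > window:
            events.append(current)
            current = []
        current.append(hit)
    if current:
        events.append(current)
    return events


hits = [Hit(0, 250, 7), Hit(1, 100, 3), Hit(2, 103, 5), Hit(0, 251, 1), Hit(3, 400, 9)]
events = build_events(hits, window=10)  # three events: {100, 103}, {250, 251}, {400}
```

Because the readout is asynchronous, hits arrive out of order, which is why the sketch sorts by timestamp first. A streaming implementation would instead buffer hits until the window has safely passed.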