Search results for "Stream"
showing 10 items of 682 documents
Forest of Normalized Trees: Fast and Accurate Density Estimation of Streaming Data
2018
Density estimation of streaming data is a relevant task in numerous domains. In this paper, a novel non-parametric density estimator called FRONT (forest of normalized trees) is introduced. It uses a structure of multiple normalized trees, segments the feature space of the data stream through a periodically updated linear transformation, and is able to adapt to ever-evolving data streams. FRONT provides accurate density estimation and performs favorably compared to existing online density estimators in terms of the average log score on multiple standard data sets. Its low complexity (linear runtime and constant memory usage) makes FRONT suitable by design for large data streams. Final…
Prototype-based learning on concept-drifting data streams
2014
Data stream mining has gained growing attention due to its wide range of emerging applications, such as target marketing, email filtering and network intrusion detection. In this paper, we propose a prototype-based classification model for evolving data streams, called SyncStream, which dynamically models time-changing concepts and makes predictions in a local fashion. Instead of learning a single model on a sliding window or ensemble learning, SyncStream captures evolving concepts by dynamically maintaining a set of prototypes in a new data structure called the P-tree. The prototypes are obtained by error-driven representativeness learning and synchronization-inspired constrained clustering. To ide…
New results for finding common neighborhoods in massive graphs in the data stream model
2008
We consider the problem of finding pairs of vertices that share large common neighborhoods in massive graphs. We give lower bounds for randomized, two-sided error algorithms that solve this problem in the data-stream model of computation. Our results correct and improve those of Buchsbaum, Giancarlo, and Westbrook [On finding common neighborhoods in massive graphs, Theoretical Computer Science, 299 (1–3) 707–718 (2004)].
Quantifying Vegetation Biophysical Variables from Imaging Spectroscopy Data: A Review on Retrieval Methods
2019
An unprecedented spectroscopic data stream will soon become available with forthcoming Earth-observing satellite missions equipped with imaging spectroradiometers. This data stream will open up a vast array of opportunities to quantify a diversity of biochemical and structural vegetation properties. The processing requirements for such large data streams require reliable retrieval techniques enabling the spatiotemporally explicit quantification of biophysical variables. With the aim of preparing for this new era of Earth observation, this review summarizes the state-of-the-art retrieval methods that have been applied in experimental imaging spectroscopy studies inferring all kinds of vegeta…
Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties – A review
2015
Forthcoming superspectral satellite missions dedicated to land monitoring, as well as planned imaging spectrometers, will unleash an unprecedented data stream. The processing requirements for such large data streams involve processing techniques enabling the spatio-temporally explicit quantification of vegetation properties. Typically, retrieval must be accurate, robust and fast. Hence, there is a strict requirement to identify next-generation bio-geophysical variable retrieval algorithms which can be molded into an operational processing chain. This paper offers a review of state-of-the-art retrieval methods for quantitative terrestrial bio-geophysical variable extraction using op…
Distributed Real-Time Sentiment Analysis for Big Data Social Streams
2014
The big data trend has forced data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has emerged as a new research field, which aims to answer queries about "what-is-happening-now" with a negligible delay. The real challenge with real-time stream data processing is that it is impossible to store instances of data, and therefore online analytical algorithms are utilized. To perform real-time analytics, pre-processing of data should be performed in a way that only a short summary of the stream is stored in main memory. In addition, due to the high speed of arrival, average processing time for each instance of data should be in such a way that…
Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions
2016
The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the est…
A Selective Change Driven System for High-Speed Motion Analysis.
2016
Vision-based sensing algorithms are computationally demanding tasks due to the large amount of data acquired and processed. Visual sensors deliver a large amount of data even when those data are redundant and provide no additional information. A Selective Change Driven (SCD) sensing system is based on a sensor that delivers, ordered by the magnitude of its change, only those pixels that have changed most since the last read-out. This allows the information stream to be adjusted to the computation capabilities. Following this strategy, a new SCD processing architecture for high-speed motion analysis, based on processing pixels instead of full frames, has been developed and implemented into a Field …
Sequential Learning with LS-SVM for Large-Scale Data Sets
2006
We present a subspace-based variant of LS-SVMs (i.e. regularization networks) that sequentially processes the data and is hence especially suited for online learning tasks. The algorithm works by selecting from the data set a small subset of basis functions that is subsequently used to approximate the full kernel on arbitrary points. This subset is identified online from the data stream. We improve upon existing approaches (esp. the kernel recursive least squares algorithm) by proposing a new, supervised criterion for the selection of the relevant basis functions that takes into account the approximation error incurred from approximating the kernel as well as the reduction of the cost in th…
Twitter trolls: statistical methods for detecting automatically generated content
2016
The bachelor's thesis "Twitter trolls: statistical methods for detecting automatically generated content" studies and compares the use of the social network Twitter among different user groups. The aim of the thesis is to study various methods for detecting automatically generated content on Twitter, as well as for detecting other kinds of suspicious Twitter usage. Using publicly available Twitter user data, simple statistical methods are applied in practice to identify suspicious Twitter usage. As a result, several anomalies were discovered in the Twitter user data, indicating that the statistical methods used could be successful in identifying Twitter trolls.