Search results for "Data stream"

showing 10 items of 50 documents

Modeling recurrent distributions in streams using possible worlds

2015

Discovering changes in the data distribution of streams and discovering recurrent data distributions are challenging problems in data mining and machine learning. Both have received a lot of attention in the context of classification. With the ever-increasing growth of data, however, there is a high demand for compact and universal representations of data streams that enable the user to analyze current as well as historic data without having access to the raw data. As a first step in this direction, we propose a condensed representation that captures the various (possibly recurrent) data distributions of the stream by extending the notion of possible worlds. The representation en…

Possible world; Basis (linear algebra); Computer science; Data stream mining; Representation (systemics); Context (language use); Data pre-processing; Data mining; Raw data; computer.software_genre; computer; Data modeling
2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

On using novel “Anti-Bayesian” techniques for the classification of dynamical data streams

2017

The classification of dynamical data streams is among the most complex problems encountered in classification. This is, firstly, because the distribution of the data streams is non-stationary, and it changes without any prior “warning”. Secondly, the manner in which it changes is also unknown. Thirdly, and more interestingly, the model operates with the assumption that the correct classes of previously-classified patterns become available at a juncture after their appearance. This paper pioneers the use of unreported novel schemes that can classify such dynamical data streams by invoking the recently-introduced “Anti-Bayesian” (AB) techniques. Contrary to the Bayesian paradigm, that compare…

Quantiles; Computer science; Data stream mining; Bayesian probability; 02 engineering and technology; Classification; computer.software_genre; Anti-Bayesian classification; Robustness (computer science); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Data mining; computer; Bayesian paradigm; Quantile
2017 IEEE Congress on Evolutionary Computation (CEC)

A probabilistic condensed representation of data for stream mining

2014

Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available at once or consists of billions of instances, these algorithms easily become infeasible with respect to memory and run-time concerns. As a solution to this problem, we propose a framework, called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which are able to represent billions of instances in a compact form and can be updated when new instances arrive. As an example of an algorithm that operates on density estimates, we consider t…

Task (computing); Association rule learning; Data stream mining; Simple (abstract algebra); Computer science; Probabilistic logic; Probabilistic analysis of algorithms; Algorithm design; Data mining; Representation (mathematics); computer.software_genre; computer
2014 International Conference on Data Science and Advanced Analytics (DSAA)

Intelligent Sampling for Vegetation Nitrogen Mapping Based on Hybrid Machine Learning Algorithms

2021

Upcoming satellite imaging spectroscopy missions will deliver spatiotemporally explicit data streams to be exploited for mapping vegetation properties, such as nitrogen (N) content. Within retrieval workflows for real-time mapping over agricultural regions, such crop-specific information products need to be derived precisely and rapidly. To allow fast processing, intelligent sampling schemes for training databases should be incorporated to establish efficient machine learning (ML) models. In this study, we implemented active learning (AL) heuristics using kernel ridge regression (KRR) to minimize and optimize a training database for variational heteroscedastic Gaussian processes regression (V…

Training set; Mean squared error; Active learning (machine learning); Data stream mining; Computer science; Frame (networking); 0211 other engineering and technologies; Sampling (statistics); 02 engineering and technology; Vegetation; 15. Life on land; Geotechnical Engineering and Engineering Geology; computer.software_genre; Article; Euclidean distance; symbols.namesake; symbols; Data mining; Electrical and Electronic Engineering; Gaussian process; computer; 021101 geological & geomatics engineering

Data Stream Clustering for Application-Layer DDoS Detection in Encrypted Traffic

2018

Application-layer distributed denial-of-service attacks have become a serious threat to modern high-speed computer networks and systems. Unlike network-layer attacks, application-layer attacks can be performed using legitimate requests from legitimately connected network machines, which makes these attacks undetectable by signature-based intrusion detection systems. Moreover, the attacks may utilize protocols that encrypt the data of network connections in the application layer, making it even harder to detect an attacker’s activity without decrypting users’ network traffic, and therefore violating their privacy. In this paper, we present a method that allows us to detect various application-l…

Web server; business.industry; Computer science; Network packet; Denial-of-service attack; Intrusion detection system; Encryption; computer.software_genre; Application layer; Data stream clustering; business; computer; Virtual network; Computer network

Distance functions in dynamic integration of data mining techniques

2000

One of the most important directions in the improvement of data mining and knowledge discovery is the integration of multiple data mining techniques. An integration method needs to be able either to evaluate and select the most appropriate data mining technique or to combine two or more techniques efficiently. A recent integration method for the dynamic integration of multiple data mining techniques is based on the assumption that each of the data mining techniques is the best one inside a certain subarea of the whole domain area. This method uses an instance-based learning approach to collect information about the competence areas of the mining techniques and applies a distance function to…

business.industry; Data stream mining; Computer science; Feature selection; Machine learning; computer.software_genre; Data modeling; Information extraction; Knowledge extraction; Metric (mathematics); Artificial intelligence; Data mining; business; computer; Information integration; Data integration
SPIE Proceedings

Improving big-data automotive applications performance through adaptive resource allocation

2019

In automotive applications, connected vehicles (CVs) can collect various information (external temperature, speed, location, etc.) and send it to a central infrastructure for exploitation in a wide range of applications: Eco-Driving, fleet management, environmental monitoring, etc. Such applications are known to generate a massive volume of data that is processed in real or near real time (i.e., data streams) depending on the target application requirements. To handle this data volume, big data architectures based on the stream computing paradigm are usually adopted. Within this paradigm, data are continuously processed by a set of operator (elementary operation) instances. Further, a str…

business.industry; Data stream mining; Data parallelism; Computer science; Distributed computing; Stream; Big data; Automotive industry; 02 engineering and technology; Directed graph; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Resource allocation; Tuple; business
2019 IEEE Symposium on Computers and Communications (ISCC)

Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach

2018

Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. MFPs, as the smallest set of patterns, help to reveal customers’ purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory-based Apriori or FP-growth algorithms. Therefore, these approaches are not only unscalable but also lack parallelism. Consequently, the requirements of ever-increasing big data sources cannot be met. In addition, mining performance in some existing approaches degrades drastically due to the presence of null transactions. We, therefo…

dynamic data streams; prime number theory; big data; transactional databases; null transactions; apache spark; maximal frequent patterns; tiedonlouhinta (data mining)

Modelling Recurrent Events for Improving Online Change Detection

2016

The task of online change point detection in sensor data streams is often complicated by the presence of noise, which can be mistaken for real changes and therefore degrades the performance of change detectors. Most of the existing change detection methods assume that changes are independent of each other and occur at random in time. In this paper we study how the performance of detectors can be improved in the case of recurrent changes. We analytically demonstrate under which conditions and for how long recurrence information is useful for improving the detection accuracy. We propose a simple, computationally efficient message passing procedure for calculating a predictive probability distribution of …

ta113; noise; Computer science; Data stream mining; Message passing; Detector; data streams; online change detection; 02 engineering and technology; computer.software_genre; Task (computing); recurrent events; change points; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Probability distribution; 020201 artificial intelligence & image processing; Noise (video); Data mining; Baseline (configuration management); computer; Change detection
Proceedings of the 2016 SIAM International Conference on Data Mining

Scalable implementation of dependence clustering in Apache Spark

2017

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is introduced. In addition, the proposed algorithm is benchmarked against Spectral clustering. Results of applying the method to real-life data indicate that the implementation scales well while demonstrating good performance for densely connected graphs. Peer reviewed.

ta113; ta213; Apache Spark; Computer science; datasets; Correlation clustering; data mining; computer.software_genre; algorithms; Spectral clustering; Computational science; dependence clustering; Data stream clustering; CURE data clustering algorithm; Scalability; Spark (mathematics); algoritmit (algorithms); Canopy clustering algorithm; Data mining; tiedonlouhinta (data mining); Cluster analysis; clustering algorithms; computer; data processing; tietojenkäsittely (data processing)