Search results for "Data stream mining"

showing 5 items of 35 documents

A probabilistic condensed representation of data for stream mining

2014

Data mining and machine learning algorithms usually operate directly on the data. However, if the data is not available all at once or consists of billions of instances, these algorithms easily become infeasible in terms of memory and run time. As a solution to this problem, we propose a framework called MiDEO (Mining Density Estimates inferred Online), in which algorithms are designed to operate on a condensed representation of the data. In particular, we propose to use density estimates, which can represent billions of instances in a compact form and can be updated when new instances arrive. As an example of an algorithm that operates on density estimates, we consider t…

Task (computing); Association rule learning; Data stream mining; Simple (abstract algebra); Computer science; Probabilistic logic; Probabilistic analysis of algorithms; Algorithm design; Data mining; Representation (mathematics)
2014 International Conference on Data Science and Advanced Analytics (DSAA)
researchProduct

Intelligent Sampling for Vegetation Nitrogen Mapping Based on Hybrid Machine Learning Algorithms

2021

Upcoming satellite imaging spectroscopy missions will deliver spatiotemporally explicit data streams to be exploited for mapping vegetation properties, such as nitrogen (N) content. Within retrieval workflows for real-time mapping over agricultural regions, such crop-specific information products need to be derived precisely and rapidly. To allow fast processing, intelligent sampling schemes for training databases should be incorporated to establish efficient machine learning (ML) models. In this study, we implemented active learning (AL) heuristics using kernel ridge regression (KRR) to minimize and optimize a training database for variational heteroscedastic Gaussian process regression (V…

Training set; Mean squared error; Active learning (machine learning); Data stream mining; Computer science; Frame (networking); 0211 other engineering and technologies; Sampling (statistics); 02 engineering and technology; Vegetation; 15. Life on land; Geotechnical Engineering and Engineering Geology; Article; Euclidean distance; Data mining; Electrical and Electronic Engineering; Gaussian process; 021101 geological & geomatics engineering
researchProduct

Distance functions in dynamic integration of data mining techniques

2000

One of the most important directions in the improvement of data mining and knowledge discovery is the integration of multiple data mining techniques. An integration method needs to be able either to evaluate and select the most appropriate data mining technique or to combine two or more techniques efficiently. A recent integration method for the dynamic integration of multiple data mining techniques is based on the assumption that each of the data mining techniques is the best one inside a certain subarea of the whole domain area. This method uses an instance-based learning approach to collect information about the competence areas of the mining techniques and applies a distance function to…

Data stream mining; Computer science; Feature selection; Machine learning; Data modeling; Information extraction; Knowledge extraction; Metric (mathematics); Artificial intelligence; Data mining; Information integration; Data integration
SPIE Proceedings
researchProduct

Improving big-data automotive applications performance through adaptive resource allocation

2019

In automotive applications, connected vehicles (CVs) can collect various information (external temperature, speed, location, etc.) and send it to a central infrastructure for exploitation in a wide range of applications: Eco-Driving, fleet management, environmental monitoring, etc. Such applications are known to generate a massive volume of data that is processed in real or near real time (i.e., data streams) depending on the target application requirements. To handle this data volume, big data architectures based on the stream computing paradigm are usually adopted. Within this paradigm, data are continuously processed by a set of operator (elementary operation) instances. Further, a str…

Data stream mining; Data parallelism; Computer science; Distributed computing; Stream; Big data; Automotive industry; 02 engineering and technology; Directed graph; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Resource allocation; Tuple
2019 IEEE Symposium on Computers and Communications (ISCC)
researchProduct

Modelling Recurrent Events for Improving Online Change Detection

2016

The task of online change point detection in sensor data streams is often complicated by the presence of noise, which can be mistaken for real changes and therefore degrades the performance of change detectors. Most existing change detection methods assume that changes are independent of each other and occur at random in time. In this paper we study how the performance of detectors can be improved in the case of recurrent changes. We analytically demonstrate under which conditions and for how long recurrence information is useful for improving detection accuracy. We propose a simple, computationally efficient message passing procedure for calculating a predictive probability distribution of …

ta113; noise; Computer science; Data stream mining; Message passing; Detector; data streams; online change detection; 02 engineering and technology; Task (computing); recurrent events; change points; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Probability distribution; 020201 artificial intelligence & image processing; Noise (video); Data mining; Baseline (configuration management); Change detection
Proceedings of the 2016 SIAM International Conference on Data Mining
researchProduct