Search results for "Data mining"

Showing 10 of 907 documents

The Analysis of Auxological Data by Means of Nonlinear Multivariate Growth Curves

1999

In this paper we address the problem of analysing a data set made up of multivariate growth curves for different subjects; in this context we therefore deal with three-way data tables. However, the factorial techniques proposed for three-way data matrices cannot be applied, because the observations are generally not equally spaced; moreover, a multilevel approach based on polynomial models is not suitable for intrinsically nonlinear models. We propose a non-factorial technique for analysing auxological data sets using an intrinsically nonlinear multivariate growth model with autocorrelated errors. Applied to a real data set of growing children, the technique gave easily interpretable results.

Data set; Nonlinear system; Factorial; Multivariate statistics; Polynomial; Autocorrelation; Context (language use); Data mining; Nonlinear regression; Algorithm; Mathematics
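
The abstract does not state the model's functional form. As a hedged illustration only, the sketch below assumes a three-parameter logistic mean curve and continuous-time AR(1) errors, whose correlation decays with the actual time gap between visits; this is one way to accommodate observations that are not equally spaced. The function names (`logistic_growth`, `car1_covariance`, `neg_log_likelihood`) and all parameter values are hypothetical, and only one variable for one subject is shown.

```python
import numpy as np


def logistic_growth(t, asymptote, rate, midpoint):
    """Assumed 3-parameter logistic mean curve (the paper's model may differ)."""
    return asymptote / (1.0 + np.exp(-rate * (np.asarray(t, float) - midpoint)))


def car1_covariance(times, sigma, phi):
    """Continuous-time AR(1) covariance: Cov(e_i, e_j) = sigma^2 * phi^|t_i - t_j|.

    The exponent is the real time gap, so unequally spaced visits need no
    interpolation onto a regular grid.
    """
    t = np.asarray(times, float)
    lags = np.abs(t[:, None] - t[None, :])
    return sigma**2 * phi**lags


def neg_log_likelihood(y, times, curve_params, sigma, phi):
    """Gaussian negative log-likelihood of one subject's series under the model."""
    resid = np.asarray(y, float) - logistic_growth(times, *curve_params)
    cov = car1_covariance(times, sigma, phi)
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return 0.5 * (len(resid) * np.log(2 * np.pi) + logdet + quad)


# Illustrative usage with made-up, unequally spaced ages and parameters.
ages = np.array([2.0, 3.5, 4.0, 6.25, 9.0])
params = (170.0, 0.4, 5.0)
heights = logistic_growth(ages, *params)
print(neg_log_likelihood(heights, ages, params, sigma=1.5, phi=0.7))
```

Minimizing this quantity over the curve parameters, `sigma` and `phi` would give a maximum-likelihood fit; the multivariate case described in the abstract would additionally need a cross-covariance between growth variables, which this sketch leaves out.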

Analysis of multi-source metabolomic data using joint and individual variation explained (JIVE).

2015

Metabolic profiling is increasingly being used to understand biological processes, but no single analytical technique provides a complete quantitative or qualitative profile of the metabolome. Data fusion (i.e. joint analysis of data from multiple sources) has the potential to circumvent this issue, facilitating knowledge discovery and reliable biomarker identification. Another application of data fusion is the simultaneous analysis of metabolomic changes across several biofluids or tissues. However, metabolomics typically deals with large datasets, with hundreds to thousands of variables and the identification of shared and individual factors or structures acros…

Data source; Computer science; Analytical technique; Statistics as Topic; Analytical chemistry; Urinalysis; Sensor fusion; Biochemistry; Analytical Chemistry; Multiple data; Metabolomics; Knowledge extraction; Electrochemistry; Environmental Chemistry; Profiling (information science); Humans; Data mining; Spectroscopy; Multi-source; Blood Chemical Analysis; Software; The Analyst
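
The abstract names JIVE without showing its mechanics. Below is a minimal numpy sketch of the usual alternating JIVE scheme, assuming the joint and individual ranks are already known (choosing them is a separate problem) and that each block is arranged as variables × samples with the samples shared across blocks. The helper names `low_rank` and `jive` and the fixed iteration count are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np


def low_rank(m, rank):
    """Truncated-SVD rank-`rank` approximation of m and its top right singular vectors."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank], vt[:rank]


def jive(blocks, joint_rank, indiv_ranks, n_iter=50):
    """Alternating JIVE-style decomposition X_i ~ J_i + A_i with fixed ranks.

    blocks: list of (n_variables_i x n_samples) arrays sharing the same samples.
    Each individual part A_i is kept orthogonal to the joint row space.
    """
    n_samples = blocks[0].shape[1]
    split_at = np.cumsum([x.shape[0] for x in blocks])[:-1]
    indiv = [np.zeros_like(x, dtype=float) for x in blocks]
    for _ in range(n_iter):
        # Joint step: low-rank fit to all blocks stacked, individual parts removed.
        stacked = np.vstack([x - a for x, a in zip(blocks, indiv)])
        joint_stacked, v_joint = low_rank(stacked, joint_rank)
        joint = np.split(joint_stacked, split_at, axis=0)
        # Projector onto the orthogonal complement of the joint row space.
        proj = np.eye(n_samples) - v_joint.T @ v_joint
        # Individual step: per-block low-rank fit restricted to that complement.
        indiv = [low_rank((x - j) @ proj, r)[0]
                 for x, j, r in zip(blocks, joint, indiv_ranks)]
    return joint, indiv
```

Restricting each individual fit to the complement of the joint row space is what separates structure shared by all blocks from block-specific variation.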

Integrating LSTMs with Online Density Estimation for the Probabilistic Forecast of Energy Consumption

2019

In machine learning applications in the energy sector, it is often necessary to have both highly accurate predictions and information about the probability of certain scenarios occurring. We address this challenge by integrating long short-term memory networks (LSTMs) and online density estimation into the real-time data streaming architecture of an energy trader. The online density estimation is done in the MiDEO framework, which estimates joint densities of data streams based on ensembles of chains of Hoeffding trees. One attractive feature of the solution is that queries can be sent to what we call forecast-based point density estimators (FPDE) to derive information from …

Data stream; Computer science; Data stream mining; 020209 energy; Probabilistic logic; Estimator; 02 engineering and technology; Energy consumption; Density estimation; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; Data mining; Representation (mathematics)

Prototype-based learning on concept-drifting data streams

2014

Data stream mining has attracted growing attention due to its wide range of emerging applications, such as target marketing, email filtering and network intrusion detection. In this paper, we propose a prototype-based classification model for evolving data streams, called SyncStream, which dynamically models time-changing concepts and makes predictions locally. Instead of learning a single model on a sliding window or using ensemble learning, SyncStream captures evolving concepts by dynamically maintaining a set of prototypes in a new data structure called the P-tree. The prototypes are obtained by error-driven representativeness learning and synchronization-inspired constrained clustering. To ide…

Data stream; Concept drift; Computer science; Data stream mining; Constrained clustering; Data structure; Machine learning; Ensemble learning; Synchronization (computer science); Data mining; Artificial intelligence; Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
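
SyncStream's P-tree and synchronization-inspired clustering are only named in the abstract, so they are not reproduced here. The sketch below is a deliberately simplified stand-in that illustrates the general idea of local, error-driven prototype maintenance on a drifting stream: a correct nearest-prototype prediction reinforces that prototype, while an error weakens it and stores the misclassified example as a new prototype. The class name, the weight cap and the update rule are assumptions for illustration, not SyncStream's algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Prototype:
    x: tuple
    label: int
    weight: float = 1.0


class ErrorDrivenPrototypeClassifier:
    """Simplified error-driven nearest-prototype learner (not SyncStream).

    Prototypes are scanned linearly (no P-tree) and no clustering is performed.
    """

    def __init__(self, max_prototypes=100, max_weight=3.0):
        self.max_prototypes = max_prototypes
        self.max_weight = max_weight  # bounds how long an outdated prototype survives
        self.prototypes = []

    def _nearest(self, x):
        return min(self.prototypes, key=lambda p: math.dist(p.x, x))

    def predict(self, x):
        return self._nearest(x).label if self.prototypes else None

    def learn_one(self, x, y):
        x = tuple(x)
        if not self.prototypes:
            self.prototypes.append(Prototype(x, y))
            return
        nearest = self._nearest(x)
        if nearest.label == y:
            # Correct local prediction: reinforce the prototype, up to the cap.
            nearest.weight = min(nearest.weight + 1.0, self.max_weight)
        else:
            # Error: weaken the responsible prototype, drop it once its weight is
            # used up, and keep the misclassified example as a new prototype.
            nearest.weight -= 1.0
            if nearest.weight <= 0:
                self.prototypes.remove(nearest)
            self.prototypes.append(Prototype(x, y))
        if len(self.prototypes) > self.max_prototypes:
            self.prototypes.remove(min(self.prototypes, key=lambda p: p.weight))
```

Capping the weight is a simple way to keep adaptation after concept drift fast: a prototype from an old concept can absorb at most `max_weight` errors before it is discarded.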

Quantifying Vegetation Biophysical Variables from Imaging Spectroscopy Data: A Review on Retrieval Methods

2019

An unprecedented spectroscopic data stream will soon become available with forthcoming Earth-observing satellite missions equipped with imaging spectroradiometers. This data stream will open up a vast array of opportunities to quantify a diversity of biochemical and structural vegetation properties. Processing such large data streams requires reliable retrieval techniques that enable the spatiotemporally explicit quantification of biophysical variables. To prepare for this new era of Earth observation, this review summarizes the state-of-the-art retrieval methods that have been applied in experimental imaging spectroscopy studies inferring all kinds of vegeta…

Data stream; Earth observation; 010504 meteorology & atmospheric sciences; Computer science; UT-Hybrid-D; 010502 geochemistry & geophysics; Quantitative Biology - Quantitative Methods; 01 natural sciences; Article; Geochemistry and Petrology; FOS: Electrical engineering, electronic engineering, information engineering; Quantitative Methods (q-bio.QM); 0105 earth and related environmental sciences; Parametric statistics; Data stream mining; Image and Video Processing (eess.IV); Electrical Engineering and Systems Science - Image and Video Processing; 15. Life on land; 22/4 OA procedure; Regression; Imaging spectroscopy; Geophysics; Spectroradiometer; 13. Climate action; Multicollinearity; FOS: Biological sciences; ITC-ISI-JOURNAL-ARTICLE; Data mining; Surveys in Geophysics

Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties – A review

2015

Forthcoming superspectral satellite missions dedicated to land monitoring, as well as planned imaging spectrometers, will unleash an unprecedented data stream. Processing such large data streams calls for techniques enabling the spatio-temporally explicit quantification of vegetation properties. Typically, retrieval must be accurate, robust and fast. Hence, there is a strict requirement to identify next-generation bio-geophysical variable retrieval algorithms that can be molded into an operational processing chain. This paper offers a review of state-of-the-art retrieval methods for quantitative terrestrial bio-geophysical variable extraction using op…

Data stream; Economics; Computer science; Operational variable retrieval; Laboratory of Geo-information Science and Remote Sensing; Machine learning; Physical; Laboratorium voor Geo-informatiekunde en Remote Sensing; Bio-geophysical variables; Computers in Earth Sciences; Parametric; Engineering (miscellaneous); Parametric statistics; Remote sensing; Data stream mining; Physics; Transparency (human–computer interaction); Vegetation; PE&RC; Non-parametric; Hybrid; Atomic and Molecular Physics and Optics; Computer Science Applications; Variable (computer science); Satellite; Data mining; Engineering sciences. Technology; Retrievability; ISPRS Journal of Photogrammetry and Remote Sensing

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

2014

The big data trend has forced data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has emerged as a new research field, which aims to answer queries about "what is happening now" with negligible delay. The real challenge with real-time stream data processing is that it is impossible to store all data instances, so online analytical algorithms are used. To perform real-time analytics, pre-processing must be done in such a way that only a short summary of the stream is stored in main memory. In addition, due to the high arrival rate, the average processing time for each data instance should be such that…

Data stream; FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Data stream mining; Sentiment analysis; Big data; Machine Learning (stat.ML); Databases (cs.DB); Data structure; Field (computer science); Computer Science - Information Retrieval; Tree (data structure); Computer Science - Databases; Computer Science - Distributed, Parallel, and Cluster Computing; Analytics; Statistics - Machine Learning; Data mining; Distributed, Parallel, and Cluster Computing (cs.DC); Computation and Language (cs.CL); Information Retrieval (cs.IR)
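
The requirement that only a short summary of the stream be kept in main memory, with a bounded cost per arriving item, can be illustrated with a sliding-window counter over sentiment labels. This is a hypothetical sketch: the labels are assumed to come from some upstream sentiment classifier, and `SlidingSentimentSummary` is not taken from the paper.

```python
from collections import Counter, deque


class SlidingSentimentSummary:
    """Bounded-memory summary of the most recent `window_size` sentiment labels.

    Each update costs O(1): the new label is counted and, once the window is
    full, the oldest label is evicted and uncounted. Only labels are kept.
    """

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, label):
        self.window.append(label)
        self.counts[label] += 1
        if len(self.window) > self.window_size:
            self.counts[self.window.popleft()] -= 1

    def share(self, label):
        """Fraction of labels in the current window equal to `label`."""
        return self.counts[label] / len(self.window) if self.window else 0.0


summary = SlidingSentimentSummary(window_size=3)
for label in ["pos", "pos", "neg", "neg"]:
    summary.add(label)
print(summary.share("neg"))  # the window now holds ["pos", "neg", "neg"]
```

Keeping running counts alongside the window means answering a "what is happening now" query never rescans the window.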

Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions

2016

The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the est…

Data stream; Mahalanobis distance; Computer science; Data stream mining; 02 engineering and technology; Density estimation; Set (abstract data type); Software; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; Curse of dimensionality; Vector space
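
The abstract outlines projecting the stream into a vector space and estimating the density from a set of representatives. The sketch below is one simple reading of that idea under explicit assumptions: a Gaussian random projection, representatives maintained by an online k-means update, and a Gaussian-kernel mixture as the estimate, evaluated in the projected space. The keywords mention the Mahalanobis distance, which this sketch does not use; the class name and all parameters are hypothetical.

```python
import numpy as np


class ProjectedRepresentativeDensity:
    """Streaming density sketch: random projection + weighted representatives.

    Simplified illustration only; the density is evaluated in the projected
    space, not in the original space.
    """

    def __init__(self, dim_in, dim_out, n_reps, bandwidth, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed Gaussian random projection; the paper's projection may differ.
        self.proj = rng.normal(size=(dim_in, dim_out)) / np.sqrt(dim_out)
        self.n_reps = n_reps
        self.bandwidth = bandwidth
        self.centers = []
        self.counts = []

    def _project(self, x):
        return np.asarray(x, dtype=float) @ self.proj

    def update(self, x):
        z = self._project(x)
        if len(self.centers) < self.n_reps:
            # Seed the representatives with the first points of the stream.
            self.centers.append(z.copy())
            self.counts.append(1)
            return
        i = int(np.argmin([np.sum((c - z) ** 2) for c in self.centers]))
        self.counts[i] += 1
        # Move the winning representative toward z (running mean of its points).
        self.centers[i] += (z - self.centers[i]) / self.counts[i]

    def density(self, x):
        if not self.centers:
            return 0.0
        z = self._project(x)
        centers = np.array(self.centers)
        weights = np.array(self.counts, dtype=float)
        weights /= weights.sum()
        h, k = self.bandwidth, z.shape[0]
        sq_dist = np.sum((centers - z) ** 2, axis=1)
        kernel = np.exp(-sq_dist / (2 * h * h)) / (2 * np.pi * h * h) ** (k / 2)
        return float(np.sum(weights * kernel))
```

Memory stays fixed at `n_reps` representatives regardless of stream length, which is the property that matters for high-dimensional streams.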

Analysis of Lipid Experiments (ALEX): A Software Framework for Analysis of High-Resolution Shotgun Lipidomics Data

2013

Global lipidomics analysis across large sample sizes produces high-content datasets that require dedicated software tools supporting lipid identification and quantification, efficient data management and lipidome visualization. Here we present a novel software-based platform for streamlined data processing, management and visualization of shotgun lipidomics data acquired using high-resolution Orbitrap mass spectrometry. The platform features the ALEX framework designed for automated identification and export of lipid species intensity directly from proprietary mass spectral data files, and an auxiliary workflow using database exploration tools for integration of sample information, computat…

Databases, Factual; Computer science; Data management; lcsh:Medicine; Bioinformatics; Mass spectrometry; Mice; User-Computer Interface; Data visualization; Lipidomics; Animals; lcsh:Science; Internet; Multidisciplinary; lcsh:R; Brain; Lipid-phosphate phosphatase; Shotgun lipidomics; Lipidome; Lipids; Visualization; Software framework; Knockout mouse; lcsh:Q; Data mining; Software; Research Article; PLoS ONE

Local dimensionality reduction and supervised learning within natural clusters for biomedical data analysis

2006

Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems often requires data preprocessing before a learning algorithm is applied. This is especially important for multidimensional heterogeneous data represented by a large number of features of different types. Dimensionality reduction (DR) is one commonly applied approach. The goal of this paper is to study the impact of natural clustering (clustering according to expert domain knowledge) on DR for supervised learning (SL) in the area of antibiotic resistance. We compare several data-mining strategies that apply DR by means of feature extraction or feature selection w…

Databases, Factual; Computer science; Feature extraction; Information Storage and Retrieval; Feature selection; Machine learning; Models, Biological; Pattern Recognition, Automated; Immune system; Artificial Intelligence; Drug Resistance, Bacterial; Cluster Analysis; Humans; Computer Simulation; Electrical and Electronic Engineering; Representation (mathematics); Cluster analysis; Cross Infection; Dimensionality reduction; Supervised learning; General Medicine; Anti-Bacterial Agents; Computer Science Applications; Data pre-processing; Data mining; Artificial intelligence; Multidimensional systems; Algorithms; Biotechnology
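
To make "DR within natural clusters" concrete, here is a minimal sketch under stated assumptions: the natural clusters are given as a `groups` array supplied by a domain expert, PCA serves as the feature-extraction step, and a nearest-centroid rule stands in for the supervised learner. These choices and all names are illustrative; the paper compares several feature-extraction and feature-selection strategies that are not reproduced here.

```python
import numpy as np


def fit_pca(x, n_components):
    """Return the mean and the top principal axes (as rows) of x via SVD."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


class LocalPCANearestCentroid:
    """PCA plus a nearest-centroid classifier, fitted separately per natural cluster.

    `groups` stands in for expert-defined clusters; nearest centroid is only a
    placeholder supervised learner.
    """

    def __init__(self, n_components):
        self.n_components = n_components
        self.models = {}

    def fit(self, x, y, groups):
        for g in np.unique(groups):
            mask = groups == g
            xg, yg = x[mask], y[mask]
            mean, axes = fit_pca(xg, self.n_components)
            z = (xg - mean) @ axes.T  # local low-dimensional representation
            centroids = {c: z[yg == c].mean(axis=0) for c in np.unique(yg)}
            self.models[g] = (mean, axes, centroids)
        return self

    def predict_one(self, x, group):
        mean, axes, centroids = self.models[group]
        z = (np.asarray(x, dtype=float) - mean) @ axes.T
        return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))
```

Fitting the projection per cluster lets each group keep the directions that matter locally, which a single global PCA could average away when clusters vary along different features.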