Search results for " mining"
Showing 10 of 1548 documents
Mislabel Detection of Finnish Publication Ranks
2019
The paper analyzes a data set of Finnish ranks of academic publication channels with an Extreme Learning Machine (ELM). The purpose is to introduce and test a recently proposed ELM-based mislabel detection approach with a rich set of features characterizing a publication channel. We compare the architecture, the accuracy and, especially, the set of detected mislabels of the ELM-based approach with the corresponding results in the reference paper.
Integrating Domain Knowledge in Data-Driven Earth Observation With Process Convolutions
2022
The modelling of Earth observation data is a challenging problem, typically approached by either purely mechanistic or purely data-driven methods. Mechanistic models encode the domain knowledge and physical rules governing the system. Such models, however, require the correct specification of all interactions between variables in the problem, and the appropriate parameterization is a challenge in itself. On the other hand, machine learning approaches are flexible data-driven tools, able to approximate arbitrarily complex functions, but they lack interpretability and struggle when data is scarce or in extrapolation regimes. In this paper, we argue that hybrid learning schemes that combine both approa…
A perspective on Gaussian processes for Earth observation
2019
Earth observation (EO) through airborne and satellite remote sensing and in-situ observations plays a fundamental role in monitoring our planet. In the last decade, machine learning, and Gaussian processes (GPs) in particular, have attained outstanding results in the estimation of bio-geo-physical variables from the acquired images at local and global scales in a time-resolved manner. GPs provide not only accurate estimates but also principled uncertainty estimates for the predictions, can easily accommodate multimodal data coming from different sensors and from multitemporal acquisitions, allow the introduction of physical knowledge, and a formal treatment of uncertainty quantification and error pr…
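The "principled uncertainty estimates" mentioned above come from the GP posterior, which gives a predictive variance alongside each mean prediction. The following is a minimal, generic GP regression sketch, not the method of the paper: it assumes a zero-mean GP, a squared-exponential kernel with fixed (not learned) hyperparameters, 1-D inputs, and synthetic data. The function names `rbf_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP (hyperparameters fixed by assumption)."""
    k = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_test, length_scale, variance)
    k_ss = rbf_kernel(x_test, x_test, length_scale, variance)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))  # k^{-1} y
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)  # variance of the latent function
    return mean, var

# Synthetic toy data: samples of sin(x) on [0, 5] (illustration only).
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([2.5, 10.0])  # one point inside the data range, one far outside
mean, var = gp_predict(x_train, y_train, x_test)
print(mean, np.sqrt(np.maximum(var, 0.0)))
```

The predictive variance stays small where training data exist and grows back toward the prior variance far from them, which is the behaviour that makes GP uncertainty useful for flagging extrapolation.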
Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization With Medical Applications
2019
Medical applications challenge today's text categorization techniques by demanding both high accuracy and ease of interpretation. Although deep learning has provided a leap ahead in accuracy, this leap comes at the sacrifice of interpretability. To address this accuracy-interpretability challenge, we here introduce, for the first time, a text categorization approach that leverages the recently introduced Tsetlin Machine. In brief, we represent the terms of a text as propositional variables. From these, we capture categories using simple propositional formulae, such as: if "rash" and "reaction" and "penicillin" then Allergy. The Tsetlin Machine learns these formulae from a labelled tex…
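The propositional representation described above can be sketched as follows. This only illustrates how a text becomes a set of Boolean variables and how one conjunctive clause such as the Allergy rule is evaluated; it does not implement the Tsetlin Machine's learning, which decides which literals each clause includes. The vocabulary, the naive whitespace tokenization, and the helper names `to_propositions` and `clause_is_true` are hypothetical.

```python
from typing import Dict, List, Sequence

def to_propositions(text: str, vocabulary: List[str]) -> Dict[str, bool]:
    """Bag-of-words: each vocabulary term becomes a propositional variable."""
    tokens = set(text.lower().split())  # naive whitespace tokenization (assumption)
    return {term: term in tokens for term in vocabulary}

def clause_is_true(props: Dict[str, bool], include: Sequence[str],
                   exclude: Sequence[str] = ()) -> bool:
    """Conjunctive clause: all included terms present and all excluded terms absent."""
    return all(props[t] for t in include) and not any(props[t] for t in exclude)

vocabulary = ["rash", "reaction", "penicillin", "fever"]  # hypothetical vocabulary
allergy = ["rash", "reaction", "penicillin"]  # the example rule from the abstract

props = to_propositions("Rash and reaction after penicillin", vocabulary)
print(clause_is_true(props, allergy))  # True
```

Because each clause is just a conjunction over named terms, the learned rules can be read back directly, which is the interpretability argument the abstract makes.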
Multi-scale analysis of the European airspace using network community detection
2014
We show that the European airspace can be represented as a multi-scale traffic network whose nodes are airports, sectors, or navigation points, and whose links are defined and weighted according to the traffic of flights between the nodes. By using a unique database of the air traffic in the European airspace, we investigate the architecture of these networks with a special emphasis on their community structure. We propose that unsupervised network community detection algorithms can be used to monitor the current use of the airspaces and improve it by guiding the design of new ones. Specifically, we compare the performance of three community detection algorithms, also by using a null model which t…
Learning Structures in Earth Observation Data with Gaussian Processes
2020
Gaussian Processes (GPs) have experienced tremendous success in geoscience in general, and in bio-geophysical parameter retrieval in particular, in recent years. GPs constitute a solid Bayesian framework to formulate many function approximation problems consistently. This paper reviews the main theoretical GP developments in the field. We review new algorithms that respect the signal and noise characteristics, that provide feature rankings automatically, and that allow applicability of associated uncertainty intervals to transport GP models in space and time. All these developments are illustrated in the field of geoscience and remote sensing at local and global scales through a set of illustrative exa…
Randomized kernels for large scale Earth observation applications
2020
Current remote sensing applications of bio-geophysical parameter estimation and image classification have to deal with an unprecedented amount of heterogeneous and complex data sources. New satellite sensors with improved temporal, spatial and spectral resolutions give rise to challenging computational problems. Standard physical inversion techniques cannot cope efficiently with this new scenario. Land cover classification of the new image sources has also turned out to be a complex problem, requiring large amounts of memory and processing time. In order to cope with these problems, statistical learning has greatly helped in recent years to develop st…
Machine learning information fusion in Earth observation: A comprehensive review of methods, applications and data sources
2020
This paper reviews the most important information fusion data-driven algorithms based on Machine Learning (ML) techniques for problems in Earth observation. Nowadays we observe and model the Earth with a wealth of observations, from a plethora of different sensors, measuring states, fluxes, processes and variables, at unprecedented spatial and temporal resolutions. Earth observation is well equipped with remote sensing systems, mounted on satellites and airborne platforms, but it also involves in-situ observations, numerical models and social media data streams, among other data sources. Data-driven approaches, and ML techniques in particular, are the natural choice to extract significant i…
Gaussianizing the Earth: Multidimensional Information Measures for Earth Data Analysis
2021
Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data is high-dimensional, heterogeneous and has non-linear characteristics. In this paper, we apply multivariate Gaussianization for probability density estimation, which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual in…
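Once data have been mapped to a Gaussian, measures such as total correlation have closed forms; for a Gaussian with correlation matrix R, total correlation is -0.5 log det R (in nats). The sketch below uses a simpler stand-in than the paper's multivariate Gaussianization: it only Gaussianizes each marginal through its empirical ranks (a Gaussian-copula approximation) and then assumes the joint is Gaussian. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def marginal_gaussianize(x):
    """Replace each column by standard-normal scores of its empirical ranks."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # 0 .. n-1 within each column
    u = (ranks + 1) / (n + 1)  # strictly inside (0, 1), so inv_cdf stays finite
    return np.vectorize(NormalDist().inv_cdf)(u)

def gaussian_total_correlation(z):
    """Total correlation (nats), assuming z is jointly Gaussian: -0.5 * log det R."""
    corr = np.corrcoef(z, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    return -0.5 * logdet

rng = np.random.default_rng(0)
independent = rng.normal(size=(2000, 3))  # synthetic, mutually independent columns
dependent = independent.copy()
dependent[:, 1] = np.exp(independent[:, 0] + 0.1 * independent[:, 1])  # monotone non-linear link
print(gaussian_total_correlation(marginal_gaussianize(independent)))
print(gaussian_total_correlation(marginal_gaussianize(dependent)))
```

The rank step makes the estimate insensitive to monotone non-linear distortions such as the `exp` above; capturing non-Gaussian joint structure would need the full multivariate procedure the paper describes.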
Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data
2014
Big data open up unprecedented opportunities to investigate complex systems, including society. In particular, communication data serve as major sources for computational social sciences, but they have to be cleaned and filtered, as they may contain spurious information due to recording errors as well as interactions, like commercial and marketing activities, not directly related to the social network. The network constructed from communication data can only be considered as a proxy for the network of social relationships. Here we apply a systematic method, based on multiple hypothesis testing, to statistically validate the links and then construct the corresponding Bonferroni network, gen…
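The link validation step described above (one hypothesis test per candidate link, then a Bonferroni-corrected threshold) can be sketched as follows. The null model is an assumption, since the excerpt does not state it: here each link is tested with a hypergeometric co-occurrence test, where `total` is a hypothetical number of observation slots, `n_u` and `n_v` are the two users' activity counts, and `k` is their observed co-activity. The names `hypergeom_sf` and `bonferroni_validate` are hypothetical.

```python
from math import comb
from typing import Dict, Tuple

Link = Tuple[str, str]

def hypergeom_sf(k: int, total: int, n_u: int, n_v: int) -> float:
    """P(X >= k): overlap of random subsets of sizes n_u and n_v drawn from `total` slots."""
    upper = min(n_u, n_v)
    hits = sum(comb(n_u, x) * comb(total - n_u, n_v - x) for x in range(k, upper + 1))
    return hits / comb(total, n_v)

def bonferroni_validate(counts: Dict[Link, Tuple[int, int, int]], total: int,
                        alpha: float = 0.01) -> Dict[Link, float]:
    """Keep links whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(counts)  # simplification: one test per candidate link
    validated = {}
    for link, (k, n_u, n_v) in counts.items():
        p = hypergeom_sf(k, total, n_u, n_v)
        if p < threshold:
            validated[link] = p
    return validated

# Hypothetical counts (k, n_u, n_v) for two candidate links out of 100 slots.
counts = {("a", "b"): (8, 10, 10), ("a", "c"): (1, 10, 10)}
result = bonferroni_validate(counts, total=100)
print(result)
```

Only the link whose co-activity is far above what random placement would produce survives; the Bonferroni threshold shrinks as more links are tested, which controls the family-wise false-positive rate across the whole network.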