Search results for "Machine learning"
showing 10 items of 1464 documents
A probabilistic estimation and prediction technique for dynamic continuous social science models: The evolution of the attitude of the Basque Country…
2015
In this paper, a computational technique to deal with uncertainty in dynamic continuous models in Social Sciences is presented. Considering data from surveys, the method consists of determining the probability distribution of the survey output, which makes it possible to sample data and fit the model to the sampled data using a goodness-of-fit criterion based on the χ2-test. Taking the fitted parameters that were not rejected by the χ2-test, substituting them into the model and computing their outputs, 95% confidence intervals at each time instant, capturing the uncertainty of the survey data (probabilistic estimation), are built. Using the same set of obtained model parameters, a prediction over …
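The accept-then-summarize step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nearest-rank percentile rule, and the idea of passing the critical value in as a parameter are all assumptions made here for illustration. The sampling of survey data and the model fitting themselves are not shown.

```python
import math
from typing import List, Sequence, Tuple


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson chi-square statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_not_rejected(observed_counts: Sequence[float],
                    model_proportions: Sequence[float],
                    critical_value: float) -> bool:
    """Keep a fitted parameter set if its chi-square statistic does not
    exceed the critical value (supplied by the caller; hypothetical API)."""
    total = sum(observed_counts)
    expected = [total * p for p in model_proportions]
    return chi_square_statistic(observed_counts, expected) <= critical_value


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of an already sorted sequence (q in 0..100)."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def band_per_time(trajectories: Sequence[Sequence[float]],
                  lower: float = 2.5,
                  upper: float = 97.5) -> List[Tuple[float, float]]:
    """For each time instant, take the lower and upper percentiles across
    the model outputs of all accepted parameter sets."""
    bands = []
    for values in zip(*trajectories):
        ordered = sorted(values)
        bands.append((percentile(ordered, lower), percentile(ordered, upper)))
    return bands
```

In this sketch the caller would pass, for example, 5.991 as `critical_value`, which is the 0.95 quantile of the χ2 distribution with 2 degrees of freedom; the appropriate degrees of freedom depend on the model and are not stated in the excerpt.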
Extending the Tsetlin Machine With Integer-Weighted Clauses for Increased Interpretability
2020
Despite significant effort, building models that are both interpretable and accurate is an unresolved challenge for many pattern recognition problems. In general, rule-based and linear models lack accuracy, while deep learning interpretability is based on rough approximations of the underlying inference. Using a linear combination of conjunctive clauses in propositional logic, Tsetlin Machines (TMs) have shown competitive performance on diverse benchmarks. However, to do so, many clauses are needed, which impacts interpretability. Here, we address the accuracy-interpretability challenge in machine learning by equipping the TM clauses with integer weights. The resulting Integer Weighted TM (…
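As a rough illustration of what "a linear combination of conjunctive clauses" with integer weights can look like at inference time, here is a small sketch. It covers only the voting step, not the Tsetlin Machine learning procedure. The clause encoding, the signed weights, the `vote > 0` decision rule, and the XOR example are assumptions chosen for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A literal is (feature index, negated?). A clause is a conjunction of literals.
Literal = Tuple[int, bool]
Clause = List[Literal]


def clause_output(clause: Clause, x: Sequence[int]) -> int:
    """Return 1 if every literal in the clause holds for input x, else 0."""
    return int(all((x[i] == 0) if negated else (x[i] == 1)
                   for i, negated in clause))


def weighted_vote(clauses: Sequence[Clause],
                  weights: Sequence[int],
                  x: Sequence[int]) -> int:
    """Integer-weighted sum of clause outputs (signed weights are an
    assumption here: positive votes for the class, negative against)."""
    return sum(w * clause_output(c, x) for c, w in zip(clauses, weights))


def classify(clauses: Sequence[Clause],
             weights: Sequence[int],
             x: Sequence[int]) -> int:
    """Predict class 1 when the weighted vote is positive, else class 0."""
    return 1 if weighted_vote(clauses, weights, x) > 0 else 0


# Hypothetical hand-written clauses for XOR of two binary features.
XOR_CLAUSES: List[Clause] = [
    [(0, False), (1, True)],   # x0 AND NOT x1
    [(0, True), (1, False)],   # NOT x0 AND x1
    [(0, False), (1, False)],  # x0 AND x1
    [(0, True), (1, True)],    # NOT x0 AND NOT x1
]
XOR_WEIGHTS = [2, 2, -1, -1]
```

Calling `classify(XOR_CLAUSES, XOR_WEIGHTS, [1, 0])` returns 1, and `classify(XOR_CLAUSES, XOR_WEIGHTS, [1, 1])` returns 0. One way to read an integer weight of 2 is as standing in for two identical unweighted clauses, which keeps the clause list shorter.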
Adaptive Task Assignment in Online Learning Environments
2016
With the increasing popularity of online learning, intelligent tutoring systems are regaining increased attention. In this paper, we introduce adaptive algorithms for personalized assignment of learning tasks to students in order to improve their performance in online learning environments. As the main contribution of this paper, we propose a novel Skill-Based Task Selector (SBTS) algorithm which is able to approximate a student's skill level based on their performance and consequently suggest adequate assignments. The SBTS is inspired by the class of multi-armed bandit algorithms. However, in contrast to standard multi-armed bandit approaches, the SBTS aims at acquiring two criteria related to stu…
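For readers unfamiliar with the multi-armed bandit family that the abstract refers to, the following is a minimal sketch of the standard UCB1 rule, with each arm read as a candidate task. It is explicitly not the SBTS algorithm, whose details are not given in the excerpt. The function name `ucb1`, the `reward_fn` callback, and the interpretation of reward as a student's success on a task are assumptions for illustration.

```python
import math
from typing import Callable, List


def ucb1(num_tasks: int, rounds: int,
         reward_fn: Callable[[int], float]) -> List[int]:
    """Run standard UCB1 and return how often each task (arm) was chosen.

    reward_fn(task) is a hypothetical callback returning a reward in [0, 1],
    for example 1.0 if the student solved the assigned task.
    """
    counts = [0] * num_tasks    # number of times each task was assigned
    totals = [0.0] * num_tasks  # summed reward per task

    for t in range(1, rounds + 1):
        if t <= num_tasks:
            # Assign every task once before using the confidence bounds.
            task = t - 1
        else:
            # Pick the task with the highest upper confidence bound:
            # empirical mean plus an exploration bonus.
            task = max(
                range(num_tasks),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = reward_fn(task)
        counts[task] += 1
        totals[task] += reward

    return counts
```

For example, `ucb1(3, 2000, lambda task: [0.2, 0.5, 0.9][task])` concentrates most assignments on the third task while still trying each task at least once.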
Nonlinearities and Adaptation of Color Vision from Sequential Principal Curves Analysis
2016
Mechanisms of human color vision are characterized by two phenomenological aspects: the system is nonlinear and adaptive to changing environments. Conventional attempts to derive these features from statistics use separate arguments for each aspect. The few statistical explanations that do consider both phenomena simultaneously follow parametric formulations based on empirical models. Therefore, it may be argued that the behavior does not come directly from the color statistics but from the convenient functional form adopted. In addition, many times the whole statistical analysis is based on simplified databases that disregard relevant physical effects in the input signal, as, for instance…
Optimized Kernel Entropy Components
2016
This work addresses two main issues of the standard Kernel Entropy Component Analysis (KECA) algorithm: the optimization of the kernel decomposition and the optimization of the Gaussian kernel parameter. KECA roughly reduces to a sorting of the importance of kernel eigenvectors by entropy instead of by variance as in Kernel Principal Components Analysis. In this work, we propose an extension of the KECA method, named Optimized KECA (OKECA), that directly extracts the optimal features retaining most of the data entropy by means of compacting the information in very few features (often in just one or two). The proposed method produces features which have higher expressive power. In particular…
Simplifying Probabilistic Expressions in Causal Inference
2018
Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the …
Anomaly Detection Framework Using Rule Extraction for Efficient Intrusion Detection
2014
Huge datasets in cyber security, such as network traffic logs, can be analyzed using machine learning and data mining methods. However, the amount of collected data is increasing, which makes analysis more difficult. Many machine learning methods have not been designed for big datasets, and consequently are slow and difficult to understand. We address the issue of efficient network traffic classification by creating an intrusion detection framework that applies dimensionality reduction and conjunctive rule extraction. The system can perform unsupervised anomaly detection and use this information to create conjunctive rules that classify huge amounts of traffic in real time. We test the impl…
Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs
2017
Shapelets are discriminative time series subsequences that allow generation of interpretable classification models, which provide faster and generally better classification than the nearest neighbor approach. However, the shapelet discovery process requires the evaluation of all possible subsequences of all time series in the training set, making it extremely computationally intensive. Consequently, shapelet discovery for large time series datasets quickly becomes intractable. A number of improvements have been proposed to reduce the training time. These techniques use approximation or discretization and often lead to reduced classification accuracy compared to the exact method. We are proposin…
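To make the cost of exhaustive shapelet discovery concrete, the sketch below shows the commonly used shapelet-to-series distance (the minimum Euclidean distance over all sliding windows), exhaustive candidate enumeration, and random candidate sampling as one generic way to avoid enumerating everything. It is not claimed to be the method of this paper, whose description is cut off in the excerpt. All function names are hypothetical, and every series is assumed to be at least as long as the candidate length.

```python
import math
import random
from typing import List, Sequence


def shapelet_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Minimum Euclidean distance between the shapelet and any window of
    the same length in the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + k] - shapelet[k]) ** 2 for k in range(m)))
        best = min(best, d)
    return best


def all_candidates(series: Sequence[float], length: int) -> List[List[float]]:
    """Exhaustive enumeration: every subsequence of the given length."""
    return [list(series[i:i + length]) for i in range(len(series) - length + 1)]


def random_candidates(dataset: Sequence[Sequence[float]], length: int,
                      n: int, rng: random.Random) -> List[List[float]]:
    """Draw n candidate subsequences at random instead of enumerating all.
    Assumes every series in the dataset has at least `length` points."""
    candidates = []
    for _ in range(n):
        series = rng.choice(dataset)
        start = rng.randrange(len(series) - length + 1)
        candidates.append(list(series[start:start + length]))
    return candidates
```

A series of length n has n - L + 1 candidates of length L, and each candidate must be compared against every window of every training series, which is where the cost of the exhaustive search comes from.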
Renewable Energy Prediction using Weather Forecasts for Optimal Scheduling in HPC Systems
2014
The objective of the GreenPAD project is to use green energy (wind, solar and biomass) for powering data-centers that are used to run HPC jobs. As part of this, it is important to predict the renewable (wind) energy for efficient scheduling (executing jobs that require higher energy when there is more green energy available and vice-versa). To predict the wind energy, we first analyze the historical data to find a statistical model that gives the relation between wind energy and weather attributes. Then we use this model based on the weather forecast data to predict the green energy availability in the future. Using the green energy prediction obtained from the statistical model we are able…
Retrieval of Case 2 Water Quality Parameters with Machine Learning
2018
Water quality parameters are derived by applying several machine learning regression methods on the Case2eXtreme dataset (C2X). The data used are based on Hydrolight in-water radiative transfer simulations at Sentinel-3 OLCI wavebands, and the application is done exclusively for absorbing waters with high concentrations of coloured dissolved organic matter (CDOM). The regression approaches are: regularized linear, random forest, kernel ridge, Gaussian process and support vector regressors. The validation is made with an independent simulation dataset. A comparison with the OLCI Neural Network Swarm (ONSS) is made as well. The best approach is applied to a sample scene and compared with t…