Search results for "STATISTICS"
showing 10 items of 7671 documents
Adaptive Continuous Feature Binarization for Tsetlin Machines Applied to Forecasting Dengue Incidences in the Philippines
2020
The Tsetlin Machine (TM) is a recent interpretable machine learning algorithm that requires relatively modest computational power, yet attains competitive accuracy on several benchmarks. TMs are inherently binary; however, many machine learning problems are continuous. While binarization of continuous data through brute-force thresholding has yielded promising accuracy, such an approach is computationally expensive and hinders extrapolation. In this paper, we address these limitations by standardizing features to support scale shifts in the transition from training data to real-world operation, as is typical in, e.g., forecasting. For scalability, we employ sampling to reduce the number of binariz…
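The abstract names two ingredients: standardizing features and sampling to reduce the number of binarization thresholds. The following is a minimal sketch of both, not the paper's adaptive method (its details are cut off in this excerpt). It assumes z-score standardization, thresholds drawn from a random sample of distinct training values, and a "thermometer" bit encoding; all function names are hypothetical.

```python
import random
import statistics

def standardize(column):
    """Z-score a feature column; returns (scaled values, mean, std)."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column) or 1.0  # guard against constant columns
    return [(v - mu) / sigma for v in column], mu, sigma

def sample_thresholds(scaled, n_thresholds, seed=0):
    """Pick thresholds from a random sample of distinct scaled values
    instead of using every unique value (brute-force thresholding)."""
    rng = random.Random(seed)
    unique = sorted(set(scaled))
    k = min(n_thresholds, len(unique))
    return sorted(rng.sample(unique, k))

def binarize(value, thresholds):
    """Thermometer encoding (assumed): bit i is 1 when value <= thresholds[i]."""
    return [1 if value <= t else 0 for t in thresholds]

train = [20.0, 22.0, 25.0, 30.0, 35.0]  # hypothetical weekly case counts
scaled, mu, sigma = standardize(train)
thresholds = sample_thresholds(scaled, n_thresholds=3)
# Assumption: a new observation is encoded with the training mean and std.
bits_new = binarize((40.0 - mu) / sigma, thresholds)
```

Sampling keeps the number of TM input bits fixed at `n_thresholds` per feature regardless of how many unique values the training data contains.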
A Review of Kernel Methods in Remote Sensing Data Analysis
2011
Kernel methods have proven effective in the analysis of images of the Earth acquired by airborne and satellite sensors. They provide a consistent and well-founded theoretical framework for developing nonlinear techniques and have useful properties when dealing with a small number of (potentially high-dimensional) training samples, the presence of heterogeneous multimodalities, and different noise sources in the data. These properties are particularly appropriate for remote sensing data analysis. In fact, kernel methods have improved on the results of parametric linear methods and neural networks in applications such as natural resource control, detection and monitoring of anthropic infrastruc…
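Kernel methods operate on a kernel (Gram) matrix of pairwise similarities between training samples. The sketch below assumes the Gaussian RBF kernel, one common choice and not necessarily the one favored in the review. It shows that the matrix is n × n in the number of samples, independent of how many spectral bands each pixel has. The pixel values and `gamma` are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(samples, gamma=0.5):
    """Kernel matrix K[i][j] = k(x_i, x_j) over all training samples."""
    return [[rbf_kernel(xi, xj, gamma) for xj in samples] for xi in samples]

# Hypothetical pixels, three spectral bands each.
pixels = [[0.1, 0.4, 0.3], [0.2, 0.5, 0.3], [0.9, 0.1, 0.7]]
K = gram_matrix(pixels)
```

Spectrally similar pixels (the first two) receive a kernel value close to 1, while dissimilar ones fall toward 0.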
Daily Peak Temperature Forecasting with Elman Neural Networks
2005
This work presents a forecaster based on an Elman artificial neural network, trained with the resilient backpropagation (Rprop) algorithm, that predicts daily peak temperatures one day ahead. The time series was recorded at Petrosino (TP), on the west coast of Sicily, Italy, from January 1, 1995 to May 14, 2003, and consists of temperature (minimum and maximum values), humidity (minimum and maximum values), and rainfall. The performance and reliability of the proposed model were evaluated with several measures, comparing different neural models. Experimental results show very good prediction performance.
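An Elman network feeds a copy of its previous hidden state (the "context" units) back into the hidden layer at each step. The following is a minimal forward-pass sketch, assuming a tanh hidden layer and a single linear output for the next-day peak temperature. Training (e.g. with Rprop) is not shown, and all weights and inputs are hypothetical.

```python
import math

def elman_forward(inputs, W_in, W_ctx, b_h, W_out, b_out):
    """Run an Elman network over a sequence of input vectors:
    h_t = tanh(W_in x_t + W_ctx h_{t-1} + b_h),  y_t = W_out . h_t + b_out."""
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # context layer starts at zero
    outputs = []
    for x in inputs:
        # The comprehension reads the previous h before h is rebound.
        h = [
            math.tanh(
                sum(W_in[i][j] * x[j] for j in range(len(x)))
                + sum(W_ctx[i][k] * h[k] for k in range(hidden_size))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
        outputs.append(sum(W_out[i] * h[i] for i in range(hidden_size)) + b_out)
    return outputs

# Hypothetical scaled daily features: Tmin, Tmax, RHmin, RHmax, rainfall.
days = [[0.2, 0.6, 0.5, 0.9, 0.0], [0.3, 0.7, 0.4, 0.8, 0.1]]
preds = elman_forward(
    days,
    W_in=[[0.1] * 5, [-0.1] * 5],
    W_ctx=[[0.5, 0.0], [0.0, 0.5]],
    b_h=[0.0, 0.0],
    W_out=[1.0, -1.0],
    b_out=0.0,
)
```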
An Application of Hybrid Models in Credit Scoring
2000
The predictive capability of parametric and non-parametric models in solving financial classification problems has been widely demonstrated in empirical research, particularly in problems such as bond rating, bankruptcy prediction, and credit scoring. Recently, however, it has been shown that a combination of different models generally reduces the prediction error, so the best alternative may not be a specific model but a combination of them. In this paper, we study hybrid systems based on the aggregation of individual (parametric and non-parametric) models. Our hybrids are built by using both parametric and non-parametric models as the sys…
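The simplest way to aggregate individual models is to average their predicted probabilities. The sketch below uses plain (optionally weighted) averaging. This is an assumption, since the paper's exact hybrid construction is truncated above. The two score lists stand in for hypothetical parametric (e.g. logistic) and non-parametric (e.g. k-NN) models. Error is measured with the Brier score (mean squared error on probabilities).

```python
from statistics import fmean

def aggregate(prob_lists, weights=None):
    """Combine per-applicant default probabilities from several models
    by (weighted) averaging."""
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]

def brier(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return fmean((p - y) ** 2 for p, y in zip(probs, labels))

# Hypothetical scores for four applicants (1 = default).
parametric = [0.9, 0.2, 0.6, 0.1]
nonparametric = [0.6, 0.4, 0.9, 0.0]
labels = [1, 0, 1, 0]
hybrid = aggregate([parametric, nonparametric])
```

In this toy case the averaged scores have a lower Brier score than either model alone. In general, averaging only guarantees an error no larger than the mean of the individual errors.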
Increasing sample efficiency in deep reinforcement learning using generative environment modelling
2020
CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning
2020
Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment. Learning typically happens through trial and error using exploratory methods such as \(\epsilon\)-greedy. Two approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines. Model-based RL learns a model of the environment and uses it to learn the policy, while model-free approaches rely entirely on exploration and exploitation without modeling the underlying environment dynamics. Model-free RL works conceptually well in simulated environments, and empirical evidence suggests that trial and error leads to a near-opti…
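The \(\epsilon\)-greedy rule mentioned above trades off exploration and exploitation with a single parameter. The following is a minimal sketch, assuming the agent keeps a list of estimated action values (`q_values`, a hypothetical name) and breaks ties among equally valued actions at random.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action (explore);
    otherwise pick an action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties uniformly among maximal actions.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])

rng = random.Random(0)
actions = [epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1, rng=rng) for _ in range(1000)]
```

With `epsilon=0.1` and three actions, the greedy action is expected about 93% of the time (0.9 plus its share of the random picks).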
Optimal Pruned K-Nearest Neighbors: OP-KNN Application to Financial Modeling
2008
The paper proposes a methodology called OP-KNN, which builds a single-hidden-layer feedforward neural network from nearest-neighbor neurons at very low computational cost. The main strategy is to first select the most relevant variables and then build the model using KNN kernels. In the second step, multi-response sparse regression (MRSR) ranks each k-th nearest neighbor; in the third step, leave-one-out estimation selects the number of neighbors and estimates generalization performance. The methodology is tested on a toy example and applied to financial modeling.
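Leave-one-out (LOO) estimation is cheap for nearest-neighbor models. Leaving a point out only means excluding it from its own neighbor list, so no refitting is needed. The sketch below shows LOO-based selection of k for a plain k-NN average. It is a deliberate simplification: OP-KNN's MRSR ranking and variable selection are omitted, and the function names and data are hypothetical.

```python
def loo_knn_error(X, y, k):
    """Leave-one-out mean squared error of a k-NN regressor: each point is
    predicted from its k nearest *other* points (ties broken by index)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        pred = sum(y[j] for j in neighbors) / k
        total += (y[i] - pred) ** 2
    return total / n

def select_k(X, y, k_max):
    """Return the k in 1..k_max with the smallest leave-one-out error."""
    return min(range(1, k_max + 1), key=lambda k: loo_knn_error(X, y, k))

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
best_k = select_k(X, y, k_max=3)
```

On this linear toy series, averaging the two neighbors on either side of interior points is exact, so k = 2 wins.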
Semi-Supervised Support Vector Biophysical Parameter Estimation
2008
Two kernel-based methods for semi-supervised regression are presented. The methods build a graph or hypergraph Laplacian from both the labeled and unlabeled data, which is then used to deform the training kernel matrix. The deformed kernel is used for support vector regression (SVR). The semi-supervised SVR methods are successfully tested on leaf area index (LAI) estimation and ocean chlorophyll concentration prediction from remotely sensed images.
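The first step described above builds a graph Laplacian over all samples, labeled and unlabeled alike. The following is a minimal sketch of the unnormalized Laplacian L = D − W. It assumes a fully connected graph with Gaussian edge weights; k-NN graphs and normalized Laplacians are common alternatives, and the abstract does not say which was used. The kernel-deformation step itself is not shown, and the points are hypothetical.

```python
import math

def graph_laplacian(points, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W over all points,
    with Gaussian edge weights and no self-loops."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    degrees = [sum(row) for row in W]
    return [
        [(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]

labeled = [[0.2, 0.3]]                  # hypothetical pixel with a known LAI value
unlabeled = [[0.25, 0.35], [0.9, 0.8]]  # hypothetical pixels without labels
L = graph_laplacian(labeled + unlabeled)
```

Every row of L sums to zero. Unlabeled pixels enter only through the edge weights, which is how they shape the deformed kernel.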
Improved reliability estimates for the serial color-word test
1978
Starting from Lennart Sjoberg's serial scoring of the Color-Word Test and his critical review of the test, the possibilities of attaining better reliability estimates are briefly surveyed. As a simple step, orthogonalization of the regression model is suggested. Ways of maximizing the reliability estimate are demonstrated. Using 261 subjects from five different subsamples (clinical and control groups), the reliability estimates of the oblique system, of the orthogonalized system, and of the maximum-reliability solutions are compared empirically. The significance of the results for the test-theoretic evaluation of the Color-Word Test is discussed.
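The abstract does not detail its orthogonalization. A generic form of orthogonalizing a regression model is to replace one variable with its residual after regressing it on another, which is a single Gram-Schmidt step. The following is a minimal sketch under that assumption, with a hypothetical pair of test scores. It is not the paper's specific procedure.

```python
from statistics import fmean

def residualize(y, x):
    """Remove from y the part linearly explained by x (ordinary least
    squares with intercept); the residuals are uncorrelated with x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical scores: a baseline naming time (x) and a color-word time (y).
x = [30.0, 35.0, 40.0, 45.0, 50.0]
y = [60.0, 68.0, 85.0, 88.0, 101.0]
r = residualize(y, x)
```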
On the geographical distribution of pseudocholinesterase variants
1975
The incidence of pseudocholinesterase (PCHE, EC 3.1.1.8) variants in samples from 8 different populations (2218 individuals in total) is reported. Together with previously reported data from the literature, a general survey of the geographical distribution of PCHE isoenzymes is given. Possible reasons for the present-day heterogeneity of their distribution are also discussed. Concerning the incidence of the C5 variant, it is pointed out that the validity of applying population genetic models depends on the accuracy of the assumed genetic basis.
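The simplest population genetic model is Hardy-Weinberg equilibrium for a single autosomal two-allele locus. The following is a minimal sketch of it. The abstract's caveat applies directly here: the sketch is only meaningful if the variant really follows such a genetic basis, which is an assumption rather than anything stated for PCHE or C5. The genotype counts are hypothetical.

```python
def hardy_weinberg(n_AA, n_Aa, n_aa):
    """Allele frequencies from genotype counts, and the genotype counts
    expected under Hardy-Weinberg equilibrium for a two-allele locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return p, q, expected

def chi_square(observed, expected):
    """Pearson chi-square statistic comparing observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = (90, 9, 1)  # hypothetical counts: AA, Aa, aa
p, q, expected = hardy_weinberg(*observed)
stat = chi_square(observed, expected)
```

A large statistic relative to a chi-square distribution with one degree of freedom suggests that the simple two-allele equilibrium model does not fit the sample.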