Search results for "reinforcement learning"
Showing 10 of 95 documents
Experiments in Value Function Approximation with Sparse Support Vector Regression
2004
We present first experiments using Support Vector Regression as a function approximator for an on-line, SARSA-like reinforcement learner. To overcome the batch nature of SVR, two ideas are employed. The first is sparse greedy approximation: the data are projected onto the subspace spanned by only a small subset of the original data (in feature space). This subset can be built up in an on-line fashion. Second, we use the sparsified data to solve a reduced quadratic problem, in which the number of variables is independent of the total number of training samples seen. The feasibility of this approach is demonstrated on two common toy problems.
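The abstract gives no code; the sparse greedy approximation it describes can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: it assumes a Gaussian kernel and uses an approximate-linear-dependence test (keep a sample only if its feature-space projection onto the current dictionary leaves a residual above a tolerance), which is one common way to build such a subset on-line.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def build_sparse_dictionary(samples, kernel=gaussian_kernel, tol=1e-3):
    """Greedily keep only samples whose feature-space projection onto the
    current dictionary leaves a residual above `tol` (hypothetical
    approximate-linear-dependence criterion, not the paper's exact rule)."""
    dictionary = []
    for x in samples:
        if not dictionary:
            dictionary.append(x)
            continue
        K = np.array([[kernel(d1, d2) for d2 in dictionary] for d1 in dictionary])
        k = np.array([kernel(d, x) for d in dictionary])
        # residual of projecting phi(x) onto span{phi(d) : d in dictionary};
        # small jitter keeps the kernel matrix well conditioned
        coeffs = np.linalg.solve(K + 1e-10 * np.eye(len(K)), k)
        residual = kernel(x, x) - k @ coeffs
        if residual > tol:
            dictionary.append(x)
    return dictionary

# Near-duplicate states are absorbed; genuinely new ones enter the dictionary.
states = [[0.0], [0.001], [1.0], [1.001], [2.0]]
D = build_sparse_dictionary(states, tol=1e-3)
```

Because each incoming sample only needs the current dictionary, the subset grows incrementally, matching the on-line setting of the abstract.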
User Grouping and Power Allocation in NOMA Systems: A Reinforcement Learning-Based Solution
2020
In this paper, we present a pioneering solution to the problem of user grouping and power allocation in Non-Orthogonal Multiple Access (NOMA) systems. Two salient and difficult issues are associated with NOMA systems. The first involves grouping users into the pre-specified time slots. The second phase augments this by determining how much power should be allocated to the respective users. We resolve this with the first reported Reinforcement Learning (RL)-based solution, which attempts to solve the partitioning phase of this issue. In particular, we invoke the Object Migration Automata (OMA) and one of its variants to re…
Multi-Layer Offloading at the Edge for Vehicular Networks
2020
This paper proposes a multi-layer platform for job offloading in vehicular networks. Offloading is performed from vehicles in the Vehicular Domain towards Multi-Access Edge Computing (MEC) Servers deployed at the edge of the network, and between MEC Servers. Offloading decisions in both domains are critical to overall system performance. Optimization at the MEC Layer is performed by model-based Reinforcement Learning, while a strategy for deciding the best offloading rate from the Vehicular Domain is defined to achieve the desired trade-off between cost and performance. Numerical analysis shows the achieved performance.
Weeds sampling for map reconstruction: a Markov random field approach
2012
Over the past 15 years, there has been growing interest in the study of the spatial distribution of weeds in crops, mainly because this is a prerequisite to reducing herbicide use. A large variety of statistical methods has been developed for this problem ([5], [7], [10]). However, one common point of all of these methods is that they are based on in situ collection of data about the spatial distribution of weeds. A crucial problem is then to choose where, in the field, data should be collected. Since exhaustive sampling of a field is too costly, a lot of attention has been paid to the development of spatial sampling methods ([12], [4], [6], [9]). Classical spatial stochastic models of weeds cou…
Optimal adaptive sampling in Markov random fields, with application to the sampling of a weed species
2012
This work is divided into two parts: (i) the theoretical study of the problem of adaptive sampling in Markov Random Fields (MRF) and (ii) the modeling of the problem of weed sampling in a crop field and the design of adaptive sampling strategies for this problem. For the first point, we first modeled the problem of finding an optimal sampling strategy as a finite-horizon Markov Decision Process (MDP). Then, we proposed a generic algorithm for computing an approximate solution to any finite-horizon MDP with a known model. This algorithm, called Least-Squared Dynamic Programming (LSDP), combines the concepts of dynamic programming and reinforcement learning. It was then adapted to compute adapt…
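The exact solution that algorithms like LSDP approximate is backward induction on a finite-horizon MDP with a known model. As a minimal sketch of that baseline (assuming tabular states, a transition matrix per action, and a reward matrix — none of which come from the abstract):

```python
import numpy as np

def finite_horizon_dp(P, R, H):
    """Backward induction for a finite-horizon MDP with known model.
    P[a] is an (S, S) transition matrix for action a; R is an (S, A)
    immediate-reward matrix; H is the horizon."""
    S, A = R.shape
    V = np.zeros(S)        # terminal value at horizon H
    policy = []            # policy[t][s] = greedy action at step t
    for t in range(H):
        # one Bellman backup: Q(s, a) = R(s, a) + sum_s' P(s'|s, a) V(s')
        Q = np.stack([R[:, a] + P[a] @ V for a in range(A)], axis=1)
        policy.insert(0, Q.argmax(axis=1))
        V = Q.max(axis=1)
    return V, policy

# Toy 2-state example: action 0 stays put, action 1 swaps states;
# being in state 1 yields reward 1 regardless of action.
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
R = np.array([[0.0, 0.0], [1.0, 1.0]])
V, policy = finite_horizon_dp(P, R, H=2)
```

An approximate method replaces the exact per-state backup with a regression step, which is the combination of dynamic programming and reinforcement learning the abstract alludes to.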
Adaptive treatment of anemia on hemodialysis patients: A reinforcement learning approach
2011
The aim of this work is to study the applicability of reinforcement learning methods to design adaptive treatment strategies that optimize, in the long term, the dosage of erythropoiesis-stimulating agents (ESAs) in the management of anemia in patients undergoing hemodialysis. Adaptive treatment strategies are emerging as a new paradigm for the treatment and long-term management of chronic diseases. Reinforcement Learning (RL) can be useful to extract such strategies from clinical data, taking into account delayed effects and without requiring any mathematical model. In this work, we focus on the so-called Fitted Q Iteration algorithm, an RL approach that deals with the data very…
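Fitted Q Iteration, which the abstract names, works on a fixed batch of (state, action, reward, next state) tuples: each iteration regresses Bellman targets onto state-action pairs. A rough sketch, with a hypothetical tabular-averaging regressor standing in for whichever supervised learner a study would actually use:

```python
import numpy as np

def fitted_q_iteration(transitions, n_actions, fit_regressor, gamma=0.9, n_iters=20):
    """Fitted Q Iteration on a fixed batch of (s, a, r, s2) tuples.
    `fit_regressor(X, y)` must return a model: callable X -> predictions."""
    S = np.array([t[0] for t in transitions], dtype=float)
    A = np.array([t[1] for t in transitions], dtype=float)
    R = np.array([t[2] for t in transitions], dtype=float)
    S2 = np.array([t[3] for t in transitions], dtype=float)
    X = np.column_stack([S, A])
    model = None
    for _ in range(n_iters):
        if model is None:
            y = R                               # first iteration: Q1 = immediate reward
        else:
            next_q = np.column_stack(
                [model(np.column_stack([S2, np.full(len(S2), float(a))]))
                 for a in range(n_actions)]
            )
            y = R + gamma * next_q.max(axis=1)  # Bellman target
        model = fit_regressor(X, y)
    return model

def tabular_fit(X, y):
    """Hypothetical stand-in regressor: average targets per (s, a) pair."""
    table = {}
    for x, v in zip(map(tuple, X), y):
        table.setdefault(x, []).append(v)
    means = {k: float(np.mean(v)) for k, v in table.items()}
    return lambda Xq: np.array([means.get(tuple(x), 0.0) for x in Xq])

# Toy batch: action 1 moves between states 0 and 1, action 0 stays;
# reward 1 for landing in state 1.
transitions = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0)]
model = fitted_q_iteration(transitions, n_actions=2,
                           fit_regressor=tabular_fit, gamma=0.5, n_iters=50)
```

The greedy policy at a state is then the action maximizing the learned Q; in a dosing setting, states would be patient measurements and actions candidate dosages.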
A Comparative Analysis of Multiple Biasing Techniques for $Q_{biased}$ Softmax Regression Algorithm
2021
In recent years, the popularity of robotic workers has surged tremendously. Many tasks previously considered insurmountable can now be performed efficiently by robots, largely due to recent advances in control systems and artificial intelligence. Lately, Reinforcement Learning (RL) has captured the spotlight in the field of robotics. Instead of explicitly specifying the solution to a particular task, RL enables the robot (agent) to explore its environment and, through trial and error, choose the appropriate response. In this paper, a comparative analysis of biasing techniques for the Q-biased softmax …
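The core mechanism named in the title — softmax action selection over biased Q-values — can be sketched generically; the bias term below is the plug-in point where different biasing techniques would differ (this is an assumption about the structure, not the paper's specific formulation):

```python
import numpy as np

def biased_softmax_policy(q_values, bias, temperature=1.0):
    """Softmax action probabilities over Q-values shifted by a bias term.
    `bias` is a hypothetical per-action adjustment (e.g. prior knowledge);
    `temperature` controls exploration (high = more uniform)."""
    z = (np.asarray(q_values, dtype=float) + np.asarray(bias, dtype=float)) / temperature
    z = z - z.max()            # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# With equal Q-values, the bias alone tilts the action distribution.
p = biased_softmax_policy([1.0, 1.0], bias=[0.0, 2.0])
```

Sampling an action from `p` then trades off exploiting high (biased) Q-values against exploring the alternatives.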
Validation of a Reinforcement Learning Policy for Dosage Optimization of Erythropoietin
2007
This paper deals with the validation of a Reinforcement Learning (RL) policy for dosage optimization of Erythropoietin (EPO). This policy was obtained using data from patients in a haemodialysis program during the year 2005. The goal of this policy was to maintain patients' Haemoglobin (Hb) level between 11.5 g/dl and 12.5 g/dl. Individual management was needed, as each patient usually presents a different response to the treatment. RL provides an attractive and satisfactory solution, showing that a policy based on RL would be much more successful in achieving the goal of maintaining patients within the desired target of Hb than the policy followed by the hospital so far. In this work, t…
An adaption mechanism for the error threshold of XCSF
2020
Learning Classifier Systems (LCS) are a class of rule-based learning algorithms which combine reinforcement learning (RL) and genetic algorithm (GA) techniques to evolve a population of classifiers. The most prominent example is XCS, for which many variants have been proposed in the past, including XCSF for function approximation. Although XCSF is a promising candidate for supporting autonomy in computing systems, it still must undergo parameter optimization prior to deployment. However, if the later deployment environment is unknown, a priori parameter optimization is not possible, raising the need for XCSF to automatically determine suitable parameter values at run-time. One of the mo…
Validating Habitual and Goal-Directed Decision-Making Performance Online in Healthy Older Adults
2021
Everyday decision-making is supported by a dual-system of control composed of parallel goal-directed and habitual systems. Over the past decade, the two-stage Markov decision task has become popular for its ability to dissociate between goal-directed and habitual decision-making. While a handful of studies have implemented decision-making tasks online, only one study has validated the task by comparing in-person and web-based performance on the two-stage task in children and young adults. To date, no study has validated the dissociation of goal-directed and habitual behaviors in older adults online. Here, we implemented and validated a web-based version of the two-stage Markov task usi…
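The transition structure of the standard two-stage Markov task can be sketched briefly: each first-stage choice leads to its "common" second-stage state with fixed probability (0.7 in the usual variant) and to the "rare" one otherwise. The simulator below illustrates that structure only; it is not the web-based implementation the abstract describes.

```python
import numpy as np

def two_step_trial(choice, rng, common_p=0.7):
    """One trial of the two-stage task's first transition: first-stage
    action `choice` (0 or 1) leads to its common second-stage state with
    probability `common_p`, otherwise to the rare one."""
    common = rng.random() < common_p
    state2 = choice if common else 1 - choice
    return state2, common

rng = np.random.default_rng(0)
outcomes = [two_step_trial(0, rng) for _ in range(10_000)]
```

Goal-directed and habitual behavior are dissociated by how reward on rare versus common transitions changes the probability of repeating the first-stage choice.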