Search results for "reinforcement learning"

showing 10 items of 95 documents

Experiments in Value Function Approximation with Sparse Support Vector Regression

2004

We present first experiments using Support Vector Regression (SVR) as the function approximator for an on-line, SARSA-like reinforcement learner. To overcome the batch nature of SVR, two ideas are employed. The first is sparse greedy approximation: the data is projected onto the subspace spanned by only a small subset of the original data (in feature space). This subset can be built up in an on-line fashion. Second, we use the sparsified data to solve a reduced quadratic problem, where the number of variables is independent of the total number of training samples seen. The feasibility of this approach is demonstrated on two common toy problems.

Support vector machine; Function approximation; Variables; Feature vector; Reinforcement learning; Function (mathematics); Algorithm; Subspace topology; Vector space; Mathematics
researchProduct
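
The sparse greedy approximation mentioned in the abstract above keeps only a small subset of samples whose span in feature space approximates all the others. One common way to grow such a subset on-line is a projection-residual test (often called approximate linear dependence). Whether the paper uses exactly this test is an assumption, as are the Gaussian kernel, the helper names `rbf_kernel`, `solve`, and `build_dictionary`, and the parameters `gamma` and `tol`. A minimal sketch in plain Python:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def build_dictionary(samples, tol=0.1, gamma=1.0):
    """Process samples one at a time and keep those that are poorly
    approximated (in feature space) by the samples kept so far."""
    dictionary = []
    for x in samples:
        if not dictionary:
            dictionary.append(x)
            continue
        # Kernel matrix of the current dictionary and kernel vector to x.
        K = [[rbf_kernel(u, v, gamma) for v in dictionary] for u in dictionary]
        k = [rbf_kernel(u, x, gamma) for u in dictionary]
        coeffs = solve(K, k)
        # Squared distance from x's feature image to the dictionary's span.
        residual = rbf_kernel(x, x, gamma) - sum(c * kv for c, kv in zip(coeffs, k))
        if residual > tol:
            dictionary.append(x)
    return dictionary
```

With `samples = [(0.0,), (0.01,), (5.0,), (5.01,)]` and the default parameters, only `(0.0,)` and `(5.0,)` are kept, because the other two points are almost fully explained by the retained ones. The dictionary size then depends on how the data covers the input space rather than on how many samples have been seen.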

User Grouping and Power Allocation in NOMA Systems: A Reinforcement Learning-Based Solution

2020

In this paper, we present a pioneering solution to the problem of user grouping and power allocation in Non-Orthogonal Multiple Access (NOMA) systems. There are two fundamentally salient and difficult issues associated with NOMA systems. The first involves the task of grouping users together into the pre-specified time slots. The subsequent second phase augments this with the solution of determining how much power should be allocated to the respective users. We resolve this with the first reported Reinforcement Learning (RL)-based solution, which attempts to solve the partitioning phase of this issue. In particular, we invoke the Object Migration Automata (OMA) and one of its variants to re…

Theoretical computer science; Learning automata; Computer science; Networking & telecommunications; Engineering and technology; Task (project management); Automaton; Power (physics); Noma; Salient; Electrical engineering, electronic engineering, information engineering; Reinforcement learning; Greedy algorithm
researchProduct

Multi-Layer Offloading at the Edge for Vehicular Networks

2020

This paper proposes a multi-layer platform for job offloading in vehicular networks. Offloading is performed from vehicles in the Vehicular Domain towards Multi-Access Edge Computing (MEC) Servers deployed at the edge of the network, and between MEC Servers. Offloading decisions at both domains are challenging for the overall system performance. Optimization at the MEC Layer domain is obtained by model-based Reinforcement Learning, while a strategy to decide the best offloading rate from the Vehicular Domain is defined to achieve the desired trade-off between costs and performance. Numerical analysis shows the achieved performance.

Vehicular ad hoc network; Computer science; Distributed computing; Server; Computer Systems Organization: Computer-Communication Networks; Reinforcement learning; Energy consumption; Enhanced Data Rates for GSM Evolution; Layer (object-oriented design); Edge computing; Domain (software engineering)
researchProduct

Weeds sampling for map reconstruction: a Markov random field approach

2012

In the past 15 years, there has been growing interest in the study of the spatial distribution of weeds in crops, mainly because this is a prerequisite to reducing herbicide use. A large variety of statistical methods has been developed for this problem ([5], [7], [10]). However, one point common to all of these methods is that they are based on in situ collection of data about the spatial distribution of weeds. A crucial problem is then to choose where, in the field, data should be collected. Since exhaustive sampling of a field is too costly, a lot of attention has been paid to the development of spatial sampling methods ([12], [4], [6], [9]). Classical spatial stochastic model of weeds cou…

[SDE.BE] Environmental Sciences/Biodiversity and Ecology; [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH]; [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]; Biodiversity and Ecology; Statistics (Mathematics); Markov decision process; dynamic programming; reinforcement learning; adaptive sampling; Markov random field; batch; sampling cost; field approach; weed
researchProduct

Échantillonnage adaptatif optimal dans les champs de Markov, application à l’échantillonnage d’une espèce adventice (Optimal adaptive sampling in Markov random fields, applied to the sampling of a weed species)

2012

This work is divided into two parts: (i) the theoretical study of the problem of adaptive sampling in Markov Random Fields (MRF) and (ii) the modeling of the problem of weed sampling in a crop field and the design of adaptive sampling strategies for this problem. For the first point, we first modeled the problem of finding an optimal sampling strategy as a finite horizon Markov Decision Process (MDP). Then, we proposed a generic algorithm for computing an approximate solution to any finite horizon MDP with known model. This algorithm, called Least-Squared Dynamic Programming (LSDP), combines the concepts of dynamic programming and reinforcement learning. It was then adapted to compute adapt…

[SDE] Environmental Sciences; [SDV] Life Sciences [q-bio]; dynamic programming; reinforcement learning; Markov random field; Markov decision process; batch; sampling cost; weed; adaptive sampling
researchProduct
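
The abstract above models optimal sampling as a finite-horizon Markov Decision Process with a known model. As background only, here is a minimal sketch of exact backward induction for such an MDP. This is not the LSDP algorithm, which the abstract describes as an approximate method and whose details are not given in the truncated text. The function name `backward_induction`, its signature, and the two-state toy problem in the usage note are hypothetical.

```python
def backward_induction(states, actions, transition, reward, horizon):
    """Exact dynamic programming for a finite-horizon MDP with a known model.

    transition(s, a) -> list of (probability, next_state) pairs
    reward(s, a)     -> immediate reward
    Returns (values, policy): values[t][s] is the optimal expected return
    from state s at step t, and policy[t][s] is a maximizing action.
    """
    # values[horizon] is the terminal value, taken to be zero everywhere.
    values = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    policy = [{} for _ in range(horizon)]
    for t in reversed(range(horizon)):
        for s in states:
            best_action, best_q = None, float("-inf")
            for a in actions:
                # One-step lookahead using the values of the next time step.
                q = reward(s, a) + sum(
                    p * values[t + 1][s2] for p, s2 in transition(s, a)
                )
                if q > best_q:
                    best_action, best_q = a, q
            values[t][s] = best_q
            policy[t][s] = best_action
    return values, policy


# Hypothetical toy problem: staying in "high" pays 1, everything else pays 0.
def toy_transition(s, a):
    other = "high" if s == "low" else "low"
    return [(1.0, s if a == "stay" else other)]

def toy_reward(s, a):
    return 1.0 if (s == "high" and a == "stay") else 0.0

toy_values, toy_policy = backward_induction(
    ["low", "high"], ["stay", "move"], toy_transition, toy_reward, horizon=3
)
```

In the toy problem, the first decision in state `"low"` is to move, since reaching `"high"` early leaves two steps to collect reward. Exact backward induction enumerates every state at every step, which is why approximate schemes become necessary when the state space is large.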

Adaptive treatment of anemia on hemodialysis patients: A reinforcement learning approach

2011

The aim of this work is to study the applicability of reinforcement learning methods to design adaptive treatment strategies that optimize, in the long term, the dosage of erythropoiesis-stimulating agents (ESAs) in the management of anemia in patients undergoing hemodialysis. Adaptive treatment strategies have recently emerged as a new paradigm for the treatment and long-term management of chronic disease. Reinforcement Learning (RL) can be useful to extract such strategies from clinical data, taking into account delayed effects and without requiring any mathematical model. In this work, we focus on the so-called Fitted Q Iteration algorithm, an RL approach that deals with the data very…

Computer science; Management science; Anemia; Approximation algorithm; Machine learning; Chronic disease; Treatment strategy; Reinforcement learning; In patient; Patient treatment; Hemodialysis; Artificial intelligence
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
researchProduct
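
The Fitted Q Iteration algorithm named in the abstract above learns a Q-function from a fixed batch of recorded transitions by repeatedly fitting a regressor to bootstrapped targets. The sketch below is a simplified illustration, not the paper's setup: it assumes discrete states and replaces the regressor with per-pair averaging, whereas in practice any supervised regressor can take that role. The function name `fitted_q_iteration`, the tuple layout, and the default parameters are hypothetical.

```python
from collections import defaultdict

def fitted_q_iteration(transitions, actions, gamma=0.9, iterations=50):
    """Batch Q-learning on a fixed data set.

    transitions: list of (state, action, reward, next_state, done) tuples.
    Returns a dict-like mapping (state, action) -> estimated Q-value.
    """
    q = defaultdict(float)  # Q_0 is zero everywhere
    for _ in range(iterations):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s2, done in transitions:
            # Bootstrapped regression target built from the previous Q estimate.
            if done:
                target = r
            else:
                target = r + gamma * max(q[(s2, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # "Fit" step: averaging targets per (state, action) stands in for
        # the supervised regressor (an assumption made for brevity).
        new_q = defaultdict(float)
        for key, total in sums.items():
            new_q[key] = total / counts[key]
        q = new_q
    return q
```

For a two-step chain such as `[("a", "go", 0.0, "b", False), ("b", "go", 1.0, "end", True)]` with `gamma=0.5`, the estimate for `("a", "go")` settles at 0.5 after two iterations, because the delayed reward is propagated back one step per iteration. This is the property that lets the method account for delayed treatment effects without an explicit model.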

A Comparative Analysis of Multiple Biasing Techniques for $Q_{biased}$ Softmax Regression Algorithm

2021

Over the years, the popularity of robotic workers has surged. Several tasks that were previously considered insurmountable can now be performed by robots efficiently and with ease. This is mainly due to recent advances in control systems and artificial intelligence. Lately, Reinforcement Learning (RL) has captured the spotlight in the field of robotics. Instead of explicitly specifying the solution of a particular task, RL enables the robot (agent) to explore its environment and, through trial and error, choose the appropriate response. In this paper, a comparative analysis of biasing techniques for the Q-biased softmax …

Computer science; Obstacle avoidance; Softmax function; Q-learning; Robot; Reinforcement learning; Mobile robot; Artificial intelligence; Trial and error; Action selection
2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS)
researchProduct
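
The softmax function and action selection listed among the keywords above refer to a standard way of trading off exploration and exploitation: actions are drawn with probability proportional to the exponential of their Q-values. The sketch below shows only this plain (unbiased) softmax selection. The biasing techniques compared in the paper are not described in the truncated abstract and are not reproduced here. The function names and the `temperature` parameter are assumptions.

```python
import math
import random

def softmax_probabilities(q_values, temperature=1.0):
    """Turn a list of Q-values into action probabilities (Boltzmann distribution)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high `temperature` makes the choice nearly uniform (more exploration), while a low one concentrates probability on the action with the largest Q-value (more exploitation).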

Validation of a Reinforcement Learning Policy for Dosage Optimization of Erythropoietin

2007

This paper deals with the validation of a Reinforcement Learning (RL) policy for dosage optimization of Erythropoietin (EPO). This policy was obtained using data from patients in a haemodialysis program during the year 2005. The goal of this policy was to maintain patients' Haemoglobin (Hb) level between 11.5 g/dl and 12.5 g/dl. Individualized management was needed, as each patient usually responds differently to the treatment. RL provides an attractive and satisfactory solution, showing that a policy based on RL would be much more successful in achieving the goal of maintaining patients within the desired target of Hb than the policy followed by the hospital so far. In this work, t…

Management science; Computer science; Machine learning; Data set; Work (electrical); Robustness (computer science); Erythropoietin; Reinforcement learning; Artificial intelligence
researchProduct

An adaption mechanism for the error threshold of XCSF

2020

Learning Classifier Systems (LCS) are a class of rule-based learning algorithms that combine reinforcement learning (RL) and genetic algorithm (GA) techniques to evolve a population of classifiers. The most prominent example is XCS, for which many variants have been proposed in the past, including XCSF for function approximation. Although XCSF is a promising candidate for supporting autonomy in computing systems, it still must undergo parameter optimization prior to deployment. However, if the later deployment environment is unknown, a priori parameter optimization is not possible, raising the need for XCSF to automatically determine suitable parameter values at run-time. One of the mo…

Learning classifier system; Computer science; Population; Computer and information sciences; Engineering and technology; Function (mathematics); Natural sciences; Set (abstract data type); Function approximation; Computation theory & mathematics; Approximation error; Genetic algorithm; Electrical engineering, electronic engineering, information engineering; Reinforcement learning; Artificial intelligence & image processing; Algorithm
Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
researchProduct

Validating Habitual and Goal-Directed Decision-Making Performance Online in Healthy Older Adults

2021

Everyday decision-making is supported by a dual system of control comprised of parallel goal-directed and habitual systems. Over the past decade, the two-stage Markov decision task has become popular for its ability to dissociate between goal-directed and habitual decision-making. While a handful of studies have implemented decision-making tasks online, only one study has validated the task by comparing in-person and web-based performance on the two-stage task in children and young adults. To date, no study has validated the dissociation of goal-directed and habitual behaviors in older adults online. Here, we implemented and validated a web-based version of the two-stage Markov task usi…

Reinforcement learning; Aging; 2019-20 coronavirus outbreak; Coronavirus disease 2019 (COVID-19); Markov chain; Cognitive Neuroscience; Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); Applied psychology; Neurosciences. Biological psychiatry. Neuropsychiatry; decision-making; goal-directed; Task (project management); habitual; Young adult; Psychology; validating; older adults; online; RC321-571; Neuroscience
Original Research, Frontiers in Aging Neuroscience
researchProduct