Search results for "Markov decision process"
Showing 10 of 22 documents
Optimization of anemia treatment in hemodialysis patients via reinforcement learning
2013
Objective: Anemia is a frequent comorbidity in hemodialysis patients that can be successfully treated by administering erythropoiesis-stimulating agents (ESAs). ESA dosing is currently based on clinical protocols that often do not account for the high inter- and intra-individual variability in patients' responses. As a result, the hemoglobin level of some patients oscillates around the target range, which is associated with multiple risks and side effects. This work proposes a methodology based on reinforcement learning (RL) to optimize ESA therapy. Methods: RL is a data-driven approach for solving sequential decision-making problems that are formulated as Markov decision processes (MDP…
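As a hedged illustration of the MDP formulation this abstract describes (the states, actions, transitions, and rewards below are invented placeholders, not the paper's clinical model), a minimal tabular Q-learning sketch:

```python
import random

# Hypothetical toy MDP: states are discretized hemoglobin bands, actions
# are ESA dose levels, and the reward favors staying in a target band.
STATES = range(5)          # 0 = very low Hb ... 4 = very high Hb
ACTIONS = range(3)         # 0 = no dose, 1 = low dose, 2 = high dose
TARGET = 2                 # index of the target hemoglobin band
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2

def step(state, action):
    """Illustrative transition: dosing pushes Hb up, no dose lets it drift."""
    drift = action - 1 + random.choice([-1, 0, 1])
    next_state = min(max(state + drift, 0), 4)
    reward = 1.0 if next_state == TARGET else -abs(next_state - TARGET)
    return next_state, reward

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
state = 0
for _ in range(20000):
    # Epsilon-greedy action selection.
    action = (random.choice(list(ACTIONS)) if random.random() < EPS
              else max(ACTIONS, key=lambda a: Q[(state, a)]))
    next_state, reward = step(state, action)
    # Standard Q-learning update.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

The greedy policy extracted at the end maps each hemoglobin band to a dose level; in practice the paper learns from patient data rather than a simulated transition model.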
Designing a multi-layer edge-computing platform for energy-efficient and delay-aware offloading in vehicular networks
2021
Abstract Vehicular networks are expected to support many time-critical services requiring huge amounts of computation resources with very low delay. However, such requirements may not be fully met by vehicle on-board devices due to their limited processing and storage capabilities. The solution provided by 5G is the application of the Multi-Access Edge Computing (MEC) paradigm, which represents a low-latency alternative to remote clouds. Accordingly, we envision a multi-layer job-offloading scheme based on three levels, i.e., the Vehicular Domain, the MEC Domain, and the Backhaul Network Domain. In such a view, jobs can be offloaded from the Vehicular Domain to the MEC Domain, and even further o…
Least-squares temporal difference learning based on an extreme learning machine
2014
Abstract Reinforcement learning (RL) is a general class of algorithms for solving decision-making problems, which are usually modeled using the Markov decision process (MDP) framework. RL can find exact solutions only when the MDP state space is discrete and small enough. Because many real-world problems are described by continuous variables, approximation is essential in practical applications of RL. This paper focuses on learning the value function of a fixed policy in continuous MDPs. This is an important subproblem of several RL algorithms. We propose a least-squares temporal difference (LSTD) algorithm based on the extreme learning machine. LSTD is typically combined wi…
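A hedged sketch of the combination the abstract suggests: standard LSTD for policy evaluation, with an ELM-style feature map, i.e., a fixed random hidden layer. The feature count, toy environment, and reward are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# ELM-style random feature map: a fixed, untrained sigmoid hidden layer.
N_HIDDEN, GAMMA = 20, 0.95
W = rng.normal(size=N_HIDDEN)        # input weights for a 1-D state (illustrative)
b = rng.normal(size=N_HIDDEN)        # hidden-layer biases

def phi(s):
    """Random sigmoid features of a scalar state."""
    return 1.0 / (1.0 + np.exp(-(W * s + b)))

# Synthetic transitions (s, r, s') generated by a fixed policy on a toy chain.
transitions = []
s = 0.0
for _ in range(500):
    s_next = float(np.clip(s + rng.normal(0.1, 0.5), -3, 3))
    r = -abs(s_next)                 # illustrative reward: stay near the origin
    transitions.append((s, r, s_next))
    s = s_next

# LSTD: solve A w = c with A = sum phi(s)(phi(s) - gamma*phi(s'))^T and
# c = sum r*phi(s); a small ridge term keeps A well conditioned.
A = 1e-3 * np.eye(N_HIDDEN)
c = np.zeros(N_HIDDEN)
for s, r, s_next in transitions:
    f, f_next = phi(s), phi(s_next)
    A += np.outer(f, f - GAMMA * f_next)
    c += r * f
w = np.linalg.solve(A, c)

V = lambda s: float(phi(s) @ w)      # approximate value of the fixed policy
```

Because the hidden layer is random and fixed, only the linear output weights `w` are fitted, which is what makes the least-squares solve sufficient.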
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
2018
Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. There are several challenges in current state-of-the-art reinforcement learning algorithms that prevent them from converging toward the global optimum. It is likely that the solution to these problems lies in short- and long-term planning, exploration, and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms as they provide a flexible, reproducible, and easy-to-control environment. Regardless, few games feature a state-space where results in exploration, memory, and plannin…
Sequence Q-learning: A memory-based method towards solving POMDP
2015
A partially observable Markov decision process (POMDP) models a control problem where states are only partially observable by an agent. The two main approaches to solving such tasks are value-function methods and direct search in policy space. This paper introduces the Sequence Q-learning method, which extends the well-known Q-learning algorithm toward the ability to solve POMDPs by adding a special sequence-management framework, advancing from action values to “sequence” values and including the “sequence continuity principle”.
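One hedged way to illustrate the memory-based idea: run plain Q-learning over fixed-length observation histories instead of single observations. The paper's “sequence” values and “sequence continuity principle” are richer than this; the sliding window below is an illustrative stand-in, and the toy environment is invented:

```python
import random
from collections import defaultdict, deque

# Q-learning over the last K observations: the history tuple plays the
# role of the state, giving the agent a crude memory of the past.
K, GAMMA, ALPHA, EPS = 3, 0.9, 0.1, 0.2
ACTIONS = (0, 1)
Q = defaultdict(float)

def choose(history):
    """Epsilon-greedy over the history-conditioned action values."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(history, a)])

def update(history, action, reward, next_history):
    best_next = max(Q[(next_history, a)] for a in ACTIONS)
    Q[(history, action)] += ALPHA * (reward + GAMMA * best_next
                                     - Q[(history, action)])

# Toy partially observable task: a hidden bit flips occasionally, the agent
# sees only a noisy copy of it and is rewarded for matching the hidden bit.
hidden = 0
memory = deque([0] * K, maxlen=K)
for _ in range(5000):
    history = tuple(memory)
    action = choose(history)
    reward = 1.0 if action == hidden else -1.0
    hidden = hidden if random.random() < 0.9 else 1 - hidden
    obs = hidden if random.random() < 0.8 else 1 - hidden
    memory.append(obs)
    update(history, action, reward, tuple(memory))
```

The window length K trades memory for state-space size; histories that never disambiguate the hidden state remain a fundamental POMDP difficulty that the window alone does not solve.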
The Rail Quality Index as an Indicator of the “Global Comfort” in Optimizing Safety, Quality and Efficiency in Railway Rails
2012
Abstract The proposed model uses stochastic dynamic programming, and in particular Markov decision processes, applied to the Rail Quality Index (RQI; Italian: Indice di Qualità del Binario, IQB). By performing an integrated analysis of the classes of variables that characterize overall service quality (in terms of comfort and safety), the proposed mathematical approach finds solutions to the decision-making process as a function of the probability of deterioration of the infrastructure's state variables over time and of the flow of available resources.
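A hedged sketch of the stochastic-dynamic-programming idea behind such models: quality classes deteriorate stochastically, and value iteration chooses between doing nothing and maintaining. The states, transition probabilities, and costs below are invented placeholders, not the paper's RQI classes:

```python
# Minimal maintenance MDP solved by value iteration (all numbers illustrative).
STATES = [0, 1, 2]            # 0 = good, 1 = fair, 2 = poor quality class
GAMMA = 0.95
P_DECAY = 0.3                 # chance of dropping one quality class per period
COST = {0: 0.0, 1: 2.0, 2: 8.0}   # per-period cost of operating in each class
MAINT_COST = 3.0

def transitions(s, maintain):
    """Return [(next_state, probability), ...] for a state/action pair."""
    if maintain:
        return [(0, 1.0)]                      # maintenance restores class 0
    if s == 2:
        return [(2, 1.0)]                      # worst class is absorbing
    return [(s, 1 - P_DECAY), (s + 1, P_DECAY)]

def q_value(s, maintain, V):
    """Expected discounted cost of one action in one state."""
    return ((MAINT_COST if maintain else 0.0) + COST[s]
            + GAMMA * sum(p * V[s2] for s2, p in transitions(s, maintain)))

V = {s: 0.0 for s in STATES}
for _ in range(500):                           # value iteration to convergence
    V = {s: min(q_value(s, m, V) for m in (False, True)) for s in STATES}

policy = {s: min((False, True), key=lambda m: q_value(s, m, V)) for s in STATES}
```

With these illustrative numbers, the optimal policy leaves the good class alone and maintains the degraded ones; the paper's contribution is driving such a model with measured RQI variables and resource constraints rather than toy costs.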
A Cognitive Dialogue Manager for Education Purposes
2011
A conversational agent is a software system that is able to interact with users in a natural way, often using natural-language capabilities. In this chapter, an evolution of a conversational agent is presented according to the definition of dialogue-management techniques for conversational agents. The presented conversational agent is intended to act as part of an educational system. The chapter outlines state-of-the-art systems and techniques for dialogue management in cognitive educational systems, and the underlying psychological and social aspects. We present our framework for a dialogue manager aimed at reducing the uncertainty in users’ sentences during the assessment of hi…
A meta-cognitive architecture for planning in uncertain environments
2013
Abstract The behavior of an artificial agent performing in a natural environment is influenced by many different pressures and needs coming from both the external world and internal factors, which sometimes drive the agent toward conflicting goals. At the same time, the interaction between an artificial agent and the environment is deeply affected by uncertainty due to the imprecision in the description of the world and the unpredictability of the effects of the agent’s actions. Such an agent needs meta-cognition in terms of both self-awareness and control. Self-awareness is related to the internal conditions that may possibly influence the completion of the task, while control is oriented t…
Comprehensive Uncertainty Management in MDPs
2013
Multistage decision-making in robots involved in real-world tasks is a process affected by uncertainty. The effects of the agent’s actions in a physical environment cannot always be predicted deterministically and precisely. Moreover, observing the environment can be too onerous for a robot, and hence not continuous. Markov Decision Processes (MDPs) are a well-known solution inspired by the classic probabilistic approach to managing uncertainty. On the other hand, including fuzzy logic and possibility theory has widened uncertainty representation. Probability, possibility, fuzzy logic, and epistemic belief allow treating different and not always superimposable facets of unce…
Resource allocation in fog computing: vehicular fog computing for the optimal use of electric vehicl…
2019
Abstract: Technological advancements have made it possible for electric vehicles (EVs) to have onboard computation, communication, storage, and sensing capabilities. Nevertheless, EVs spend most of their time in parking lots, which leaves these onboard devices severely underutilized. Thus, better management and pooling of these underutilized resources is strongly recommended. The aggregated resources would be useful for traffic-safety applications and comfort-related applications, or could be used as a distributed data center. Moreover, parked vehicles might also be used as a service-delivery platform to serve users. Therefore, the use of aggregated abundant resources for the …