Search results for "Reinforcement learning"
Showing 10 of 95 documents
Reinforcement learning approach to nonequilibrium quantum thermodynamics
2021
We use a reinforcement learning approach to reduce entropy production in a closed quantum system brought out of equilibrium. Our strategy makes use of an external control Hamiltonian and a policy gradient technique. Our approach bears no dependence on the quantitative tool chosen to characterize the degree of thermodynamic irreversibility induced by the dynamical process being considered, requires little knowledge of the dynamics itself, and does not require tracking the quantum state of the system during the evolution, thus embodying an experimentally non-demanding approach to the control of non-equilibrium quantum thermodynamics. We successfully apply our methods to the case of single- …
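The abstract above names a policy gradient technique as its learning core. As a hedged illustration only (a toy one-step control problem with made-up rewards, not the paper's quantum setting), a minimal REINFORCE-style update over a softmax policy can be sketched as:

```python
import math
import random

def softmax(prefs):
    # Numerically stable softmax over action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient(rewards, episodes=2000, lr=0.1, seed=0):
    # REINFORCE with a running baseline on a toy one-step problem:
    # rewards[a] is the (hypothetical) mean reward of control action a.
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        r = rewards[a] + rng.gauss(0.0, 0.1)  # noisy reward signal
        baseline += 0.01 * (r - baseline)     # variance-reduction baseline
        advantage = r - baseline
        for i in range(len(prefs)):
            # Gradient of log pi(a) w.r.t. preference i under softmax.
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * advantage * grad
    return softmax(prefs)

probs = policy_gradient([0.2, 1.0, 0.5])
```

Note that the agent only needs scalar reward feedback, not the underlying dynamics, which mirrors the "experimentally non-demanding" point made in the abstract.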
MARL-Ped+Hitmap: Towards Improving Agent-Based Simulations with Distributed Arrays
2016
Multi-agent systems allow the modelling of complex, heterogeneous, and distributed systems in a realistic way. MARL-Ped is a multi-agent system tool, based on the MPI standard, for the simulation of different scenarios of pedestrians who autonomously learn the best behavior by Reinforcement Learning. MARL-Ped uses one MPI process for each agent by design, with a fixed fine-grain granularity. This requirement limits the performance of simulations when the number of available processors is smaller than the number of agents. On the other hand, Hitmap is a library to ease the programming of parallel applications based on distributed arrays. It includes abstractions for the automatic parti…
Towards Intelligent IoT Networks: Reinforcement Learning for Reliable Backscatter Communications
2019
Backscatter communication is becoming the focal point of research for low-powered Internet of things (IoT). However, the intelligence aspect of the backscattering devices is not well-defined. Since future IoT networks are going to be a formidable platform of intelligent sensing devices operating in a self-organizing manner, it is necessary to incorporate learning capabilities in backscatter devices. Motivated by this objective, this paper aims to employ reinforcement learning for improving the performance of backscatter networks. In particular, a multicluster backscatter communication model is developed for short-range information sharing. This is followed by a power allocation algorithm usi…
Online fitted policy iteration based on extreme learning machines
2016
Reinforcement learning (RL) is a learning paradigm that can be useful in a wide variety of real-world applications. However, its applicability to complex problems remains limited by several causes. Particularly important among these are the large quantity of data required by the agent to learn useful policies and the poor scalability to high-dimensional problems due to the use of local approximators. This paper presents a novel RL algorithm, called online fitted policy iteration (OFPI), that makes progress on both fronts. OFPI is based on a semi-batch scheme that increases the convergence speed by reusing data and enables the use of global approximators by reformulating the valu…
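The title names extreme learning machines as the approximator behind OFPI. As a hedged sketch (toy 1-D data and hypothetical function names, not the paper's algorithm), an ELM-style global approximator fits a fixed random hidden layer's output weights in closed form:

```python
import numpy as np

def elm_regressor(X, y, n_hidden=50, seed=0):
    # ELM-style global approximator: a fixed random hidden layer followed
    # by output weights fitted in closed form via least squares.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form fit
    return lambda Xq: np.tanh(Xq @ W + b) @ beta

# Fit a smooth 1-D target as a stand-in for a value function.
X = np.linspace(0.0, np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()
predict = elm_regressor(X, y)
err = np.max(np.abs(predict(X) - y))
```

Because only the linear output weights are trained, refitting on a new batch of transitions is cheap, which is one reason ELMs suit semi-batch schemes like the one described.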
Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations
2020
Reinforcement learning is one of the most promising machine learning techniques to get intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents…
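The abstract describes deriving behavior with a greedy policy over a learned value table. As a minimal illustration (toy states and actions, invented for this sketch), extracting that policy from a tabular action-value function looks like:

```python
def greedy_policy(q_table):
    # Derive a deterministic policy from a tabular action-value function:
    # in each state, choose the action with the highest learned value.
    return {state: max(actions, key=actions.get)
            for state, actions in q_table.items()}

# Hypothetical learned table, for illustration only.
q = {"s0": {"left": 0.1, "right": 0.7},
     "s1": {"left": 0.4, "right": 0.2}}
policy = greedy_policy(q)  # {'s0': 'right', 's1': 'left'}
```

The authoring difficulty the abstract raises is visible even here: nudging a single entry of `q` can flip the greedy action in a state, so the mapping from value edits to policy changes is hard to predict.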
2019
As rats learn to search for multiple sources of food or water in a complex environment, they generate increasingly efficient trajectories between reward sites. Such spatial navigation capacity involves the replay of hippocampal place-cells during awake states, generating small sequences of spatially related place-cell activity that we call "snippets". These snippets occur primarily during sharp-wave-ripples (SWRs). Here we focus on the role of such replay events, as the animal is learning a traveling salesperson task (TSP) across multiple trials. We hypothesize that snippet replay generates synthetic data that can substantially expand and restructure the experience available and make learni…
Reinforcement learning in synthetic gene circuits.
2020
Synthetic gene circuits allow programming in DNA the expression of a phenotype at a given environmental condition. The recent integration of memory systems with gene circuits opens the door to their adaptation to new conditions and their re-programming. This lays the foundation to emulate neuromorphic behaviour and solve complex problems similarly to artificial neural networks. Cellular products such as DNA or proteins can be used to store memory in both digital and analog formats, allowing cells to be turned into living computing devices able to record information regarding their previous states. In particular, synthetic gene circuits with memory can be engineered into living systems to al…
Cortical Recruitment Determines Learning Dynamics and Strategy
2018
Salience is a broad and widely used concept in neuroscience whose neuronal correlates, however, remain elusive. In behavioral conditioning, salience is used to explain various effects, such as stimulus overshadowing, and refers to how fast and strongly a stimulus can be associated with a conditioned event. Here, we show that sounds of diverse quality, but equal intensity and perceptual detectability, can recruit different levels of population activity in mouse auditory cortex. When using these sounds as cues in a Go/NoGo discrimination task, the degree of cortical recruitment matches the salience parameter of a reinforcement learning model used to analyze learning speed. We test an …
Acute stress impairs reward positivity effect in probabilistic learning
2019
Decision making based on feedback learning requires a series of cognitive processes, including estimating the probability of particular outcomes and modulating expectations between expected versus actual outcomes. It has been suggested that stress affects decision making and subsequent processing of feedback valence and magnitude. However, less is known about the effect of acute stress on reward expectancy. In the current study, participants performed a probabilistic learning task, in which they learned an association between response and feedback within different reward expectancy trials (30% and 70%) under the conditions of stress (threat of shock) and safety (no shock). We recorded event…
Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning
2015
This paper reports the first known solution to the N-Door puzzle when the environment is both non-stationary and deceptive (adversarial learning). The Multi-Armed-Bandit (MAB) problem is the iconic representation of the exploration versus exploitation dilemma. In brief, a gambler repeatedly selects and plays one of N possible slot machines, or arms, and either receives a reward or a penalty. The objective of the gambler is then to locate the most rewarding arm to play, while maximizing his winnings in the process. In this paper we investigate a challenging variant of the MAB problem, namely the non-stationary N-Door puzzle. Here, instead of directly observing the reward, the gambler is only…
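For context on the Thompson sampling component named in the title, here is a hedged sketch of the standard Beta-Bernoulli variant on a stationary bandit (made-up arm probabilities; the paper's non-stationary, deceptive N-Door setting is considerably harder):

```python
import random

def thompson_sampling(success_probs, pulls=5000, seed=1):
    # Beta-Bernoulli Thompson sampling: keep a Beta posterior per arm,
    # sample a win-rate estimate from each, and pull the highest sample.
    rng = random.Random(seed)
    n = len(success_probs)
    wins = [1.0] * n    # Beta alpha parameter (uniform prior)
    losses = [1.0] * n  # Beta beta parameter
    counts = [0] * n
    for _ in range(pulls):
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(n)]
        a = max(range(n), key=lambda i: samples[i])
        counts[a] += 1
        if rng.random() < success_probs[a]:  # stationary environment here
            wins[a] += 1
        else:
            losses[a] += 1
    return counts

counts = thompson_sampling([0.2, 0.7, 0.4])
```

Posterior sampling handles exploration automatically: uncertain arms occasionally produce high samples and get pulled, while play concentrates on the best arm as its posterior sharpens.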