Search results for "Reinforcement"
showing 10 items of 230 documents
Reinforcement learning approach to nonequilibrium quantum thermodynamics
2021
We use a reinforcement learning approach to reduce entropy production in a closed quantum system brought out of equilibrium. Our strategy makes use of an external control Hamiltonian and a policy gradient technique. Our approach bears no dependence on the quantitative tool chosen to characterize the degree of thermodynamic irreversibility induced by the dynamical process being considered, requires little knowledge of the dynamics itself and does not need the tracking of the quantum state of the system during the evolution, thus embodying an experimentally non-demanding approach to the control of non-equilibrium quantum thermodynamics. We successfully apply our methods to the case of single- …
Exploratory behaviour is not related to associative learning ability in the carabid beetle Nebria brevicollis.
2020
Recently, it has been hypothesised that as learning performance and animal personality vary along a common axis of fast and slow types, natural selection may act on both in parallel leading to a correlation between learning and personality traits. We examined the relationship between risk-taking, exploratory behaviour and associative learning ability in carabid beetle Nebria brevicollis females by quantifying the number of trials individuals required to reach criterion during an associative learning task (‘learning performance’). The associative learning task required the females to associate odour and direction with refugia from light and heat in a T-maze. Further, we assessed lea…
MARL-Ped+Hitmap: Towards Improving Agent-Based Simulations with Distributed Arrays
2016
Multi-agent systems allow the modelling of complex, heterogeneous, and distributed systems in a realistic way. MARL-Ped is a multi-agent system tool, based on the MPI standard, for the simulation of different scenarios of pedestrians who autonomously learn the best behavior by Reinforcement Learning. MARL-Ped uses one MPI process for each agent by design, with a fixed fine-grain granularity. This requirement limits the performance of the simulations when the number of processors is smaller than the number of agents. On the other hand, Hitmap is a library to ease the programming of parallel applications based on distributed arrays. It includes abstractions for the automatic parti…
Towards Intelligent IoT Networks: Reinforcement Learning for Reliable Backscatter Communications
2019
Backscatter communication is becoming the focal point of research for low-powered Internet of things (IoT). However, the intelligence aspect of the backscattering devices is not well-defined. Since future IoT networks are going to be a formidable platform of intelligent sensing devices operating in a self-organizing manner, it is necessary to incorporate learning capabilities in backscatter devices. Motivated by this objective, this paper aims to employ reinforcement learning for improving the performance of backscatter networks. In particular, a multicluster backscatter communication model is developed for short-range information sharing. This is followed by a power allocation algorithm usi…
Effect of the mutual position between weld seam and reinforcement on the residual stress distribution in Friction Stir Welding of AA6082 skin and str…
2016
In the paper, a numerical and experimental study was carried out to highlight the effect of the distance d between the weld seam and the reinforcement on the residual stress distribution in Friction Stir Welded AA6082-T6 structures. An L-shaped profile was welded to a sheet metal with varying tool rotation and distance d from the weld seam. The Cut Compliance method was used to determine the resulting longitudinal residual stress. A dedicated FE model for FSW was set up, validated and utilized to predict the longitudinal residual stress in the assembled part. The analysis allowed the identification of a few design guidelines in order to reduce the detrimental effects of the residua…
Online fitted policy iteration based on extreme learning machines
2016
Reinforcement learning (RL) is a learning paradigm that can be useful in a wide variety of real-world applications. However, its applicability to complex problems remains problematic due to different causes. Particularly important among these are the high quantity of data required by the agent to learn useful policies and the poor scalability to high-dimensional problems due to the use of local approximators. This paper presents a novel RL algorithm, called online fitted policy iteration (OFPI), that steps forward in both directions. OFPI is based on a semi-batch scheme that increases the convergence speed by reusing data and enables the use of global approximators by reformulating the valu…
Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations
2020
Reinforcement learning is one of the most promising machine learning techniques to get intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents…
2019
As rats learn to search for multiple sources of food or water in a complex environment, they generate increasingly efficient trajectories between reward sites. Such spatial navigation capacity involves the replay of hippocampal place-cells during awake states, generating small sequences of spatially related place-cell activity that we call "snippets". These snippets occur primarily during sharp-wave-ripples (SWRs). Here we focus on the role of such replay events, as the animal is learning a traveling salesperson task (TSP) across multiple trials. We hypothesize that snippet replay generates synthetic data that can substantially expand and restructure the experience available and make learni…
Animal models of drug addiction (Modelos animales de adicción a las drogas)
2017
The development of animal models of drug reinforcement and addiction is essential for advancing knowledge of the biological bases of this disorder and for identifying new therapeutic targets. Depending on the component of reinforcement we wish to study, we can use one type of animal model or another. We can use reinforcement models based on the primary hedonic effect produced by consumption of the addictive substance, such as self-administration (SA) and intracranial electrical self-stimulation (ICSS) models, or models based on the component related to associative learning and the cognitive capacity to make predictions about obtaining …
Introducing Clicker Training as a Cognitive Enrichment for Laboratory Mice
2017
Establishing new refinement strategies in laboratory animal science is a central goal in fulfilling the requirements of Directive 2010/63/EU. Previous research determined a profound impact of gentle handling protocols on the well-being of laboratory mice. By introducing clicker training to the keeping of mice, not only do we promote the amicable treatment of mice, but we also enable them to experience cognitive enrichment. Clicker training is a form of positive reinforcement training using a conditioned secondary reinforcer, the "click" sound of a clicker, which serves as a time bridge between the strengthened behavior and an upcoming reward. The effective implementation of the clicker trai…