
RESEARCH PRODUCT

Simple learning rules to cope with changing environments

Edmund J. Collins, Nigel R. Franks, Alasdair I. Houston, François-Xavier Dechaume-Moncharmont, John M. McNamara, Roderich Groß

subject

error-driven learning; multi-armed bandit; decision-making; learning rules; dynamic environments; animal behavior; models, biological; animals; learning; computer simulation; ecosystem; artificial intelligence; evolutionary biology; [SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; [SDE.BE] Environmental Sciences/Biodiversity and Ecology; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; [SDV.EE.IEO] Life Sciences [q-bio]/Ecology, environment/Symbiosis; Research Article

description

10 pages; International audience; We consider an agent that must choose repeatedly among several actions. Each action has a certain probability of giving the agent an energy reward, and costs may be associated with switching between actions. The agent does not know which action has the highest reward probability, and the probabilities change randomly over time. We study two learning rules that have been widely used to model decision-making processes in animals: one deterministic and the other stochastic. In particular, we examine the influence of the rules' 'learning rate' on the agent's energy gain. We compare the performance of each rule with the best performance attainable when the agent has either full knowledge or no knowledge of the environment. Over relatively short periods of time, both rules are successful in enabling agents to exploit their environment. Moreover, under a range of effective learning rates, both rules are equivalent and can be expressed by a third rule that requires the agent to select the action for which the current run of unsuccessful trials is shortest. However, the performance of both rules is relatively poor over longer periods of time, and under most circumstances no better than the performance an agent could achieve without knowledge of the environment. We propose a simple extension to the original rules that enables agents to learn about and effectively exploit a changing environment for an unlimited period of time.
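To make the setting concrete, the sketch below simulates a two-armed bandit whose success probabilities are occasionally redrawn, and compares two illustrative agents: a generic error-driven rule with a tunable learning rate, and the rule described above that always selects the action with the shortest current run of unsuccessful trials. The reward probabilities, change rate, horizon, and tie-breaking choices are assumptions made for illustration; the paper's exact deterministic and stochastic rules, and its proposed extension, may differ in detail.

import random

def changing_bandit(horizon, change_prob, rng):
    """Yield the success probabilities of the two actions at each trial."""
    probs = [rng.uniform(0.1, 0.9) for _ in range(2)]
    for _ in range(horizon):
        if rng.random() < change_prob:      # environment changes at random times
            probs = [rng.uniform(0.1, 0.9) for _ in range(2)]
        yield probs

def error_driven_agent(horizon=10_000, learning_rate=0.1, change_prob=0.01, seed=0):
    """Greedy agent with an exponentially weighted value estimate per action."""
    rng = random.Random(seed)
    estimates = [0.5, 0.5]                  # initial value estimate for each action
    total = 0
    for probs in changing_bandit(horizon, change_prob, rng):
        action = 0 if estimates[0] >= estimates[1] else 1   # pick higher estimate
        reward = 1 if rng.random() < probs[action] else 0
        # error-driven update: move the estimate toward the observed reward
        estimates[action] += learning_rate * (reward - estimates[action])
        total += reward
    return total

def shortest_failure_run_agent(horizon=10_000, change_prob=0.01, seed=0):
    """Agent that picks the action with the shortest current run of failures."""
    rng = random.Random(seed)
    failure_run = [0, 0]                    # consecutive failures since last success
    total = 0
    for probs in changing_bandit(horizon, change_prob, rng):
        shortest = min(failure_run)
        action = rng.choice([i for i, r in enumerate(failure_run) if r == shortest])
        if rng.random() < probs[action]:
            failure_run[action] = 0         # a reward resets the streak
            total += 1
        else:
            failure_run[action] += 1        # a failure lengthens it
    return total

if __name__ == "__main__":
    print("error-driven rule:        ", error_driven_agent())
    print("shortest-failure-run rule:", shortest_failure_run_agent())

Running the two agents side by side gives a rough feel for the trade-off discussed above: in this sketch the error-driven rule switches only after its estimate of the current action decays below its stored estimate of the alternative, whereas the failure-run rule switches as soon as the current action's streak of failures exceeds the alternative's.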

https://infoscience.epfl.ch/record/117355