Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations

Ignacio García-Fernández, Francisco Martinez-Gil, Pau Romero, Miguel Lozano, Dolors Serra, Rafael Sebastian

subject

0209 industrial biotechnology, reinforcement learning, Computer science, General Mathematics, 02 engineering and technology, pedestrian simulation, Task (project management), learning by demonstration, 020901 industrial engineering & automation, Learning (Aprenentatge), Informatics (Informàtica), Bellman equation, 0202 electrical engineering electronic engineering information engineering, Computer Science (miscellaneous), Engineering (miscellaneous), business.industry, causal entropy, lcsh:Mathematics, Process (computing), 020206 networking & telecommunications, Function (mathematics), inverse reinforcement learning, lcsh:QA1-939, Problem domain, Table (database), Artificial intelligence, Temporal difference learning, business, optimization

description

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function, expressed either as a numeric table or as a function approximator. The learned behavior is then derived from a greedy policy with respect to this value function. However, the learned policy sometimes does not meet expectations, and authoring it is difficult and unsafe: modifying a single value or parameter of the learned value function has unpredictable consequences in the space of policies it represents. This rules out direct manipulation of the learned value function as a way to modify the derived behaviors. In this paper, we propose using Inverse Reinforcement Learning to incorporate real behavior traces into the learning process and shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) …
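The abstract contrasts two ways of turning a learned value table into behavior: a classic TD method, whose policy is greedy with respect to Q, and Soft Q-learning, whose maximum-entropy backup replaces the max in the target with a temperature-scaled log-sum-exp and yields a Boltzmann policy π(a|s) ∝ exp(Q(s,a)/α). The following is a minimal tabular sketch of standard Soft Q-learning under loudly stated assumptions: the 1-D corridor, its reward, and all hyperparameters (`alpha`, `gamma`, `lr`, episode counts) are hypothetical, and it does not reproduce the paper's inverse reinforcement learning step, its use of real trajectories, MARL-Ped, or the Sarsa(λ) baseline.

```python
import math
import random

# Hypothetical toy task (not MARL-Ped): a 1-D corridor with states 0..4.
# The agent starts at state 0; reaching state 4 (the goal) gives reward 1.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # index 0: step left, index 1: step right


def step(s, a):
    """Deterministic transition clipped to the corridor; reward 1.0 on reaching the goal."""
    s_next = min(max(s + a, 0), GOAL)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL


def soft_value(q_row, alpha):
    """Soft state value alpha * log(sum_a exp(Q(s,a) / alpha)), computed stably."""
    m = max(q_row)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_row))


def boltzmann(q_row, alpha):
    """Maximum-entropy policy: pi(a|s) proportional to exp(Q(s,a) / alpha)."""
    m = max(q_row)
    w = [math.exp((q - m) / alpha) for q in q_row]
    z = sum(w)
    return [x / z for x in w]


def soft_q_learning(episodes=500, alpha=0.05, gamma=0.9, lr=0.5, max_steps=50, seed=0):
    """Tabular Soft Q-learning: like Q-learning, but the target uses a log-sum-exp, not a max."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Act by sampling from the current Boltzmann policy.
            i = rng.choices(range(len(ACTIONS)), weights=boltzmann(q[s], alpha))[0]
            s_next, r, done = step(s, ACTIONS[i])
            # Soft Bellman target; terminal transitions do not bootstrap.
            target = r if done else r + gamma * soft_value(q[s_next], alpha)
            q[s][i] += lr * (target - q[s][i])
            s = s_next
            if done:
                break
    return q


def greedy(q):
    """Greedy policy: index of the highest-valued action in each state (ties go to index 0)."""
    return [max(range(len(ACTIONS)), key=lambda i: row[i]) for row in q]


q_table = soft_q_learning()
print("greedy actions:", greedy(q_table))
print("soft policy at state 0:", boltzmann(q_table[0], alpha=0.05))
```

In this toy setting both derived policies prefer stepping right in every non-goal state, but the Boltzmann policy keeps some probability on the alternative action, and that probability depends on the gap between Q values relative to `alpha`. This is one concrete way to see why small edits to a value table can reshape the derived behavior in ways that are hard to anticipate.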

published: 2020-09-02
journal: Mathematics
doi: 10.3390/math8091479
url: https://www.mdpi.com/2227-7390/8/9/1479