RESEARCH PRODUCT

Realizing Undelayed N-step TD prediction with neural networks

Janis Zuters

subject

Dynamic programming; Artificial neural network; Computer science; business.industry; Value (computer science); Reinforcement learning; Observable; Extension (predicate logic); Artificial intelligence; business

description

Various techniques exist for extending reinforcement learning algorithms, e.g., eligibility traces and planning. This paper proposes an approach that combines several such extension techniques: eligibility-like traces, approximators used as value functions, and exploitation of a model of the environment. The resulting method, ‘Undelayed n-step TD prediction’ (TD-P), produced competitive results in environments that are not fully observable.

https://doi.org/10.1109/melcon.2010.5476332