
RESEARCH PRODUCT

Sequence Q-learning: A memory-based method towards solving POMDP

Janis Zuters

subject

Sequence; Computer science; Q-learning; Partially observable Markov decision process; Markov process; Context (language use); Markov model; Bellman equation; Artificial intelligence; Markov decision process

description

A partially observable Markov decision process (POMDP) models a control problem in which states are only partially observable by an agent. The two main approaches to solving such tasks are value-function methods and direct search in policy space. This paper introduces the Sequence Q-learning method, which extends the well-known Q-learning algorithm so that it can solve POMDPs. It does so by adding a special sequence management framework: it advances from action values to “sequence” values and includes the “sequence continuity principle”.
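For orientation, the standard tabular Q-learning update that the method builds on can be sketched as follows. This is only the well-known baseline, not the Sequence Q-learning method itself: the abstract does not specify how sequence values or the sequence continuity principle are computed, so neither is shown. The function name `q_learning_update`, the observation labels `"o1"` and `"o2"`, the action names, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, obs, action, reward, next_obs, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value.

    In a POMDP the agent only sees observations, so the table is keyed by
    (observation, action) here. That is an assumption made for illustration.
    """
    # Greedy estimate of the value of the next observation.
    best_next = max(q[(next_obs, a)] for a in actions)
    # Temporal-difference target derived from the Bellman optimality equation.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(obs, action)] += alpha * (td_target - q[(obs, action)])
    return q[(obs, action)]


# Hypothetical usage: all values start at zero, one rewarded transition.
q = defaultdict(float)
actions = ["left", "right"]
new_value = q_learning_update(q, "o1", "right", 1.0, "o2", actions)
```

Because a single observation may correspond to several hidden states, values keyed this way can be ambiguous, which is the kind of limitation that memory-based extensions such as the one described above aim to address.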

https://doi.org/10.1109/mmar.2015.7283925