RESEARCH PRODUCT
Sequence Q-learning: A memory-based method towards solving POMDP
Janis Zuters
subject
Sequence, Computer science, Q-learning, Partially observable Markov decision process, Markov process, Context (language use), Markov model, Bellman equation, Artificial intelligence, Markov decision process
description
A partially observable Markov decision process (POMDP) models a control problem in which the agent can only partially observe the states. The two main approaches to solving such tasks are value-function methods and direct search in policy space. This paper introduces the Sequence Q-learning method, which extends the well-known Q-learning algorithm towards solving POMDPs by adding a special sequence management framework: it advances from action values to “sequence” values and includes the “sequence continuity principle”.
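For orientation, the sketch below shows one update step of standard tabular Q-learning, the baseline that the description says Sequence Q-learning extends. It is not the paper's method: the record does not specify the sequence values or the sequence continuity principle, so they are not modelled here. The function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are illustrative assumptions. In a POMDP the `state` argument would in practice be an observation, which is the limitation the paper sets out to address.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One step of standard tabular Q-learning (not the sequence-value variant).

    q       -- mapping from (state, action) pairs to estimated action values
    alpha   -- learning rate (assumed value, for illustration only)
    gamma   -- discount factor (assumed value, for illustration only)
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target derived from the Bellman optimality equation.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example; unseen pairs default to a value of 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 1.0
new_value = q_update(q, "s0", "right", reward=0.5, next_state="s1", actions=actions)
```

With these assumed numbers the target is 0.5 + 0.9 × 1.0 = 1.4, so the value of ("s0", "right") moves from 0 to roughly 0.14.
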
| year | journal | country | edition | language |
|---|---|---|---|---|
| 2015-08-01 | 2015 20th International Conference on Methods and Models in Automation and Robotics (MMAR) | | | |