6533b82ffe1ef96bd1295ca1

RESEARCH PRODUCT

Using the Theory of Regular Functions to Formally Prove the ε-Optimality of Discretized Pursuit Learning Algorithms

Lei JiaoB. John OommenXuan ZhangOle-christopher Granmo

subject

Constraint (information theory)Basis pursuit denoisingLearning automataComputer scienceReinforcement learningBasis pursuitMathematical proofMatching pursuitAlgorithmField (computer science)

description

Learning Automata LA can be reckoned to be the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms EAs are certainly the fastest, and of these, the family of Pursuit Algorithms PAs are the pioneering work. It has recently been reported that the previous proofs for e-optimality for all the reported algorithms in the family of PAs have been flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm CPA. The latter proof, though requires the learning parameter to be continuously changing, is, to the best of our knowledge, the current best and only way to prove CPA's e-optimality. However, for all the algorithms with absorbing states, for example, the Absorbing Continuous Pursuit Algorithm ACPA and the Discretized Pursuit Algorithm DPA, the constrain of a continuously changing learning parameter can be removed. In this paper, we provide a new method to prove the e-optimality of the Discretized Pursuit Algorithm which does not require this constraint. We believe that our proof is both unique and pioneering. It can also form the basis for formally showing the e-optimality of the other EAs with absorbing states.

https://doi.org/10.1007/978-3-319-07455-9_40