6533b857fe1ef96bd12b4fb8

RESEARCH PRODUCT

A formal proof of the e-optimality of discretized pursuit algorithms

Xuan ZhangOle-christopher GranmoLei JiaoB. John Oommen

subject

Learning automataDiscretizationInequalityBasis (linear algebra)Computer sciencemedia_common.quotation_subjectField (mathematics)Monotonic function02 engineering and technologyMathematical proofFormal proof020202 computer hardware & architectureAlgebraArtificial Intelligence0202 electrical engineering electronic engineering information engineeringReinforcement learning020201 artificial intelligence & image processingAlgorithmmedia_common

description

Learning Automata (LA) can be reckoned to be the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the family of discretized algorithms are proven to converge even faster than their continuous counterparts. However, it has recently been reported that the previous proofs for ??-optimality for all the reported algorithms for the past three decades have been flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof examines the monotonicity property of the probability of selecting the optimal action, and requires the learning parameter to be continuously changing. In this paper, we provide a new method to prove the ??-optimality of the Discretized Pursuit Algorithm (DPA) which does not require this constraint, by virtue of the fact that the DPA has, in and of itself, absorbing barriers to which the LA can jump in a discretized manner. Unlike the proof given (Zhang et al., Appl Intell 41:974---985, 3) for an absorbing version of the CPA, which utilizes the single-action Hoeffding's inequality, the current proof invokes what we shall refer to as the "multi-action" version of the Hoeffding's inequality. We believe that our proof is both unique and pioneering. It can also form the basis for formally showing the ??-optimality of the other EAs that possess absorbing states.

http://hdl.handle.net/11250/2413070