
RESEARCH PRODUCT

Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning

Sondre Glimsdal, Ole-Christoffer Granmo

subject

Adversarial system, Computer science, Reinforcement learning, Artificial intelligence, Bayesian inference, Multi-armed bandit, Thompson sampling

description

This paper reports the first known solution to the N-Door puzzle when the environment is both non-stationary and deceptive (adversarial learning). The Multi-Armed Bandit (MAB) problem is the iconic representation of the exploration versus exploitation dilemma. In brief, a gambler repeatedly selects and plays one of N possible slot machines, or arms, and either receives a reward or a penalty. The objective of the gambler is to locate the most rewarding arm, maximizing his winnings in the process. In this paper we investigate a challenging variant of the MAB problem, namely the non-stationary N-Door puzzle. Here, instead of directly observing the reward, the gambler is only told whether the optimal arm lies to the "left" or to the "right" of the selected arm, with the feedback being erroneous with probability 1 - p. However, due to the non-stationary property, the optimal arm can abruptly and without notice switch places with a previously sub-optimal arm. To further complicate the situation, we do not assume that the environment is informative; that is, we allow for a traitorous environment that on average guides the gambler in the direction opposite to the optimal arm (an adversarial learning problem). This, coupled with the non-stationary property, makes for a highly demanding reinforcement learning problem. The novel scheme presented in this paper enhances the previous top contender for the stationary N-Door problem with the capability to detect and adapt to a changing environment. The resulting scheme, TS-NSPL, is then empirically shown to be superior to the existing state of the art.
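To make the problem setup concrete, the following is a minimal sketch of the N-Door feedback environment described above, together with a naive follow-the-feedback baseline. All names here (`NDoorEnvironment`, `follow_feedback`) are illustrative assumptions, not the authors' TS-NSPL implementation; in particular, the baseline only works when the environment is informative (p > 0.5), whereas TS-NSPL additionally handles deceptive feedback and abrupt switches of the optimal arm.

```python
import random

class NDoorEnvironment:
    """Hypothetical sketch of the N-Door puzzle environment.

    Feedback points toward the optimal arm with probability p;
    p < 0.5 models a deceptive (adversarial) environment."""

    def __init__(self, n_arms, p, seed=None):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.p = p
        self.optimal = self.rng.randrange(n_arms)

    def feedback(self, arm):
        """Return a 'left' or 'right' hint about the optimal arm's position."""
        if arm == self.optimal:
            # At the optimum itself the hint direction is arbitrary.
            true_dir = self.rng.choice(["left", "right"])
        else:
            true_dir = "left" if self.optimal < arm else "right"
        if self.rng.random() < self.p:
            return true_dir  # truthful hint
        return "left" if true_dir == "right" else "right"  # lie

    def switch(self):
        """Abrupt, unannounced relocation of the optimal arm
        (the non-stationary property)."""
        self.optimal = self.rng.randrange(self.n_arms)

def follow_feedback(env, start, steps):
    """Naive baseline: move one arm in the hinted direction each step.
    Converges near the optimum only for an informative environment."""
    arm = start
    for _ in range(steps):
        if env.feedback(arm) == "left":
            arm = max(0, arm - 1)
        else:
            arm = min(env.n_arms - 1, arm + 1)
    return arm
```

With p = 1 (perfectly truthful feedback) the baseline walks straight to the optimal arm and then oscillates within one position of it; with p < 0.5 it is systematically led away, which is exactly the deceptive regime the paper addresses.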

https://doi.org/10.1109/icmla.2015.203