
RESEARCH PRODUCT

Online fitted policy iteration based on extreme learning machines

José M. Martínez-Martínez, Joan Vila-Francés, Emilio Soria-Olivas, Pablo Escandell-Montero, D. Lorente, José D. Martín-Guerrero

subject

0209 industrial biotechnology; Information Systems and Management; Radial basis function network; Artificial neural network; Computer science; business.industry; Stability (learning theory); 02 engineering and technology; Machine learning; computer.software_genre; Management Information Systems; 020901 industrial engineering & automation; Artificial Intelligence; Bellman equation; 0202 electrical engineering, electronic engineering, information engineering; Benchmark (computing); Reinforcement learning; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Software; Extreme learning machine

description

Reinforcement learning (RL) is a learning paradigm that can be useful in a wide variety of real-world applications. However, its applicability to complex problems remains limited for several reasons. Two are particularly important: the large amount of data the agent needs to learn useful policies, and the poor scalability to high-dimensional problems caused by the use of local approximators. This paper presents a novel RL algorithm, called online fitted policy iteration (OFPI), that makes progress on both fronts. OFPI is based on a semi-batch scheme that increases convergence speed by reusing data, and it enables the use of global approximators by reformulating value function approximation as a standard supervised learning problem. The proposed method was evaluated empirically on three benchmark problems. In the experiments, OFPI used a neural network trained with the extreme learning machine algorithm to approximate the value functions. The results demonstrate the stability of OFPI with a global function approximator, as well as performance improvements over two baseline algorithms (SARSA and Q-learning) combined with eligibility traces and a radial basis function network.
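The two ingredients named in the description, an extreme learning machine and value function approximation cast as supervised regression, can be illustrated with a minimal sketch. It is not the authors' OFPI implementation. It shows only a generic ELM (a fixed random hidden layer with output weights solved by regularized least squares) and a standard one-step Bellman target that could serve as the regression label. The names `ELMRegressor` and `bellman_targets`, the tanh activation, the ridge term, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np


class ELMRegressor:
    """Generic ELM sketch (hypothetical class, not taken from the paper)."""

    def __init__(self, n_hidden=50, ridge=1e-6, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # small regularizer, assumed for numerical stability
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer weights are random and never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Only the output weights are learned, via regularized least squares.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


def bellman_targets(q_next, rewards, gamma, done):
    """One-step targets r + gamma * Q(s', a'), with no bootstrap at terminal steps."""
    return rewards + gamma * (1.0 - done) * q_next


# Toy supervised problem: fit sin(x) on [-3, 3] and measure the training error.
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(X).ravel()
model = ELMRegressor().fit(X, y)
mse = float(np.mean((model.predict(X) - y) ** 2))

# Toy targets: the second transition is terminal, so its target is the reward alone.
targets = bellman_targets(
    q_next=np.array([1.0, 2.0]),
    rewards=np.array([0.5, 0.5]),
    gamma=0.9,
    done=np.array([0.0, 1.0]),
)
```

In a fitted scheme of this kind, the inputs to `fit` would be state-action features from stored transitions and the labels would be the values returned by `bellman_targets`. How OFPI schedules these refits and reuses data is specified in the paper, not here.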

https://doi.org/10.1016/j.knosys.2016.03.007