
RESEARCH PRODUCT

Towards Model-Based Reinforcement Learning for Industry-Near Environments

Per-Arne Andersen, Morten Goodwin, Ole-Christoffer Granmo

subject

Hyperparameter, Artificial neural network, Computer science, business.industry, Sample (statistics), Variance (accounting), Machine learning, computer.software_genre, Variety (cybernetics), Test suite, Reinforcement learning, Artificial intelligence, Markov decision process, business, computer

description

Deep reinforcement learning has, over the past few years, shown great potential for learning near-optimal control in complex simulated environments with little visible information. Rainbow (Q-Learning) and PPO (Policy Optimisation) have shown outstanding performance on a variety of tasks, including the Atari 2600, MuJoCo, and Roboschool test suites. Although these algorithms are fundamentally different, both suffer from high variance, low sample efficiency, and hyperparameter sensitivity, which in practice make them a no-go for critical operations in industry.

https://doi.org/10.1007/978-3-030-34885-4_3