6533b825fe1ef96bd1282c1d

RESEARCH PRODUCT

Efficient gaussian process based optimistic knapsack sampling with applications to stochastic resource allocation

Sondre Glimsdal

subject

description

Masteroppgave i informasjons- og kommunikasjonsteknologi IKT590 2013 – Universitetet i Agder, Grimstad The stochastic non-linear fractional knapsack problem is a challeng- ing optimization problem with numerous applications, including resource allocation. The goal is to nd the most valuable mix of materials that ts within a knapsack of xed capacity. When the value functions of the involved materials are fully known and di erentiable, the most valuable mixture can be found by direct application of Lagrange multipliers. How- ever, in many real-world applications, such as web polling, information about material value is uncertain, and in many cases missing altogether. Surprisingly, without prior information about material value, the recently proposed Learning Automata Knapsack Game (LAKG) and Hierarchy of Twofold Resource Allocation Automata (H-TRAA) o ers arbitrarily ac- curate convergence towards the optimal solution, simply by interacting with the knapsack on-line. This paper introduces Gaussian Process based Optimistic Knapsack Sampling (GPOKS) a novel model-based reinforce- ment learning scheme for solving stochastic fractional knapsack problems, founded on Gaussian Process (GP) enabled Optimistic Thompson Sam- pling (OTS). Not only does this scheme converge signi cantly faster than LAKG, GPOKS also incorporates GP based learning of the material val- ues themselves, forming the basis for OTS supported balancing between exploration and exploitation. Using resource allocation in web polling as a proof-of-concept application, our empirical results show that GPOKS con- sistently outperforms LAKG and H-TRAA, the current top-performers, under a wide variety of parameter settings.

http://hdl.handle.net/11250/137604