6533b852fe1ef96bd12aa605

RESEARCH PRODUCT

Generating Levels and Playing Super Mario Bros. with Deep Reinforcement Learning Using various techniques for level generation and Deep Q-Networks for playing

Ruben Nygård EngelsvollAnders GammelsrødBjørn-inge Støtvig Thoresen

subject

IKT590VDP::Matematikk og Naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420::Kunnskapsbaserte systemer: 425VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550

description

Master's thesis in Information- and communication technology (IKT590) This thesis aims to explore the behavior of two competing reinforcement learning agents in Super Mario Bros. In video games, PCG can be used to assist human game designers by generating a particular aspect of the game. A human game designer can use generated game content as inspiration to build further upon, which saves time and resources. Much research has been conducted on AI in video games, including AI for playing Super Mario Bros. Additionally, there exists a research field focused on PCG for video games, which includes generation of Super Mario Bros. levels. In this thesis, the two fields of research are combined to form a GAN-inspired system of two competing AI agents. One agent is controlling Mario, and this agent represents the discriminator. The other agent generates the level Mario is playing, and represents the generator. In an ordinary GAN system, the generator is attempting to mimic a database containing real data, while the discriminator attempts to distinguish real data samples from the generated data samples. The Mario agent utilizes a DQN algorithm for learning to navigate levels, while the level generator uses a DQN-based algorithm with different types of neural networks. The DQN algorithm utilizes neural networks to predict the expected future reward for each possible action. The expected future rewards are denoted as Q-values. The results show that the generator is capable of generating content better than random when the generator model takes a sequence of tiles as input and produces a sequence of predictions of Q-values as output.

https://hdl.handle.net/11250/2683046