B.Sc. Thesis: Phillip Overlöpper on An exploration scheme based on the state-action novelty in continuous state-action space

Supervisors: Elmar Rückert, Honghu Xue

Finished: 11 November 2019

Abstract

Exploration in step-based reinforcement learning is a challenging and open problem. In a continuous search space, a naive exploration strategy may explore only the neighbourhood of the initial state, leaving a vast part of the space unexplored. Visiting states only once leads to poor performance, with the reinforcement learning algorithm getting stuck in a local minimum. This thesis presents a novel exploration scheme for reinforcement learning in continuous state-action spaces. The scheme is based on the novelty of state-action pairs, where novelty is measured via the density of the compressed state-action pairs. Furthermore, this thesis presents a method for interpolating actions to obtain a smooth trajectory in a Markov Decision Process, a method that can be applied to any robot. The experiment was performed in the CoppeliaSim simulator on a robot with seven degrees of freedom. The results show that the new approach explores more effectively than the baseline exploration strategy.
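The abstract does not specify the compression model, the density estimator, or the interpolation scheme, so the following Python code is only a minimal sketch of the general idea under stated assumptions. A fixed linear projection stands in for the learned compression (the hypothetical `compress`), and an unnormalised Gaussian kernel density estimate over stored codes measures how often a region has already been visited. Novelty is defined here as one minus that density, and the agent keeps the most novel of several randomly perturbed candidate actions. Linear interpolation between consecutive actions is used as one simple way to obtain a smooth trajectory. All names and parameters (`NoveltyMemory`, `bandwidth`, `select_novel_action`, `n_candidates`, `noise_std`, `interpolate_actions`) are illustrative and are not taken from the thesis.

```python
import math
import random


def compress(state, action, projection):
    """Map a state-action pair to a low-dimensional code.

    Assumption: a fixed linear projection of the concatenated vector
    [state, action] stands in for a learned compression model.
    """
    x = list(state) + list(action)
    return [sum(w * xi for w, xi in zip(row, x)) for row in projection]


class NoveltyMemory:
    """Stores codes of visited state-action pairs and scores novelty
    with an unnormalised Gaussian kernel density estimate (assumption)."""

    def __init__(self, bandwidth=0.5):
        self.codes = []
        self.bandwidth = bandwidth

    def add(self, code):
        self.codes.append(list(code))

    def density(self, code):
        # Mean Gaussian kernel value, in [0, 1]; 0.0 for an empty memory.
        if not self.codes:
            return 0.0
        two_h2 = 2.0 * self.bandwidth ** 2
        total = 0.0
        for c in self.codes:
            sq_dist = sum((a - b) ** 2 for a, b in zip(code, c))
            total += math.exp(-sq_dist / two_h2)
        return total / len(self.codes)

    def novelty(self, code):
        # Low density in code space means a rarely visited pair.
        return 1.0 - self.density(code)


def select_novel_action(state, base_action, memory, projection,
                        n_candidates=16, noise_std=0.2, rng=random):
    """Perturb the policy's action with Gaussian noise and keep the
    candidate whose compressed state-action pair is most novel."""
    best_action, best_novelty = None, -1.0
    for _ in range(n_candidates):
        cand = [a + rng.gauss(0.0, noise_std) for a in base_action]
        nov = memory.novelty(compress(state, cand, projection))
        if nov > best_novelty:
            best_action, best_novelty = cand, nov
    return best_action


def interpolate_actions(current, target, n_steps):
    """Linearly interpolate joint commands from `current` to `target`
    over `n_steps` sub-steps (n_steps >= 1); the last entry is `target`."""
    return [[c + (t - c) * k / n_steps for c, t in zip(current, target)]
            for k in range(1, n_steps + 1)]
```

In a training loop, the code of each executed state-action pair would be added to the memory, so regions that have already been visited become progressively less attractive. The `bandwidth` parameter controls how far the influence of a visited pair extends in code space.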

Thesis

An exploration scheme based on the state-action novelty in continuous state-action space