Supervisors: Elmar Rückert, Prof. Dr. Jan Peters
We propose and investigate a novel type of parameterized policy based on a two-layered spiking neural network consisting of multiple populations of stochastic spiking neurons. We show that the proposed policy type is a spatially distributed generalization of a discrete basic policy with a lookup-table parameterization. Our policy exhibits remarkable capabilities but also crucial limitations. In particular, it can deal with high-dimensional state and action spaces, but it loses expressive power: it cannot represent certain functions, such as XOR.
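The XOR limitation is the classical observation that no single linear threshold unit can represent XOR. The following sketch (an illustration of that general fact, not of the specific policy architecture above) brute-forces a grid of weights and finds no linear unit that fits the XOR truth table:

```python
import itertools

# XOR truth table: inputs and desired binary outputs.
cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linear_unit_fits(w1, w2, b):
    """Does a single linear threshold unit reproduce XOR exactly?"""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in cases)

# Brute-force a grid of weights and biases: no setting fits XOR,
# because XOR is not linearly separable.
grid = [i / 2 for i in range(-8, 9)]  # values in [-4, 4]
solutions = [(w1, w2, b)
             for w1, w2, b in itertools.product(grid, repeat=3)
             if linear_unit_fits(w1, w2, b)]
print(len(solutions))  # → 0
```

The grid search is of course only a finite check; the underlying impossibility holds for all real-valued weights, since the four XOR points are not separable by any hyperplane.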
Furthermore, we propose reinforcement learning methods for training this policy, which generalize value-function methods to our distributed policy type. We compare these methods to state-of-the-art approaches, including black-box and likelihood-ratio methods, and find that our methods outperform them significantly. In our experiments we demonstrate that the policy can be trained effectively from rewards to guide a 10-link robot arm in a toy task through a grid world, reaching a specified target without hitting obstacles.
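The value-function methods that the proposed approach generalizes can be illustrated by tabular Q-learning on a small grid world; the sketch below shows the lookup-table baseline, not the distributed policy itself, and the grid size, obstacle position, and reward values are illustrative assumptions:

```python
import random

# Tabular Q-learning sketch: a 4x4 grid with one obstacle cell and a goal.
# All environment details here are assumptions for illustration only.
SIZE, START, GOAL, OBSTACLE = 4, (0, 0), (3, 3), (1, 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step(s, a):
    nxt = (min(max(s[0] + a[0], 0), SIZE - 1),
           min(max(s[1] + a[1], 0), SIZE - 1))
    if nxt == OBSTACLE:
        return s, -1.0, False   # hitting the obstacle: penalty, stay put
    if nxt == GOAL:
        return nxt, 10.0, True  # reaching the target ends the episode
    return nxt, -0.1, False     # small step cost favors short paths

# The lookup-table parameterization: one Q-value per state-action pair.
Q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.95, 0.1
random.seed(0)

for _ in range(2000):            # training episodes
    s, done = START, False
    for _ in range(50):          # step limit per episode
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2, r, done = step(s, a)
        # TD(0) update of the tabular value estimate.
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS) * (not done)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if done:
            break

# Greedy rollout with the learned table.
s, steps = START, 0
while s != GOAL and steps < 20:
    a = max(ACTIONS, key=lambda a: Q[(s, a)])
    s, _, _ = step(s, a)
    steps += 1
print(s == GOAL)  # the greedy policy reaches the target
```

The lookup table assigns an independent value to every state-action pair, which is exactly what becomes infeasible in high-dimensional spaces and motivates the spatially distributed generalization described above.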