
M.Sc. Thesis: David Sharma on Adaptive Training Strategies for Brain-Computer Interfaces

Supervisors: Elmar Rueckert, Prof. Dr. Jan Peters, Dr. Ing. Moritz Grosse-Wentrup

Finished: 26 January 2017

Abstract

A problem of today's brain-computer interface (BCI) systems is that a subject's performance in controlling a BCI can decrease rapidly over time. This is due to the non-stationarity of the recorded electroencephalography (EEG) signals. Furthermore, the motivation of the subject can drop if the subject does not experience any success in controlling the BCI. A possible solution to these problems is to provide the subject with continuous feedback and to train a reinforcement learning (RL) agent on the task in order to support the subject in solving it. A selection policy (implemented through a Monte-Carlo sampling process) selects either the command generated by the subject or the command generated by the RL agent. Especially in the beginning, the RL agent controls the actions of the task most of the time. As the experiment proceeds, the influence of the agent decreases and the subject gains more control over the actions. The subject is not aware of the RL agent. To measure the performance of the subjects, we implemented a scoring system that rewards (positively or negatively) the subject's current performance, i.e., how well the subject solves the task.
We implemented a game in which the subject has to control a game figure by imagining limb movements in order to jump over approaching obstacles. In our experiments, we collected data from 20 subjects. The evaluation of the gathered results shows a positive trend: subjects who trained with the reinforcement learning agent achieved a higher performance than subjects who did not. We also wanted to test whether the subjects were able to adapt to new environments after the training. To this end, we first trained a classifier on the data from the training phase and used this classifier to decode new incoming EEG signals while confronting the subjects with new obstacles. Unfortunately, the performance of both the subjects and the classifier was poor, so we could not verify that the subjects were able to adapt to new environments.
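
As a rough illustration of the selection mechanism described above, the following Python sketch samples, per trial, whether the RL agent's command or the subject's decoded command drives the game figure. The linear decay schedule and all names are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

# Minimal sketch of the shared-control selection policy described above.
# The decay schedule and all names are illustrative assumptions.

rng = np.random.default_rng(0)

def agent_control_prob(trial, n_trials, p_start=0.9, p_end=0.1):
    """Linearly decay the probability that the RL agent's command is used."""
    frac = trial / max(n_trials - 1, 1)
    return p_start + frac * (p_end - p_start)

def select_command(subject_cmd, agent_cmd, trial, n_trials):
    """Monte-Carlo selection: sample which command drives the game figure."""
    p_agent = agent_control_prob(trial, n_trials)
    return agent_cmd if rng.random() < p_agent else subject_cmd
```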

Thesis

Adapting Brain Signals With Reinforcement Learning Strategies for Brain Computer Interfaces




B.Sc. Project: Lena Plage on Reinforcement Learning for tactile-based finger gaiting

Supervisors: Prof. Dr. Jan Peters, Daniel Tanneberg

Finished: 23 December 2016

Abstract

Improving robotic in-hand manipulation based on tactile feedback will open up new possibilities for using robots. One basic ability needed for in-hand manipulation is finger gaiting, i.e., changing a grasp while the object is kept in its orientation and position. In this thesis, a simulated robot learns to change from a two-finger grasp to a three-finger grasp by using tactile feedback.
To learn the task, the Parameter-exploring Policy Gradients (PGPE) algorithm is used. This reinforcement learning algorithm is based on policy gradient methods: in every episode, it samples N rollouts with different controller parameters before the policy is improved. Tactile feedback and joint positions are used for the reward function and the state representation.
The features are generated as Random Fourier Features, which avoids hand-crafting them. Random Fourier Features use a kernel approximation in which the state is projected to a one-dimensional space and D of these random projections are concatenated for a more accurate kernel approximation. In this thesis, two methods of using these features in a linear controller are compared. In the first method, the state representation is projected and multiplied with the parameters. In the second method, the parameters are projected as well and then multiplied with the Random Fourier Features. It is shown that direct parameter learning is problematic due to a reward race to the bottom, caused by parameter oscillation and by choosing excessively large actions. Projecting the parameters eliminates the problem of excessively large actions because the projection limits their range. It is further shown that using a high D, and therefore a more accurate kernel approximation, leads to better rewards. Additionally, the influence of the state representation and of the number of histories used per episode in PGPE is analyzed. A further advantage of projecting the parameters is that a high value of D can be used while only a comparably small number of parameters must be learned.
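
The following Python sketch illustrates, under the assumption of an RBF kernel, the Random Fourier Feature construction and the first variant above, where the projected state is multiplied with a learned parameter vector; all dimensions and names are illustrative.

```python
import numpy as np

# Minimal sketch of a Random-Fourier-Feature linear controller, assuming
# an RBF kernel; dimensions and the controller form are illustrative.

rng = np.random.default_rng(1)

def make_rff(state_dim, D, bandwidth=1.0):
    """Draw D random projections approximating an RBF kernel."""
    W = rng.normal(0.0, 1.0 / bandwidth, size=(D, state_dim))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    def features(state):
        return np.sqrt(2.0 / D) * np.cos(W @ state + b)
    return features

# Method 1: project the state and multiply with learned parameters theta.
features = make_rff(state_dim=12, D=500)
theta = rng.normal(size=500)
action = theta @ features(rng.normal(size=12))
# Method 2 (not shown): project the parameter vector through the same
# cosine features as well, which bounds the range of the actions.
```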

Thesis

Reinforcement Learning for tactile-based finger gaiting




M.Sc. Project: Mike Smyk on Model-based Control and Planning for Real Robots

Supervisors: Elmar Rueckert, Prof. Dr. Jan Peters

Finished: July 2016

Abstract

For controlling high-dimensional robots, most stochastic optimal control algorithms use approximations of the system dynamics and of the cost function (e.g., using linearizations and Taylor expansions). These approximations are typically only locally correct, which might cause instabilities in the greedy policy updates, lead to oscillations, or make the algorithms diverge. To overcome these drawbacks, we add a regularization term to the cost function that punishes large policy update steps in the trajectory optimization procedure. We applied this concept to the Approximate Inference Control method (AICO), where the resulting algorithm guarantees convergence for uninformative initial solutions without complex hand-tuning of learning rates. We evaluated our new algorithm on two simulated robotic platforms. A robot arm with five joints was used for reaching multiple targets while keeping the roll angle constant. On the humanoid robot Nao, we show how complex skills like reaching and balancing can be inferred from desired center of gravity or end-effector coordinates. The Nao results are also evaluated on the real robotic platform, whereas most stochastic optimal control methods are evaluated only in simulation.
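
Stated as a hedged sketch (the abstract does not give the exact term), such a regularizer can be written as a penalty on the deviation from the previous iterate's trajectory, with a weight that plays the role of a step size:

```latex
% Hypothetical form of the regularized cost at AICO iteration k; the
% weight \alpha_k punishes large policy update steps.
\tilde{c}_t(\mathbf{x}_t, \mathbf{u}_t) =
  c_t(\mathbf{x}_t, \mathbf{u}_t)
  + \alpha_k \,\bigl\lVert \mathbf{x}_t - \mathbf{x}_t^{(k-1)} \bigr\rVert^2
```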

Thesis

Stochastic Optimal Control on a Real Humanoid Robot




M.Sc. Thesis: Svenja Stark on Learning Probabilistic Feedforward and Feedback Policies for Stable Walking

Supervisors: Elmar Rueckert, Prof. Dr. Jan Peters

Finished: 13 January 2016

Abstract

The compliant quadruped robot Oncilla is used as a platform to explore the benefits of the interaction between a feedforward gait and a simultaneously active stabilizing feedback controller. The chosen approach is motivated by findings in biology as well as by the advantages of modern stochastic methods. In this thesis, we present a balancing controller, a simple feedforward gait, and first results of a system combining both components. The basic components can be modified in further research.

The developed balancing controller is based on a common criterion for static stability, the current center of pressure (CoP) of the Oncilla. The CoP is calculated from force data obtained from mounted OptoForce sensors and from the end-effector positions computed by a simplified forward kinematic model. Locally weighted regression is used to compute motor commands that bring the Oncilla's CoP closer to a desired one.
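
As a small illustration of the static-stability criterion, the CoP can be estimated as the force-weighted mean of the contact positions. The sketch below assumes four single-axis normal forces and planar foot positions, a simplification of the actual sensor setup.

```python
import numpy as np

# Minimal sketch of a center-of-pressure estimate from per-leg contact
# forces and end-effector positions; names are illustrative assumptions.

def center_of_pressure(forces, positions):
    """forces: (4,) normal forces; positions: (4, 2) foot xy positions."""
    f = np.asarray(forces, dtype=float)
    p = np.asarray(positions, dtype=float)
    return (f @ p) / f.sum()  # force-weighted mean of contact points
```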

The static walking gait is based on analyses of the walking behavior of four-legged animals. The trajectory for each leg is then carefully hand-tuned and parameterized as a combination of sine curves. In addition, rhythmic movement primitives modulating the handcrafted gait have been applied.

Finally, we explored how the resulting gait is influenced by combining the feedforward gait and the balancing controller. The tuning parameter was the amount of applied feedback, which determines how much the Oncilla relies on the gait or on the feedback. In the current setup, the balancing controller could not visibly improve the feedforward gait.

This thesis provides first results towards a versatile platform allowing further experiments on the benefits of feedback for gait learning.


Thesis

Learning Probabilistic Feedforward and Feedback Policies for Stable Walking




M.Sc. Thesis: Jan Kohlschütter on Learning Probabilistic Classifiers from Electromyography Data for Predicting Knee Abnormalities

Supervisors: Elmar Rueckert, Prof. Dr. Jan Peters

Finished: 4 January 2016

Abstract

Identifying movement abnormalities from raw electromyography (EMG) data requires three steps: data pre-processing, feature extraction, and training a classifier. As EMG data shows large variation (even for consecutive trials in a single subject), probabilistic classifiers like naive Bayes or probabilistic support vector machines have been proposed. The commonly used feature representations (e.g., PCA, NMF, wavelet transformation), however, cannot capture this variation. Here, we propose a fully Bayesian approach where both the features and the classifier are probabilistic models. The generative model reproduces the observed variance in the EMG data, provides an estimate of the reliability of its predictions, and can be applied in combination with dimensionality reduction techniques such as PCA and NMF. We found the optimal number of components and Gaussians for each model and tuned their meta-parameters. Besides the focus on the four EMG channels, we tested the knee angle alone as well as the EMG channels combined with the knee angle. We found that these probabilistic extensions outperform classical approaches in predicting knee abnormalities from few samples. We also show that the robustness against noise of the proposed probabilistic model is superior to that of classical methods.
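
A hedged sketch of such a generative classification pipeline is shown below: one Gaussian mixture per class fitted on PCA-reduced features, with prediction via class posteriors. The component counts are illustrative placeholders, not the tuned values reported in the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Minimal sketch of a generative classifier over reduced EMG features:
# one Gaussian mixture per class, prediction via class posteriors.
# PCA/GMM sizes are illustrative, not the tuned values from the thesis.

def fit(X, y, n_components=2, n_mix=3):
    pca = PCA(n_components=n_components).fit(X)
    Z = pca.transform(X)
    models, priors = {}, {}
    for c in np.unique(y):
        models[c] = GaussianMixture(n_mix, random_state=0).fit(Z[y == c])
        priors[c] = np.mean(y == c)
    return pca, models, priors

def predict(pca, models, priors, X):
    Z = pca.transform(X)
    classes = sorted(models)
    log_post = np.column_stack(
        [models[c].score_samples(Z) + np.log(priors[c]) for c in classes])
    return np.asarray(classes)[np.argmax(log_post, axis=1)]
```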

Thesis

Learning Probabilistic Classifiers from Electromyography Data for Predicting Knee Abnormalities




M.Sc. Thesis: Daniel Tanneberg on Spiking Neural Networks Solve Robot Planning Problems

Supervisors: Elmar Rueckert, Prof. Dr. Jan Peters

Finished: 21 September 2015

Abstract

We propose a novel approach to solving robot planning problems based on spiking neural network models. The method is motivated by recent neuroscience findings on how rodents create mental plans for maze navigation and is grounded in the framework of planning as probabilistic inference. In this thesis, we demonstrate that the proposed spiking neural network is a suitable alternative to classical approaches and comes with interesting features.
Neural networks can be used for massively parallel computing, e.g., when implemented in neuromorphic hardware. These brain-like chips consist of thousands of memory and processing units operating in parallel. However, suitable learning rules and algorithms are still lacking. The developments in this thesis provide first testable algorithms for real-world robot planning applications.
Arbitrarily complex functions, such as dynamic or kinematic models, can be learned. For that, a spike-dependent version of contrastive divergence was derived to learn non-linear functions from kinesthetic teaching.
We show that these models can scale to a six-dimensional KUKA robot system, where, in addition to an existing two-dimensional task-space planning model, two additional models were developed. One of these models can be queried in both directions, such that forward and inverse models can be learned at the same time.
Obstacles of arbitrary shape can be encoded in the form of repelling forces through synaptic inhibition. Sampling of movement plans is done 4 to 60 times faster than real-time, which allows for foraging robot control, preparing multiple alternative solutions, and deciding online which plan to execute. With the additionally implemented online rejection sampling, we could achieve target-reaching errors of 4% in the modelled operational area. Furthermore, the generated movement trajectories did not require any post-processing. Using bidirectional feedback between task and joint space during planning, movements that are both smooth and goal-directed were computed.
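
As a rough illustration of the online rejection sampling step, the sketch below draws movement plans from a stand-in stochastic planner and accepts the first plan whose endpoint lies within a tolerance of the target; the planner and all names are placeholders, not the spiking network from the thesis.

```python
import numpy as np

# Minimal sketch of online rejection sampling over movement plans. The 4%
# tolerance mirrors the reported target-reaching error; the planner itself
# is a dummy stand-in for the spiking-network sampler.

rng = np.random.default_rng(4)

def stochastic_planner(start, target, n_steps=50):
    """Stand-in sampler: noisy straight-line plan between start and target."""
    line = np.linspace(start, target, n_steps)
    return line + rng.normal(0.0, 0.05, size=line.shape)

def sample_until_accepted(start, target, workspace_size=1.0, tol=0.04):
    while True:
        plan = stochastic_planner(start, target)
        if np.linalg.norm(plan[-1] - target) <= tol * workspace_size:
            return plan
```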

Thesis

Spiking Neural Networks Solve Robot Planning Problems




M.Sc. Thesis: Max Mindt on Probabilistic Inference for Movement Planning in Humanoids

Supervisors: Elmar Rueckert, Prof. Dr. Jan Peters

Finished: 4 November 2014

Abstract

For controlling robots with many actuators, most stochastic optimal control algorithms use approximations of the system dynamics and of the cost function (e.g., using linearizations and Taylor expansions). These approximations are typically only locally correct, which might cause instabilities in the greedy policy updates, lead to oscillations, or make the algorithms diverge. To overcome these drawbacks, we add a regularization term to the cost function that punishes large policy update steps in the trajectory optimization procedure. In the first part of this thesis, we applied this concept to the Approximate Inference Control method (AICO), where the resulting algorithm guarantees convergence for uninformative initial solutions without complex hand-tuning of learning rates. We evaluated our new algorithm on two simulated robotic platforms. A robot arm with five joints was used for reaching multiple targets while keeping the roll angle constant. On the humanoid robot Nao, we showed how complex skills like reaching and balancing can be inferred from desired center of gravity or end-effector coordinates.
In these tasks we assumed a known forward dynamics model. Typically, inaccurate model predictions have catastrophic effects on the numerical stability of SOC methods. In particular, if the model predictions are poor, the SOC method should not explore further but instead collect more data around the current trajectory. Therefore, in the second part of this thesis, we investigated how to learn such a forward dynamics model with Gaussian processes in parallel to movement planning. The trade-off between exploration and exploitation can be regulated with the model uncertainty, which was introduced as an additional objective in AICO. We evaluated the simultaneous model learning and movement planning approach on a simple pendulum toy task. We achieved safe planning and stable convergence even with inaccurately learned models. The planned trajectories, torques, and end-effector positions converge to locally optimal solutions during the learning process, and the model prediction error of the forward dynamics model converges to zero.
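
A hedged sketch of the second part, learning a Gaussian process forward model and penalizing its predictive uncertainty during planning, is given below; the kernel choice, the penalty form, and all names are illustrative assumptions, not the thesis implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Minimal sketch: learn a forward dynamics model with a GP and use its
# predictive uncertainty as an extra planning cost. All names and the
# penalty form are illustrative assumptions.

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)

def fit_model(states_actions, next_states):
    """Fit the GP forward model on observed (state, action) -> next-state data."""
    gp.fit(states_actions, next_states)

def planning_cost(x_u, task_cost, beta=10.0):
    """Task cost plus an uncertainty penalty that discourages exploring
    regions where the learned model is unreliable."""
    mean, std = gp.predict(x_u.reshape(1, -1), return_std=True)
    return task_cost(mean.ravel()) + beta * float(np.mean(std))
```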

Thesis

Probabilistic Inference for Movement Planning in Humanoids




M.Sc. Thesis: Jan Mundo on Extracting Low-Dimensional Control Variables for Movement Primitives

Supervisors: Elmar Rueckert, Prof. Dr. Jan Peters, Prof. Dr. Gerhard Neumann

Finished: 29 October 2014

Abstract

In robotics we often want to solve a multitude of different but related tasks. Movement primitives (MPs) provide a powerful framework for data-driven movement generation that has been successfully applied to learning from demonstrations and robot reinforcement learning. As the parameters of the primitives are typically high-dimensional, a common practice for generalizing movement primitives to new tasks is to adapt only a small set of control variables, also called meta-parameters, of the primitive. Yet, for most MP representations, the encoding of these control variables is pre-coded in the representation and cannot be adapted to the considered tasks. In this thesis, we want to learn the encoding of task-specific control variables from data as well, instead of relying on fixed meta-parameter representations. We use hierarchical Bayesian models (HBMs) to estimate a low-dimensional latent variable model for probabilistic movement primitives (ProMPs), a recent movement primitive representation. We show on two real robot datasets that ProMPs based on HBMs outperform standard ProMPs in terms of generalization and learning from small amounts of data, and also allow for an intuitive analysis of the movement. We also extend our HBM to a mixture model, such that we can model different movement types in the same dataset.
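
The sketch below illustrates the underlying linear-Gaussian structure of such a latent variable model for ProMP weights, where a low-dimensional control variable z generates the high-dimensional weight vector; the dimensions, the random projection, and all names are illustrative, not the learned HBM.

```python
import numpy as np

# Minimal sketch of a low-dimensional latent variable model over ProMP
# weights: w = A z + mu, trajectory y = Phi w. Dimensions and the Gaussian
# basis are illustrative assumptions.

rng = np.random.default_rng(2)

n_basis, latent_dim, n_steps = 20, 3, 100
# Gaussian basis functions evaluated over normalized time.
Phi = np.exp(-0.5 * ((np.linspace(0, 1, n_steps)[:, None]
                      - np.linspace(0, 1, n_basis)[None, :]) / 0.05) ** 2)

A = rng.normal(size=(n_basis, latent_dim))  # projection (learned in the HBM)
mu = np.zeros(n_basis)                      # mean weight vector

z = rng.normal(size=latent_dim)             # low-dimensional control variable
w = A @ z + mu                              # high-dimensional ProMP weights
trajectory = Phi @ w                        # generated movement
```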

Thesis

Extracting Low-Dimensional Control Variables for Movement Primitives




M.Sc. Project: Gerhard Kniewasser on Reinforcement Learning with Dynamic Movement Primitives – DMPs

Supervisors: Elmar Rueckert, Univ.-Prof. Dr. Wolfgang Maass

Finished: May 2013

Abstract

In this project we set up the AMARSi Oncilla Simulator and used Dynamic Movement Primitives (DMPs) as movement representation, optimizing their parameters in a reinforcement learning framework to adapt the robot's behaviour to new problems. After some experiments on toy examples, we applied an open-loop control scheme to the Oncilla Simulator. In the end, we want to apply this approach to a real robot, the AMARSi Oncilla quadruped, and evaluate its performance.
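
For reference, a minimal sketch of a discrete DMP is shown below: a damped spring model toward the goal plus a learnable forcing term driven by a decaying phase variable. The gains and basis parameters are illustrative defaults, not the settings used in the project.

```python
import numpy as np

# Minimal sketch of a discrete Dynamic Movement Primitive: a damped spring
# model plus a learnable forcing term driven by a phase variable s.
# Gains and basis parameters are illustrative defaults.

def dmp_rollout(x0, g, weights, tau=1.0, dt=0.01, K=100.0, D=20.0):
    n_basis = len(weights)
    centers = np.exp(-3.0 * np.linspace(0, 1, n_basis))  # basis centers in phase
    widths = n_basis / centers
    x, v, s, path = x0, 0.0, 1.0, []
    while s > 0.01:
        psi = np.exp(-widths * (s - centers) ** 2)
        f = s * (g - x0) * (psi @ weights) / psi.sum()   # forcing term
        v += dt * (K * (g - x) - D * v + f) / tau        # transformation system
        x += dt * v / tau
        s += -dt * s / tau                               # canonical system
        path.append(x)
    return np.array(path)
```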

Paper

Reinforcement Learning with Dynamic Movement Primitives – DMPs




M.Sc. Thesis: Oliver Prevenhueber on Motor Planning with Monte Carlo Sampling Methods

Supervisors: Elmar Rueckert, Univ.-Prof. Dr. Wolfgang Maass

Finished: May 2013

Abstract

Motor planning algorithms are essential for the development of robust autonomous robot systems. Various approaches exist to compute movement trajectories efficiently by applying quadratic control costs. However, with quadratic costs, hard constraints cannot be adequately modelled. In this thesis, I chose the Monte Carlo (MC) sampling approach to investigate how dynamic motor planning tasks with hard constraints can be solved efficiently. For efficient sampling, Gibbs sampling, rejection sampling, and importance sampling are combined. Two different sampling methods are investigated. The first and simpler method does not consider the dynamic state transition model of the robot. The second method is more sophisticated and considers a linearised approximation of this dynamic model. The experiments range from simple tasks on a 2-link robot arm to tasks using a more complex 4-link robot arm. To enhance the performance of the investigated methods, they are extended by a via-point approach. Finally, in a novel trajectory mixing approach, complex planning scenarios are solved by mixing multiple trajectories that are computed in parallel.
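
As a rough illustration of how hard constraints can be handled by sampling, the sketch below rejects sampled joint trajectories that violate a joint-limit constraint; the trajectory prior and all names are illustrative assumptions, not the combined Gibbs/importance sampling scheme of the thesis.

```python
import numpy as np

# Minimal sketch of rejection sampling for motor planning with a hard
# constraint: sample joint trajectories from a simple smooth prior and
# keep only those that satisfy a joint-limit constraint.

rng = np.random.default_rng(3)

def sample_plan(n_steps, n_joints, joint_limit=2.0, max_tries=1000):
    for _ in range(max_tries):
        # Smooth random trajectory: cumulative sum of small Gaussian steps.
        traj = np.cumsum(rng.normal(0.0, 0.05, size=(n_steps, n_joints)), axis=0)
        if np.all(np.abs(traj) <= joint_limit):  # hard joint-limit constraint
            return traj
    raise RuntimeError("no feasible trajectory found")
```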

Thesis

Motor Planning with Monte Carlo Sampling Methods