B.Sc. Project: Lena Plage on Reinforcement Learning for tactile-based finger gaiting


Supervisors: Prof. Dr. Jan Peters, Daniel Tanneberg

Finished: 23 December 2016


Improving robotic in-hand manipulation based on tactile feedback will open new possibilities for the use of robots. One basic ability needed for in-hand manipulation is finger gaiting: changing a grasp while the object keeps its orientation and position. In this thesis, a simulated robot learns to change from a two-finger grasp to a three-finger grasp by using tactile feedback.
To learn the task, the Parameter-exploring Policy Gradients (PGPE) algorithm is used. This reinforcement learning algorithm is a policy gradient method. In every episode it samples N rollouts, each with different controller parameters, before the policy is improved. Tactile feedback and joint positions are used for the reward function and the state representation.
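The sampling scheme described above can be illustrated with a minimal sketch of a basic PGPE update. It is not the thesis implementation. It assumes an independent Gaussian search distribution per parameter, uses the batch mean reward as baseline, and normalises rewards by their spread within the batch. The function name `pgpe`, the learning rates, and the toy reward used in the usage note are hypothetical.

```python
import random


def pgpe(rollout_reward, num_params, num_rollouts=20, num_episodes=200,
         lr_mean=0.2, lr_std=0.1, init_std=1.0, seed=0):
    """Basic PGPE sketch: learn the mean and std of a Gaussian over parameters."""
    rng = random.Random(seed)
    mean = [0.0] * num_params
    std = [init_std] * num_params
    for _ in range(num_episodes):
        # Sample N parameter vectors and run one rollout with each of them.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(num_rollouts)]
        rewards = [rollout_reward(theta) for theta in samples]
        # Assumption: batch mean as baseline, reward spread as normaliser.
        baseline = sum(rewards) / num_rollouts
        spread = (max(rewards) - min(rewards)) or 1.0
        for i in range(num_params):
            grad_mean = 0.0
            grad_std = 0.0
            for theta, reward in zip(samples, rewards):
                advantage = (reward - baseline) / spread
                diff = theta[i] - mean[i]
                grad_mean += advantage * diff
                grad_std += advantage * (diff * diff - std[i] ** 2) / std[i]
            # Improve the policy only after all N rollouts were evaluated.
            mean[i] += lr_mean * grad_mean / num_rollouts
            std[i] = max(1e-3, std[i] + lr_std * grad_std / num_rollouts)
    return mean, std
```

As a usage example, `pgpe(lambda theta: -sum((t - g) ** 2 for t, g in zip(theta, [1.0, -0.5])), num_params=2)` searches for parameters close to a hypothetical target vector. In the thesis setting, `rollout_reward` would instead run the simulated hand with the sampled controller parameters and return the reward of that rollout.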
The features are generated as Random Fourier Features, which avoids hand-crafting them. Random Fourier Features approximate a kernel: the state is projected onto a one-dimensional space, and D of these random projections are concatenated to increase the accuracy of the kernel approximation. In this thesis, two methods of using these features in a linear controller are compared. In the first method, the state representation is projected and multiplied with the parameters. In the second method, the parameters are projected as well and then multiplied with the Random Features. It is shown that direct parameter learning is problematic due to a race to the bottom in the reward, caused by parameter oscillation and by choosing actions that are too large. Projecting the parameters eliminates the problem of overly large actions because the projection limits the range of the actions. It is also shown that a high D, and therefore a more accurate kernel approximation, leads to better rewards. Additionally, the influence of the state representation and of the number of histories (rollouts) used per episode in PGPE is analyzed. Another advantage of projecting the parameters is that a high value of D can be used while only a comparatively small number of parameters must be learned.
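The two controller variants can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: it assumes a Gaussian kernel, cosine features with a random phase offset, and, for the second method, that the parameter vector has the same dimension as the state and is passed through the same random projection. The names `make_rff`, `direct_action`, and `projected_action` are hypothetical.

```python
import math
import random


def make_rff(state_dim, num_features, bandwidth=1.0, seed=0):
    """Return z(x): D random Fourier features approximating a Gaussian kernel."""
    rng = random.Random(seed)
    # Each row is one random direction that projects the input onto a scalar.
    weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(state_dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        # Concatenate the D one-dimensional projections into one feature vector.
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return features


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def direct_action(theta, state, features):
    """Method 1: D learned parameters times the features; output is unbounded."""
    return dot(theta, features(state))


def projected_action(theta, state, features):
    """Method 2 (assumed form): project theta too; the output lies in [-2, 2]."""
    return dot(features(theta), features(state))
```

With these features, `dot(features(x), features(y))` approximates `exp(-||x - y||^2 / (2 * bandwidth^2))`, and the approximation tightens as `num_features` (D) grows. In this sketch, `direct_action` needs D parameters and can produce arbitrarily large actions, whereas `projected_action` needs only `state_dim` parameters and its output is bounded regardless of D, which is consistent with the advantages described above.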

