The lecture Reinforcement Learning belongs to the Module Robot Learning (RO4100).
In the winter semester, Prof. Dr. Elmar Rueckert is teaching the course Probabilistic Machine Learning – PML (RO5101 T).
In the summer semester, Prof. Dr. Elmar Rueckert is teaching the course Reinforcement Learning – RL (RO4100 T).
- Students will receive a single grade for the Module Robot Learning (RO4100) based on the average grade of PML and RL (rounded down in favor of the students).
- This course is organized through online lectures and exercises. Details on the organization will be discussed in our first meeting, which is held using the WEBEX tool:
FIRST MEETING: 17.04.2020 12:15-13:45 WEBEX Slides
Please follow the instructions of the ITSC here to set up your computer. Click on the links to create a Google Calendar event, join the WEBEX meeting, or access the online slides.
Dates & Times of the Online Webex Meetings
- Lectures are organized on FRIDAYS, 12:15-13:45, WEBEX Link
- Exercises are organized on THURSDAYS, 09:15-10:00, WEBEX Link
- Introduction to Robotics and Reinforcement Learning (Refresher on Robotics, kinematics, model learning and learning feedback control strategies).
- Foundations of Decision Making (Reward Hypothesis, Markov Property, Markov Reward Process, Value Iteration, Markov Decision Process, Policy Iteration, Bellman Equation, Link to Optimal Control).
- Principles of Reinforcement Learning (Exploration and Exploitation strategies, On & Off-policy learning, model-free and model-based policy learning, Algorithmic principles: Q-Learning, SARSA, (Multi-step) TD-Learning, Eligibility Traces).
- Deep Reinforcement Learning (Introduction to Deep Networks, Stochastic Gradient Descent, Function Approximation, Fitted Q-Iteration, (Double) Deep Q-Learning, Policy-Gradient approaches, Recent research results in Stochastic Deep Neural Networks).
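For orientation, three of the concepts named in the topic list can be written compactly. This is a sketch in the standard textbook notation, which is an assumption here and may differ from the lecture slides: P is the transition model, R the reward, γ the discount factor, and α the learning rate.

```latex
% Bellman optimality equation for the state-value function of an MDP
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V^{*}(s') \bigr]

% Q-Learning (off-policy): bootstraps from the greedy action in the next state
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \bigr]

% SARSA (on-policy): bootstraps from the action a_{t+1} actually taken
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \bigl[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \bigr]
```

The only difference between the two update rules is the bootstrap target, which is what separates off-policy from on-policy learning.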
The learning objectives / qualifications are
- Students get a comprehensive understanding of basic decision-making theories, assumptions, and methods.
- Students learn to analyze the challenges in a reinforcement learning application and to identify promising learning approaches.
- Students will understand the difference between deterministic and probabilistic policies and can define underlying assumptions and requirements for learning them.
- Students understand and can apply advanced policy gradient methods to real-world problems.
- Students know how to analyze the learning results and improve the policy learner parameters.
- Students understand how the basic concepts are used in current state-of-the-art research in robot reinforcement learning and in deep neural networks.
Follow this link to register for the course: https://moodle.uni-luebeck.de
Basic knowledge of Machine Learning and Neural Networks is required. It is highly recommended to attend any of (but not restricted to) the following courses prior to attending this course: Probabilistic Machine Learning (RO 5101 T), Artificial Intelligence II (CS 5204 T), Machine Learning (CS 5450), Medical Deep Learning (CS 4374). The students will also experiment with state-of-the-art Reinforcement Learning (RL) methods on benchmark RL simulators (OpenAI Gym, PyBullet), which requires strong Python programming skills; knowledge of PyTorch is preferred. All assignment-related materials have been tested on a Windows machine (Windows 10).
Course dates & materials
|Dates & Times||Type (VO = lecture, UE = exercise)||Topics||Links|
|10.04.2020 12:15-13:45||---||Public Holiday, no lecture.|
|16.04.2020 09:15-10:00||---||No exercise, as the lecture has not started yet.|
|17.04.2020 12:15-13:45||VO||An Introduction to Robotics and Reinforcement Learning||Slides, VodCast|
|23.04.2020 09:15-10:00||UE||Assignment I Presentation: Tutorial on environmental setup OpenAI|
|24.04.2020 12:15-13:45||VO||Foundations on Decision Making||Slides, VodCast|
|30.04.2020 09:15-10:00||UE||Q & A session|
|01.05.2020 12:15-13:45||---||Public Holiday, no lecture.|
|07.05.2020 09:15-10:00||UE||Q & A session|
|08.05.2020 12:15-13:45||VO||Policy Iteration, Bellman Equation, Link to Optimal Control||Slides, VodCast|
|14.05.2020 09:15-10:00||UE||Assignment II Presentation|
|21.05.2020 09:15-10:00||---||Public Holiday, no exercise; shifted to 22.05.2020.|
|22.05.2020 12:15-13:45||UE||Q & A session|
|28.05.2020 09:15-10:00||UE||Assignment III Presentation|
|04.06.2020 09:15-10:00||UE||Q & A session|
|05.06.2020 12:15-13:45||---|
|11.06.2020 09:15-10:00||UE||Assignment IV Presentation|
|12.06.2020 12:15-13:45||---|
|18.06.2020 09:15-10:00||UE||Q & A session|
|19.06.2020 12:15-13:45||VO||Policy Search Methods and Contextual Policy Search||Slides, VodCast|
|25.06.2020 09:15-10:00||UE||Assignment V Presentation|
|26.06.2020 12:15-13:45||UE||Assignment V presentation (cont.), Assignment IV solution presentation|
|02.07.2020 09:15-10:00||UE||Q & A session|
|03.07.2020 12:15-13:45||VO||Summary, Advanced Topics and Bonus Point Quiz|
|09.07.2020 09:15-10:00||UE||Q & A session|
|10.07.2020 12:15-13:45||UE||Q & A session|
|17.07.2020 12:15-13:45||UE||Assignment V solution presentation|
The course grades will be computed solely from the submitted student reports of six assignments. The reports and the code have to be submitted (one report per team) to firstname.lastname@example.org. Please note the list of dates and deadlines below. Each assignment has a deadline of at least two weeks; some have a longer duration.
Please use LaTeX for writing your report. A LaTeX template for this course can be found here: https://drive.google.com/open?id=186A7UvnMgnzh_EROuORemjymLOq7Eixz
|Presentation Date||Topics||Points||BPs||Submission Deadline||Links|
|23.04.2020 09:15-10:00||Assignment I Presentation||20||---||14.05.2020 09:00||PDF, ZIP|
|14.05.2020 09:15-10:00||Assignment II Presentation||20||---||28.05.2020 09:00||PDF, ZIP|
|28.05.2020 09:15-10:00||Assignment III Presentation||15||5||11.06.2020 09:00||PDF, ZIP|
|11.06.2020 09:15-10:00||Assignment IV Presentation||20||10||25.06.2020 09:00||PDF, ZIP|
|25.06.2020 09:15-10:00||Assignment V Presentation||25||5||16.07.2020 09:00||PDF, ZIP|
Students can get Bonus Points (BPs) during the lectures when all quiz questions are answered correctly (1 BP per lecture). In the assignments, BPs will be given to students when optional (and often also challenging) tasks are implemented and discussed.
Points to Grades
|95||1.0||Best possible grade|
|0||5.0||Worst possible grade|
Materials for the Exercise
The course is accompanied by coursework on policy search for discrete state and action spaces (grid world example), policy learning in continuous spaces using function approximation, and policy gradient methods in challenging simulated robotic tasks. The theoretical assignment questions are based on the lecture and on the first three literature sources listed below. It is strongly recommended to read (or watch) these materials in parallel to attending the lectures. The assignments will include both written tasks and algorithmic implementations in Python. The tasks will be presented during the exercise sessions. The OpenAI Gym platform will be used as the simulation environment in the project work.
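To give a feel for the kind of discrete grid world problem mentioned above, here is a minimal sketch in plain Python. It is not taken from the assignments: the 3x4 grid, the goal position, the reward of 1 for entering the goal, and the discount factor of 0.9 are all assumptions chosen for illustration, and it uses value iteration (one of the lecture topics) rather than a learning-based method.

```python
# Minimal value iteration on a small deterministic grid world.
# All constants below are hypothetical and chosen only for illustration.
GAMMA = 0.9                      # discount factor (assumed)
ROWS, COLS = 3, 4                # grid size (assumed)
GOAL = (0, 3)                    # absorbing goal cell (assumed)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic transition: move one cell, or stay if the move leaves the grid.

    Returns (next_state, reward); the reward is 1.0 only when entering the goal.
    """
    if state == GOAL:
        return state, 0.0        # the goal is absorbing
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    if not (0 <= row < ROWS and 0 <= col < COLS):
        row, col = state         # bumping into a wall keeps the agent in place
    next_state = (row, col)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def action_value(state, action, values):
    """One-step lookahead: immediate reward plus discounted value of the successor."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup in place until values stop changing."""
    values = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for state in values:
            if state == GOAL:
                continue
            best = max(action_value(state, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, for every non-goal state, the action with the highest lookahead value."""
    return {
        state: max(ACTIONS, key=lambda a: action_value(state, a, values))
        for state in values
        if state != GOAL
    }


if __name__ == "__main__":
    optimal_values = value_iteration()
    policy = greedy_policy(optimal_values)
    for r in range(ROWS):
        print(" ".join(f"{optimal_values[(r, c)]:.3f}" for c in range(COLS)))
    print(policy[(2, 0)])
```

Because the transitions are deterministic and only entering the goal is rewarded, the value of a cell decays by a factor of `GAMMA` for every extra step needed to reach the goal. In the Gym-based assignments the environment would instead be sampled through interaction, which is where methods such as Q-Learning come in.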
- Richard S. Sutton, Andrew Barto: Reinforcement Learning: An Introduction, second edition. The MIT Press, Cambridge, Massachusetts; London, England, 2018. Link to the online book (PDF)
- David Silver’s Reinforcement Learning online lecture series. Link to the online video and script
- Sergey Levine’s Deep Reinforcement Learning online lecture series. Link to the online video, Link to the script
- Csaba Szepesvári: Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2010.
- B. Siciliano, L. Sciavicco: Robotics: Modelling, Planning and Control, Springer, 2009.
- Puterman, Martin L. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.