M.Sc. Thesis: Rui Song on Solving Visual Navigation Tasks for Pedestrian Trajectory Generation Using Distributional Reinforcement Learning and Automatic Curriculum Learning in CARLA

Supervisors: Honghu Xue, Elmar Rückert

Finished: 22 April 2022


In this thesis, we propose an approach that combines reinforcement learning and automatic curriculum learning to solve a visual navigation task. A pedestrian agent is expected to learn a policy from scratch in a street-crossing scenario in the realistic traffic simulator CARLA. The pedestrian is restricted to its first-person perspective as sensory input and therefore cannot obtain full knowledge of the environment, which raises a partial-observability challenge. To address this, an improved version of the Distributional Soft Actor-Critic algorithm is implemented. The algorithm adopts a newly proposed 3D dilated convolutional architecture to deal with the partial-observability problem. To further improve its performance, we develop an automatic curriculum learning algorithm called NavACL+ on top of NavACL. As the results and ablation studies suggest, our approach outperforms the original NavACL by 23.1%, and the convergence speed of NavACL+ is 37.5% faster. Moreover, the validation results show that the policies trained with NavACL+ are considerably more generalizable and robust than the other variants across different initial starting poses: NavACL+ policies perform 28.3% better than policies trained from a fixed start.
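The thesis architecture itself is not reproduced here, but the intuition behind stacking dilated 3D convolutions over a history of first-person frames can be illustrated with a short receptive-field calculation. The kernel sizes and dilation rates below are illustrative assumptions, not the values used in the thesis.

```python
# Sketch: effective receptive field along the temporal axis of stacked
# (stride-1) convolutions. With dilation, the receptive field grows much
# faster with depth at the same parameter count, letting the network
# aggregate a longer history of observations -- one way to mitigate
# partial observability. Layer settings here are assumed for illustration.

def receptive_field(layers):
    """layers: list of (kernel_size, dilation) pairs along one axis (stride 1)."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer extends the field by (k - 1) * dilation
    return rf

# Three temporal layers with dilations 1, 2, 4 (hypothetical):
dilated = [(3, 1), (3, 2), (3, 4)]
plain = [(3, 1), (3, 1), (3, 1)]

print(receptive_field(dilated))  # 15 frames of temporal context
print(receptive_field(plain))    # 7 frames at the same depth
```

The same formula applies per spatial axis; the point is that dilation buys temporal context without deepening the network or enlarging kernels.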


Deep Reinforcement Learning for Navigation in Warehouses

Deep Reinforcement Learning (DRL) has demonstrated great success in learning single or multiple tasks from scratch. Various DRL algorithms have been proposed and applied to a broad class of tasks, including chess and video games, robot navigation, and robot manipulation.

In this work, we investigate the potential of DRL for a mapless navigation task within a warehouse. The challenges of the task are the partial observability of the space and the need for effective exploration strategies for fast learning of navigation behaviors.

We trained a mobile robot (the agent) from scratch and compared how different sensor observations influence navigation performance. The evaluated inputs are a 360-degree Lidar sensor, a depth image only, and an RGB image only. For the Lidar and RGB inputs, we evaluated both partial and full observability of the state space. We successfully trained the agent to navigate to a goal with a reward setting that is also applicable to the real world.

Currently, we are extending the work to multi-modal sensor inputs combining Lidar and RGB (the RGB camera covers only the frontal view). We have also incorporated self-curriculum learning on a more challenging warehouse navigation task and obtained promising initial outcomes.

The video shows learned navigation strategies using a single RGB-D camera mounted at the front of the robot. The results were obtained after 85,000 interactions (single action executions, e.g. wheel velocity commands).

This video shows the learned strategy after 140,000 interactions with the environment.