
M.Sc. Thesis, Adiole Promise Emeziem: Language-Grounded Robot Autonomy through Large Language Models and Multimodal Perception

Supervisor: Linus Nwankwo, M.Sc.;
Univ.-Prof. Dr Elmar Rückert
Start date:  As soon as possible

 

Theoretical difficulty: mid
Practical difficulty: high

Abstract

The goal of this thesis is to enhance the method proposed in [1] to enable autonomous robots to effectively interpret open-ended language commands, plan actions, and adapt to dynamic environments.

The scope is limited to grounding the semantic understanding of large-scale pre-trained language and multimodal vision-language models in physical sensor data, enabling autonomous agents to execute complex, long-horizon tasks without task-specific programming. The expected outcomes include a unified framework for language-driven autonomy, a method for cross-modal alignment, and real-world validation.

Tentative Work Plan

To achieve the objectives, the following concrete tasks will be focused on:

  • Background and Setup:
    • Study LLM-for-robotics papers (e.g., ReLI [1], Code-as-Policies [2], ProgPrompt [3]) and vision-language models (e.g., CLIP, LLaVA).
    • Set up a ROS/Isaac Sim simulation environment and build a robot model (URDF) for the simulation (optional if you wish to use an existing one).
    • Familiarise yourself with how LLMs and VLMs can be grounded for short-horizon robotic tasks (e.g., “Move towards the {color} block near the {object}”) in static environments.
    • Recommended programming tools: C++, Python, Matlab.
  • Modular Pipeline Design:
    • Speech/Text (Task Instruction) ⇾ LLM (Task Planning) ⇾ CLIP (Object Grounding) ⇾ Motion Planner (e.g., move towards the {color} block near the {object}) ⇾ Execution (in simulation or a real-world environment).
  • Intermediate Presentation:
    • Present the results of your background study and the work completed so far.
    • Detailed planning of the next steps.
  • Implementation & Real-World Testing (If Possible):
    • Test the implemented pipeline with a Gazebo-simulated quadruped or differential drive robot.
    • Perform real-world testing of the developed framework with our Unitree Go1 quadruped robot or with our Segway RMP 220 Lite robot.
    • Analyse and compare the model’s performance in real-world scenarios versus simulation across the different LLM and VLM pipelines.
    • Validate with 50+ language commands in both simulation and the real world.
  • Optimise the Pipeline for Performance and Efficiency (Optional):
    • Validate the model to identify bottlenecks within the robot’s task environment.
  • Documentation and Thesis Writing:
    • Document the entire process, methodologies, and tools used.
    • Analyse and interpret the results.
    • Draft the thesis, ensuring that the primary objectives are achieved.
      • Chapters: Introduction, Background (LLMs/VLMs in robotics), Methodology, Results, Conclusion.
    • Deliverables: Code repository, simulation demo video, thesis document.
  • Research Paper Writing (optional)
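The modular pipeline above can be sketched as a chain of plain function stubs. Every function below is an illustrative placeholder, not the actual ReLI component: a real implementation would query an LLM for planning and score CLIP image-text similarities for grounding, but the control flow stays the same.

```python
# Illustrative sketch of the Speech/Text -> LLM -> CLIP -> Motion Planner chain.
# All functions are hypothetical stand-ins for the learned components.

def plan_task(instruction):
    """Stand-in for the LLM planner: split an instruction into sub-goals."""
    return [step.strip() for step in instruction.split("then")]

def ground_object(subgoal, detections):
    """Stand-in for CLIP grounding: match a sub-goal to a detected object pose."""
    for name, pose in detections.items():
        if name in subgoal:
            return name, pose
    return None, None

def motion_command(pose):
    """Stand-in for the motion planner: emit a navigation goal for the executor."""
    return {"action": "navigate", "goal": pose}

def run_pipeline(instruction, detections):
    commands = []
    for subgoal in plan_task(instruction):
        _, pose = ground_object(subgoal, detections)
        if pose is not None:
            commands.append(motion_command(pose))
    return commands

detections = {"red block": (1.0, 0.5), "door": (3.0, 2.0)}  # toy detections
commands = run_pipeline("move to the red block then go to the door", detections)
```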

References

[1] Nwankwo L, Ellensohn B, Özdenizci O, Rueckert E. ReLI: A Language-Agnostic Approach to Human-Robot Interaction. arXiv preprint arXiv:2505.01862. 2025 May 3.

[2] Liang J, Huang W, Xia F, Xu P, Hausman K, Ichter B, Florence P, Zeng A. Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023 May 29 (pp. 9493-9500). IEEE.

[3] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, Fox D, Thomason J, Garg A. ProgPrompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023 May 29 (pp. 11523-11530). IEEE.

[4] Nwankwo L, Rueckert E. The Conversation is the Command: Interacting with Real-World Autonomous Robots Through Natural Language. In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction 2024 Mar 11 (pp. 808-812).




B.Sc. Thesis, Weiyi Lin – Image augmentation and its impact on reinforcement learning models

Supervisor: Vedant Dave, M.Sc.
Univ.-Prof. Dr Elmar Rückert
Start date:  3rd April 2025

 

Theoretical difficulty: mid
Practical difficulty: low

Abstract

Because reinforcement learning models tend to overfit to their training data, data augmentation has become a widely adopted technique in visual reinforcement learning: it enhances the performance and generalization of agents by increasing the diversity of the training data. Often, different tasks benefit from different types of augmentations, and selecting them requires prior knowledge of the environment. This thesis explores how various augmentation strategies, including visual and context-aware augmentations, affect the performance and generalization of agents in visual environments.
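As a concrete example, one common visual augmentation in this line of work is the random pad-and-crop shift used by DrQ/SVEA-style methods. The sketch below uses NumPy; the padding mode and shift range are illustrative choices, not the exact settings of any cited method.

```python
import numpy as np

def random_shift(img, pad=4, rng=None):
    """Random pad-and-crop shift of an HxWxC observation: pad the image at the
    borders, then crop a window at a random offset so the content shifts by up
    to `pad` pixels in each direction."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]

obs = np.random.randint(0, 256, (84, 84, 3), dtype=np.uint8)  # dummy frame
aug = random_shift(obs)  # same shape and dtype, shifted content
```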

Tentative Work Plan

  • Literature research.
  • Understanding the concepts of visual RL models (e.g., SVEA).
  • Implementing and testing different augmentations.
  • Observation and documentation of results.
  • Thesis writing.

Related Work

[1] N. Hansen and X. Wang, “Generalization in Reinforcement Learning by Soft Data Augmentation,” 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 2021, pp. 13611-13617, doi: 10.1109/ICRA48506.2021.9561103

[2] Hansen, Nicklas, Hao Su, and Xiaolong Wang. “Stabilizing deep q-learning with convnets and vision transformers under data augmentation.” Advances in neural information processing systems 34 (2021): 3680-3693.

[3] Almuzairee, Abdulaziz, Nicklas Hansen, and Henrik I. Christensen. “A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning.” Reinforcement Learning Conference.




B.Sc. Thesis, Merisa Salkic – Smart conversations: Enhancing robotic task execution through advanced language models

Supervisor: Linus Nwankwo, M.Sc.;
Univ.-Prof. Dr Elmar Rückert
Start date:  As soon as possible

 

Theoretical difficulty: mid
Practical difficulty: high

Abstract

In this thesis, we aim to enhance the method proposed in [1] for robust natural human-autonomous agent interaction through verbal and textual conversations. The primary focus is to develop a system that enhances natural-language conversations, understands the semantic context of the robot’s task environment, and abstracts this information into actionable commands or queries. This will be achieved by leveraging the capabilities of pre-trained large language models (LLMs) such as GPT-4, visual language models (VLMs) such as CLIP, and audio language models (ALMs) such as AudioLM.
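The VLM grounding step can be illustrated with plain cosine similarity between a text embedding and candidate image-region embeddings. In the real system both sides would come from CLIP's text and image encoders; the vectors below are hand-made placeholders.

```python
import numpy as np

def ground_phrase(text_vec, region_vecs):
    """Return the index of the region whose embedding has the highest cosine
    similarity to the text embedding (CLIP-style matching with toy vectors)."""
    t = np.asarray(text_vec, dtype=float)
    t = t / np.linalg.norm(t)
    scores = [float(t @ (np.asarray(v, dtype=float) / np.linalg.norm(v)))
              for v in region_vecs]
    return int(np.argmax(scores)), scores

text = [1.0, 0.0, 0.2]          # placeholder embedding of a phrase
regions = [[0.0, 1.0, 0.0],     # region 0: poor match
           [0.9, 0.1, 0.2],     # region 1: close match
           [0.0, 0.0, 1.0]]     # region 2: poor match
best, scores = ground_phrase(text, regions)
```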

Tentative Work Plan

To achieve the objectives, the following concrete tasks will be focused on:

  • Initialisation and Background:
    • Study the concept of LLMs, VLMs, and ALMs.
    • Study how LLMs, VLMs, and ALMs can be grounded for autonomous robotic tasks.
    • Familiarise yourself with the methods at the project website – https://linusnep.github.io/MTCC-IRoNL/.
  • Setup and Familiarity with the Simulation Environment
    • Build a robot model (URDF) for the simulation (optional if you wish to use the existing one).
    • Set up the ROS framework for the simulation (Gazebo, Rviz).
    • Recommended programming tools: C++, Python, Matlab.
  • Coding
    • Improve the existing code of the method proposed in [1] to incorporate the aforementioned modalities (the code will be provided to the student).
    • Integrate other LLMs (e.g., LLaMA) and VLMs (e.g., GLIP) into the framework and compare their performance with the baseline (GPT-4 and CLIP).
  • Intermediate Presentation:
    • Present the results of your background study and the work completed so far.
    • Detailed planning of the next steps.
  • Simulation & Real-World Testing (If Possible):
    • Test your implemented model with a Gazebo-simulated quadruped or differential drive robot.
    • Perform real-world testing of the developed framework with our Unitree Go1 quadruped robot or with our Segway RMP 220 Lite robot.
    • Analyse and compare the model’s performance in real-world scenarios versus simulation across the different LLM and VLM pipelines.
  • Optimise the Framework for Performance and Efficiency (Optional):
    • Validate the model to identify bottlenecks within the robot’s task environment.
  • Documentation and Thesis Writing:
    • Document the entire process, methodologies, and tools used.
    • Analyse and interpret the results.
    • Draft the project report or thesis, ensuring that the primary objectives are achieved.
  • Research Paper Writing (optional)

Related Work

[1]  Linus Nwankwo and Elmar Rueckert. 2024. The Conversation is the Command: Interacting with Real-World Autonomous Robots Through Natural Language. In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’24). Association for Computing Machinery, New York, NY, USA, 808–812. https://doi.org/10.1145/3610978.3640723.

[2] Nwankwo, L., & Rueckert, E. (2024). Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models. arXiv preprint arXiv:2403.12273.




B.Sc. Thesis: Reineke Peter on Deep Learning for Predicting Fluid Dynamics

Supervisor: Univ.-Prof. Dr Elmar Rückert

Project: K1-MET P3.4
Start date: 1st of May 2024

Theoretical difficulty: high
Practical difficulty: mid

Topic

In steel production, the steel quality heavily depends on the dynamic processes of the meniscus-level fluctuations in the mold. These complex dynamic processes can be observed using IR cameras that monitor the surface level and the casting-powder temperature.

The goal of this thesis is to develop and compare deep learning approaches (CNNs, transformers) for predicting fluid dynamics in a lab prototype environment.
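Before training CNNs or transformers, a simple linear autoregressive baseline gives a performance floor the deep models should beat. The sketch below fits such a predictor with least squares on a synthetic sinusoid standing in for the recorded meniscus-level signal; the window length and the signal itself are illustrative choices.

```python
import numpy as np

def fit_ar_baseline(series, window=8):
    """Least-squares linear predictor of the next sample from the previous
    `window` samples: a classical baseline for the deep models to beat."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_next(series, coeffs):
    """One-step-ahead prediction from the tail of the series."""
    return float(series[-len(coeffs):] @ coeffs)

t = np.linspace(0, 20 * np.pi, 2000)
level = np.sin(t)                     # synthetic stand-in for the level signal
coeffs = fit_ar_baseline(level[:-1])  # fit on all but the last sample
pred = predict_next(level[:-1], coeffs)
err = abs(pred - level[-1])           # near zero on this predictable signal
```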

Tasks

  • Literature research on the state of the art (see references)
  • Lab prototype environment for generating complex (structured and chaotic) fluid dynamics
  • Dataset recording, visualization and annotation
  • Deep Learning algorithm implementation (CNNs & Transformers)
  • Evaluation on different datasets (predictable dynamics, complex dynamics, synchronous and async. surface level dynamics, chaotic dynamics).
  • Thesis writing.

References




B.Sc. Thesis – Philipp Zeni – Precision in Motion: ML-Enhanced Race Course Identification for Formula Student Racing

Supervisor: Linus Nwankwo, M.Sc.;
Univ.-Prof. Dr Elmar Rückert
Start date: 30th October 2023

Theoretical difficulty: mid
Practical difficulty: high

Abstract

This thesis explores machine learning techniques for analysing onboard recordings from the TU Graz Racing Team, a prominent Formula Student team. The main goal is to design and train an end-to-end machine learning model to autonomously discern race courses based on sensor observations.

Further, this thesis seeks to address the following research questions:

  • Can track markers (cones) be reliably detected and segmented from onboard recordings?
  • Does the delineated racing track provide an adequate level of accuracy to support autonomous driving, minimizing the risk of accidents?
  • How well does a neural network trained on simulated data adapt to real-world situations?
  • Can the neural network ensure real-time processing in high-speed scenarios surpassing 100 km/h?
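As a minimal starting point for the first research question, cone candidates can be extracted with a classical color threshold before any learned segmentation is attempted. The frame and the RGB bounds below are synthetic, illustrative values; a learned model would replace the hand-tuned bounds, but the same mask-to-centroid step can feed the track-delineation stage.

```python
import numpy as np

def detect_cone_pixels(img, rgb_low=(180, 80, 0), rgb_high=(255, 180, 60)):
    """Threshold-based detector for orange-ish pixels; returns the binary mask
    and the centroid (x, y) of candidate pixels, or None if nothing matches."""
    low, high = np.array(rgb_low), np.array(rgb_high)
    mask = np.all((img >= low) & (img <= high), axis=-1)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return mask, None
    return mask, (float(xs.mean()), float(ys.mean()))

frame = np.full((60, 80, 3), 120, dtype=np.uint8)  # grey synthetic frame
frame[20:30, 40:50] = (230, 120, 20)               # orange blob as a fake cone
mask, centroid = detect_cone_pixels(frame)
```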

Tentative Work Plan

To achieve the objectives, the following concrete tasks will be focused on:

  • Thesis initialisation and literature review:
    • Define the scope and boundaries of your work.
    • Study the existing projects in [1] and [2] to identify gaps and methodologies.
  • Set up and familiarize yourself with the simulation environment
    • Build the car model (URDF) for the simulation (optional if you wish to use the existing one)
    • Set up the ROS framework for the simulation (Gazebo, Rviz)
    • Recommended programming tools: C++, Python, Matlab
  • Data acquisition and preprocessing (3D Lidar and RGB-D data)
    • Collect onboard recordings and sensor data from the TU Graz Racing track.
    • Augment the data with additional simulated recordings using ROS, if necessary.
    • Preprocess and label the data for machine learning (ML). This includes segmenting tracks, markers, and other relevant features.
  • Intermediate presentation:
    • Present the results of the literature study or what has been done so far
    • Detailed planning of the next steps
  • ML Model Development:
    • Design the initial neural network architecture.
    • Train the model using the preprocessed data.
    • Evaluate model performance using metrics like accuracy, precision, recall, etc.
    • Iteratively refine the model based on the evaluation results.
  • Real-world Testing (If Possible):
    • Implement the trained model on a real vehicle’s onboard computer.
    • Test the vehicle in a controlled environment, ensuring safety measures are in place.
    • Analyze and compare the model’s performance in real-world scenarios versus simulations.
  • Optimization for Speed and Efficiency (Optional):
    • Validate the model to identify bottlenecks.
    • Optimize the neural network for real-time performance, especially for high-speed scenarios
  • Documentation and B.Sc. thesis writing:
    • Document the entire process, methodologies, and tools used.
    • Analyze and interpret the results.
    • Draft the thesis, ensuring that at least two of the research questions are addressed.
  • Research paper writing (optional)

Related Work

[1]   Autonomous Racing Graz, “Enhanced localisation for autonomous racing with high-resolution lidar“, Article by Tom Grey, Visited 30.10.2023.

[2]   Autonomous RC car racing ETH Zürich, “The ORCA (Optimal RC Racing) Project“, Article by Alex Liniger, Visited 30.10.2023.

[3]   P. Cai, H. Wang, H. Huang, Y. Liu and M. Liu, “Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning,” in IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7262-7269, Oct. 2021, doi: 10.1109/LRA.2021.3097345.

[4]   Z. Lu, C. Zhang, H. Zhang, Z. Wang, C. Huang and Y. Ji, “Deep Reinforcement Learning Based Autonomous Racing Car Control With Priori Knowledge,” 2021 China Automation Congress (CAC), Beijing, China, 2021, pp. 2241-2246, doi: 10.1109/CAC53003.2021.9728289.

[5]   J. Kabzan, L. Hewing, A. Liniger and M. N. Zeilinger, “Learning-Based Model Predictive Control for Autonomous Racing,” in IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3363-3370, Oct. 2019, doi: 10.1109/LRA.2019.2926677.




M.Sc. Thesis, Stefan Maintinger – Map-based and map-less mobile navigation via deep reinforcement learning in dynamic environments

Supervisor: Linus Nwankwo, M.Sc.;
Univ.-Prof. Dr Elmar Rückert
Start date: 5th September 2022

 

Theoretical difficulty: mid
Practical difficulty: mid

Abstract

For over 20 years now, the simultaneous localisation and mapping (SLAM) approach has been widely used to achieve autonomous navigation objectives. The SLAM problem is that of building a map of the environment while simultaneously estimating the robot’s position relative to that map, given noisy sensor observations and a series of control data. Recently, a map-less approach based on deep reinforcement learning has been proposed. In this approach, the agent (robot) learns the navigation policy from sensor data and a series of control data alone, without a prior map of the task environment. In the context of this thesis, we will evaluate the performance of both approaches in a crowded dynamic environment using our differential-drive open-source open-shuttle mobile robot.

Tentative Work Plan

To achieve our objective, the following concrete tasks will be focused on:

  • Literature research and a general understanding of the field
    • Mobile robotics and industrial use cases
    • Overview of map-based autonomous navigation (SLAM & Path planning)
    • Overview of map-less autonomous navigation with deep reinforcement learning
  • Set up and familiarize yourself with the simulation environment
    • Build the robot model (URDF) for the simulation (optional if you wish to use the existing one)
    • Set up the ROS framework for the simulation (Gazebo, Rviz)
    • Recommended programming tools: C++, Python, Matlab
  • Intermediate presentation:
    • Presenting the results of the literature study
    • Possibility to ask questions about the theoretical background
    • Detailed planning of the next steps
  • Define key performance/quality metrics for evaluation:
    • Time to reach the desired goal
    • Average/mean speed
    • Path smoothness
    • Obstacle avoidance/distance to obstacles
    • Computational requirement
    • Success rate, etc.
  • Assessment and execution:
    • Compare the results from both map-based and map-less approaches on the above-defined evaluation metrics.
  • Validation:
    • Validate both approaches in a real-world scenario using our open-source open-shuttle mobile robot.
  • Furthermore, the following optional goals are planned:
    • Develop a hybrid approach combining both the map-based and the map-less methods.
  • M.Sc. thesis writing
  • Research paper writing (optional)
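Several of the evaluation metrics listed above can be computed directly from a recorded 2D trajectory. The sketch below computes time to goal, mean speed, and a turning-angle smoothness score; the fixed sampling period `dt` and the metric definitions are illustrative assumptions.

```python
import numpy as np

def path_metrics(path, dt=0.1):
    """Time to goal, mean speed, and a smoothness score (sum of absolute
    heading changes; lower is smoother) for a 2D path sampled every `dt` s."""
    path = np.asarray(path, dtype=float)
    steps = np.diff(path, axis=0)               # per-step displacement vectors
    dists = np.linalg.norm(steps, axis=1)       # per-step distances
    total_time = len(steps) * dt
    headings = np.arctan2(steps[:, 1], steps[:, 0])
    turns = np.abs(np.diff(headings))
    turns = np.minimum(turns, 2 * np.pi - turns)  # wrap angle differences
    return {
        "time": total_time,
        "mean_speed": float(dists.sum() / total_time),
        "smoothness": float(turns.sum()),
    }

straight = path_metrics([(0, 0), (1, 0), (2, 0), (3, 0)], dt=1.0)
```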

Related Work

[1] Xue, Honghu; Hein, Benedikt; Bakr, Mohamed; Schildbach, Georg; Abel, Bengt; Rueckert, Elmar, “Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in Intralogistics“, In: Applied Sciences (MDPI), Special Issue on Intelligent Robotics, 2022.

[2] Han Hu, Kaicheng Zhang, Aaron Hao Tan, Michael Ruan, Christopher Agia, Goldie Nejat, “Sim-to-Real Pipeline for Deep Reinforcement Learning for Autonomous Robot Navigation in Cluttered Rough Terrain”, IEEE Robotics and Automation Letters, vol. 6, no. 4, October 2021.

[3] Md. A. K. Niloy, Anika Shama, Ripon K. Chakrabortty, Michael J. Ryan, Faisal R. Badal, Z. Tasneem, Md H. Ahamed, S. I. Mo, “Critical Design and Control Issues of Indoor Autonomous Mobile Robots: A Review”, IEEE Access, vol. 9, February 2021.

[4]  Ning Wang, Yabiao Wang, Yuming Zhao, Yong Wang and Zhigang Li , “Sim-to-Real: Mapless Navigation for USVs Using Deep Reinforcement Learning”, Journal of Marine Science and Engineering, 2022, 10, 895. https://doi.org/10.3390/jmse10070895

Master Thesis

The final master thesis document can be downloaded here.