M.Sc. Thesis, Adiole Promise Emeziem: Language-Grounded Robot Autonomy through Large Language Models and Multimodal Perception


Supervisors: Linus Nwankwo, M.Sc.; Univ.-Prof. Dr Elmar Rückert
Start date: As soon as possible

Theoretical difficulty: Medium
Practical difficulty: High

Abstract

The goal of this thesis is to enhance the method proposed in [1] to enable autonomous robots to effectively interpret open-ended language commands, plan actions, and adapt to dynamic environments.

The scope is limited to grounding the semantic understanding of large-scale pre-trained language models and multimodal vision-language models in physical sensor data, enabling autonomous agents to execute complex, long-horizon tasks without task-specific programming. The expected outcomes include a unified framework for language-driven autonomy, a method for cross-modal alignment, and real-world validation.

Tentative Work Plan

To achieve these objectives, the work will focus on the following concrete tasks:

  • Background and Setup:
    • Study LLM-for-robotics papers (e.g., ReLI [1], Code-as-Policies [2], ProgPrompt [3]) and vision-language models (e.g., CLIP, LLaVA).
    • Set up a ROS/Isaac Sim simulation environment and build a robot model (URDF) for the simulation (optional if an existing model is used).
    • Familiarise yourself with how LLMs and VLMs can be grounded for short-horizon robotic tasks (e.g., “Move towards the {colour} block near the {object}”) in static environments; a minimal grounding sketch is given after the work plan.
    • Recommended programming tools: C++, Python, MATLAB.
  • Modular Pipeline Design:
    • Speech/Text (Task Instruction) ⇾ LLM (Task Planning) ⇾ CLIP (Object Grounding) ⇾ Motion Planner (e.g., move towards the {colour} block near the {object}) ⇾ Execution (in simulation or a real-world environment). A skeleton of this pipeline is sketched after the work plan.
  • Intermediate Presentation:
    • Present the results of the background study and the work completed so far.
    • Detailed planning of the next steps.
  • Implementation & Real-World Testing (If Possible):
    • Test the implemented pipeline with a Gazebo-simulated quadruped or differential-drive robot.
    • Perform real-world testing of the developed framework with our Unitree Go1 quadruped robot or with our Segway RMP 220 Lite robot.
    • Analyse and compare the model’s performance in real-world scenarios versus simulation across the different LLM and VLM pipelines.
    • Validate with 50+ language commands in both simulation and the real world (a minimal evaluation harness is sketched after the work plan).
  • Optimise the Pipeline for Performance and Efficiency (Optional):
    • Profile the pipeline to identify bottlenecks within the robot’s task environment.
  • Documentation and Thesis Writing:
    • Document the entire process, methodologies, and tools used.
    • Analyse and interpret the results.
    • Draft the thesis, ensuring that the primary objectives are achieved.
      • Chapters: Introduction, Background (LLMs/VLMs in robotics), Methodology, Results, Conclusion.
    • Deliverables: Code repository, simulation demo video, thesis document.
  • Research Paper Writing (Optional)
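
Illustrative Code Sketches

To make the object-grounding step concrete, the sketch below scores candidate object crops against a language phrase with CLIP via the Hugging Face transformers API. It is a minimal sketch, assuming the crops come from some upstream detector (not shown); the checkpoint name and the ground_phrase helper are illustrative choices, not components prescribed by [1].

```python
# CLIP-based object grounding (illustrative sketch; helper names are assumptions).
# Assumes candidate object crops are provided by an upstream detector.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def ground_phrase(phrase: str, crops: list[Image.Image]) -> int:
    """Return the index of the crop that best matches the language phrase."""
    inputs = processor(text=[phrase], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_text has shape (num_phrases, num_crops); higher = better match.
    return int(out.logits_per_text.argmax(dim=-1).item())

# Usage: best = ground_phrase("the red block near the chair", crops)
```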
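The modular pipeline itself can be skeletonised as below. This is a sketch under explicit assumptions: the OpenAI client and model name stand in for any chat-style LLM, ground_target would wrap the CLIP scoring above plus a depth/TF lookup, and execute would hand the goal to the robot’s navigation stack (e.g., a ROS action client). None of these names come from [1].

```python
# Speech/Text -> LLM (planning) -> CLIP (grounding) -> Motion Planner -> Execution.
# Illustrative skeleton; function names and model choice are assumptions.
import json
from dataclasses import dataclass
from openai import OpenAI  # any chat-style LLM client could be swapped in

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

@dataclass
class Subtask:
    action: str   # e.g. "move_towards"
    target: str   # e.g. "the red block near the chair"

def plan_with_llm(instruction: str) -> list[Subtask]:
    """Task planning: ask the LLM to decompose the instruction into subtasks."""
    prompt = (
        "Decompose the following robot instruction into a JSON list of "
        "subtasks with fields 'action' and 'target'. Reply with JSON only.\n"
        "Instruction: " + instruction
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model honours the "JSON only" instruction.
    return [Subtask(**s) for s in json.loads(reply.choices[0].message.content)]

def ground_target(phrase: str, camera_image) -> tuple[float, float]:
    """Object grounding: e.g. CLIP scoring (previous sketch) plus a depth lookup."""
    raise NotImplementedError  # robot- and sensor-specific

def execute(goal_xy: tuple[float, float]) -> bool:
    """Execution: send the goal to a motion planner (e.g. a ROS Nav2 action)."""
    raise NotImplementedError  # robot-specific

def run_pipeline(instruction: str, camera_image) -> None:
    for subtask in plan_with_llm(instruction):
        goal = ground_target(subtask.target, camera_image)
        if not execute(goal):
            break  # in dynamic environments, trigger replanning here
```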
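Finally, for the 50+ command validation, a small harness such as the following can log per-command outcomes in simulation and on the real robot, so that the different LLM/VLM pipeline variants are compared on the same command set. The success check is task-specific and left as a callback; this, too, is an illustrative sketch rather than part of the cited work.

```python
# Minimal evaluation harness for the 50+ command validation (illustrative).
from typing import Callable

def evaluate(pipeline: Callable[[str], None],
             commands: list[str],
             check_success: Callable[[str], bool]) -> float:
    """Run each command through the pipeline and return the success rate."""
    results = []
    for cmd in commands:
        pipeline(cmd)
        ok = check_success(cmd)  # task-specific oracle, e.g. "goal reached?"
        results.append(ok)
        print(f"{'PASS' if ok else 'FAIL'}: {cmd}")
    return sum(results) / len(results)
```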

References

[1] Nwankwo L, Ellensohn B, Özdenizci O, Rueckert E. ReLI: A Language-Agnostic Approach to Human-Robot Interaction. arXiv preprint arXiv:2505.01862. 2025 May 3.

[2] Liang J, Huang W, Xia F, Xu P, Hausman K, Ichter B, Florence P, Zeng A. Code as Policies: Language Model Programs for Embodied Control. In 2023 IEEE International Conference on Robotics and Automation (ICRA) 2023 May 29 (pp. 9493–9500). IEEE.

[3] Singh I, Blukis V, Mousavian A, Goyal A, Xu D, Tremblay J, Fox D, Thomason J, Garg A. ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models. In 2023 IEEE International Conference on Robotics and Automation (ICRA) 2023 May 29 (pp. 11523–11530). IEEE.

[4] Nwankwo L, Rueckert E. The Conversation is the Command: Interacting with Real-World Autonomous Robots Through Natural Language. In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction 2024 Mar 11 (pp. 808-812).