
Open Project, MSc. or BSc. Thesis – Multimodal Human-Autonomous Agents Interaction Using Pre-trained Language and Visual Foundation Models

Supervisor: Linus Nwankwo, M.Sc.;
Univ.-Prof. Dr Elmar Rückert
Start date:  As soon as possible

 

Theoretical difficulty: mid
Practical difficulty: high

Abstract

In this project or thesis, we aim to enhance the method proposed in [1] for robust natural human-autonomous agent interaction through verbal and textual conversations.

The primary focus is to develop a system that enhances natural-language conversations, understands the semantic context of the robot’s task environment, and abstracts this information into actionable commands or queries. This will be achieved by leveraging the capabilities of pre-trained large language models (LLMs), e.g., GPT-4; visual language models (VLMs), e.g., CLIP; and audio language models (ALMs), e.g., AudioLM.

Tentative Work Plan

To achieve the objectives, the following concrete tasks will be focused on:

  • Initialisation and Background:
    • Study the concept of LLMs, VLMs, and ALMs.
    • Study how LLMs, VLMs, and ALMs can be grounded for autonomous robotic tasks.
    • Familiarise yourself with the methods at the project website – https://linusnep.github.io/MTCC-IRoNL/.
  • Setup and Familiarity with the Simulation Environment
    • Build a robot model (URDF) for the simulation (optional if you wish to use the existing one).
    • Set up the ROS framework for the simulation (Gazebo, Rviz).
    • Recommended programming tools: C++, Python, Matlab.
  • Coding
    • Improve the existing code of the method proposed in [1] to incorporate the aforementioned modalities (the code will be provided to the student).
    • Integrate other LLMs (e.g., LLaMA) and VLMs (e.g., GLIP) into the framework and compare their performance with the baseline (GPT-4 and CLIP).
  • Intermediate Presentation:
    • Present the results of your background study and the work done so far.
    • Detailed planning of the next steps.
  • Simulation & Real-World Testing (If Possible):
    • Test your implemented model with a Gazebo-simulated quadruped or differential drive robot.
    • Perform real-world testing of the developed framework with our Unitree Go1 quadruped robot or with our Segway RMP 220 Lite robot.
    • Analyse and compare the model’s performance in real-world scenarios versus simulations with the different LLMs and VLMs pipelines.
  • Optimize the Framework for Optimal Performance and Efficiency (Optional):
    • Validate the model to identify bottlenecks within the robot’s task environment.
  • Documentation and Thesis Writing:
    • Document the entire process, methodologies, and tools used.
    • Analyse and interpret the results.
    • Draft the project report or thesis, ensuring that the primary objectives are achieved.
  • Research Paper Writing (optional)
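The command-abstraction step in the work plan can be illustrated with a toy sketch. Here a small keyword table stands in for the GPT-4 call used in [1]; the phrases and action names are invented for illustration and are not part of the actual framework.

```python
# Toy sketch: abstracting a natural-language instruction into an actionable
# robot command. A keyword table stands in for the LLM call; the action
# names below are hypothetical placeholders.

ACTION_KEYWORDS = {
    "go to": "NAVIGATE",
    "move to": "NAVIGATE",
    "pick up": "GRASP",
    "stop": "HALT",
    "describe": "PERCEIVE",
}

def parse_command(utterance: str) -> dict:
    """Map a free-form utterance to a structured action (LLM stand-in)."""
    text = utterance.lower()
    for phrase, action in ACTION_KEYWORDS.items():
        if phrase in text:
            # Everything after the trigger phrase is treated as the argument.
            arg = text.split(phrase, 1)[1].strip(" .")
            return {"action": action, "argument": arg}
    return {"action": "UNKNOWN", "argument": text}

command = parse_command("Please go to the charging station.")
# In the real system, this structured command would be translated into
# ROS navigation goals or perception queries.
```

The structured dictionary is the point of the abstraction: downstream ROS nodes never see raw text, only the action and its argument.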

Related Work

[1]  Linus Nwankwo and Elmar Rueckert. 2024. The Conversation is the Command: Interacting with Real-World Autonomous Robots Through Natural Language. In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’24). Association for Computing Machinery, New York, NY, USA, 808–812. https://doi.org/10.1145/3610978.3640723.

[2]  Nwankwo, L., & Rueckert, E. (2024). Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models. arXiv preprint arXiv:2403.12273.

3D perception and SLAM using geometric and semantic information for mine inspection with a quadruped robot

Supervisor: Linus Nwankwo, M.Sc.;
Univ.-Prof. Dr Elmar Rückert
Start date: As soon as possible

 

Theoretical difficulty: mid
Practical difficulty: high

Abstract

Unlike the traditional mine inspection approach, which is inefficient in terms of time, terrain, and coverage, this project/thesis aims to investigate novel 3D perception and SLAM using geometric and semantic information for real-time mine inspection.

We propose to develop a SLAM approach that takes into account the terrain of the mining site and the sensor characteristics to ensure complete coverage of the environment while minimizing traversal time.

Tentative Work Plan

To achieve our objective, the following concrete tasks will be focused on:

  • Study the concept of 3D perception and SLAM for mine inspection, as well as algorithm development, system integration, and real-world demonstration using the Unitree Go1 quadruped robot.

  • Set up and familiarize yourself with the simulation environment:
    • Build the robot model (URDF) for the simulation (optional if you wish to use the existing one)
    • Set up the ROS framework for the simulation (Gazebo, Rviz)
    • Recommended programming tools: C++, Python, Matlab
  • Develop a novel SLAM system for the quadrupedal robot to navigate, map and interact with challenging real-world environments:
    • 2D/3D mapping in complex indoor/outdoor environments
    • Localization using either Monte Carlo or extended Kalman filter methods
    • Complete coverage path planning
  • Intermediate presentation:
    • Presenting the results of the literature study
    • Possibility to ask questions about the theoretical background
    • Detailed planning of the next steps
  • Implementation:
    • Simulate the achieved results in a virtual environment (Gazebo, Rviz, etc.)
    • Real-time testing on the Unitree Go1 quadrupedal robot
  • Evaluate the performance in various challenging real-world environments, including outdoor terrains, urban environments, and indoor environments with complex structures.
  • M.Sc. thesis or research paper writing (optional)
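The localization sub-task above can be illustrated in miniature. The sketch below reduces Kalman-filter localization to one dimension (position along a tunnel) and shows only the predict/update cycle, not a full SLAM system; the noise values are illustrative, not tuned for any real sensor.

```python
# Minimal 1D sketch of the predict/update cycle behind (E)KF localization.
# q and r are illustrative process and measurement noise variances.

def kf_step(x, p, u, z, q=0.1, r=0.5):
    """One Kalman cycle: motion update with command u, then a range fix z."""
    # Predict: apply the odometry and grow the uncertainty.
    x_pred = x + u
    p_pred = p + q
    # Update: blend prediction and measurement via the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0  # initial position estimate and variance
for u, z in [(1.0, 1.1), (1.0, 2.0), (1.0, 2.9)]:
    x, p = kf_step(x, p, u, z)
# The variance p shrinks as measurements accumulate.
```

The full 2D/3D case replaces the scalars with state vectors and covariance matrices, and linearizes the motion and sensor models at each step.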

Related Work

[1]  Wolfram Burgard, Cyrill Stachniss, Kai Arras, and Maren Bennewitz, ‘SLAM: Simultaneous Localization and Mapping’, http://ais.informatik.uni-freiburg.de/teaching/ss12/robotics/slides/12-slam.pdf

[2]  V. Barrile, G. Candela, A. Fotia, ‘Point cloud segmentation using image processing techniques for structural analysis’, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W11, 2019

[3]  Łukasz Sobczak, Katarzyna Filus, Adam Domanski and Joanna Domanska, ‘LiDAR Point Cloud Generation for SLAM Algorithm Evaluation’, Sensors 2021, 21, 3313. https://doi.org/10.3390/s21103313

Self-Supervised Learning Techniques for Improving Unsupervised Representation Learning [M.Sc. Thesis/Int. CPS project]

Abstract

The need for efficient and compact representations of sensory data, such as images and text, has grown significantly due to the exponential growth in the size and complexity of the data. Self-supervised learning techniques, such as autoencoders, contrastive learning, and transformers, have shown significant promise in learning such representations from large unlabeled datasets. This research aims to develop novel self-supervised learning techniques inspired by these approaches to improve the quality and efficiency of unsupervised representation learning.

Description

The study will begin by reviewing the state-of-the-art self-supervised learning techniques and their applications in various domains, including computer vision and natural language processing. Next, a set of experiments will be conducted to develop and evaluate the proposed techniques on standard datasets in these domains.

The experiments will focus on learning compact and efficient representations of sensory data using autoencoder-based techniques, contrastive learning, and transformer-based approaches. The performance of the proposed techniques will be evaluated based on their ability to improve the accuracy and efficiency of unsupervised representation learning tasks.

The research will also investigate the impact of different factors such as the choice of loss functions, model architecture, and hyperparameters on the performance of the proposed techniques. The insights gained from this study will help in developing guidelines for selecting appropriate self-supervised learning techniques for efficient and compact representation learning.

Overall, this research will contribute to the development of novel self-supervised learning techniques for efficient and compact representation learning of sensory data. The proposed techniques will have potential applications in various domains, including computer vision, natural language processing, and other sensory data analysis tasks.
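As a concrete illustration of the contrastive objective discussed above, the following is a plain-Python sketch of an InfoNCE-style loss for toy two-dimensional embeddings; a real implementation would operate on PyTorch tensors over large batches.

```python
import math

# Toy InfoNCE-style contrastive loss: the cross-entropy of identifying the
# positive pair among the positive plus a set of negatives.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax score of the positive pair over all candidates."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# The loss is low when the anchor and positive are aligned ...
low = info_nce([1, 0], [1, 0.05], [[0, 1], [-1, 0]])
# ... and high when a negative is more similar than the positive.
high = info_nce([1, 0], [0, 1], [[1, 0.05], [-1, 0]])
```

Minimizing this loss pulls augmented views of the same input together in embedding space while pushing other samples apart, which is what yields the compact representations discussed above.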

Qualifications

  • Currently pursuing a Bachelor’s or Master’s degree in Computer Science,
    Electrical Engineering, Mechanical Engineering, Mathematics, or related
    fields.
  • Strong programming skills in Python
  • Experience with deep learning frameworks such as PyTorch or TensorFlow.
  • Good written and verbal communication skills in English.
  • (optional) Familiarity with unsupervised learning techniques such as contrastive learning, self-supervised learning, and generative models

Interested?

If this topic excites you, please contact Fotios Lygerakis by email at fotios.lygerakis@unileoben.ac.at or simply visit us at our chair in the Metallurgie building, 1st floor.

Sign Language Robot Hand [M.Sc. Thesis/Int. CPS Project]

Abstract

Human-Robot Interaction using Sign Language is a project that aims to revolutionize the way we communicate with machines. With the increasing use of robots in our daily lives, it is important to create a more natural and intuitive way for humans to communicate with them.

Sign language is a unique and powerful form of communication that is widely used by the deaf and hard-of-hearing community. By incorporating sign language into robot interaction, we can create a more inclusive and accessible technology for everyone.

Moreover, sign language will provide a new and innovative way to interact with robots, making it possible for people to control and communicate with them in a way that is both non-verbal and non-intrusive.

Note: This project is also offered as an internship position.


Thesis Description

The implementation of sign language in human-robot interaction will not only improve the user experience but will also advance the field of robotics and artificial intelligence. This project has the potential to bring about a new era of human-robot interaction, where machines and humans can communicate in a more natural and human-like way. Therefore, the Human-Robot Interaction using Sign Language project is a crucial step toward creating a more accessible and user-friendly technology for everyone.

This thesis will encompass three crucial elements. The first part will focus on recognizing human gestures in sign language through the development of deep learning methods utilizing a camera. The second part will involve programming a robotic hand to translate text back into gestures. Finally, the third part will bring together the first two components to create a seamless human-robot interaction framework using sign language.
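The second component (translating text back into gestures) can be sketched as a simple lookup from letters to joint-angle targets. The angles below are invented placeholders, not a calibrated fingerspelling alphabet; a real hand would need one tuned pose per letter.

```python
# Hypothetical letter-to-pose table: each entry is a set of finger joint
# angles (thumb, index, middle, ring, pinky) in degrees. Values are
# illustrative placeholders, not real sign-language calibrations.

HAND_POSES = {
    "a": (10, 90, 90, 90, 90),
    "b": (80, 0, 0, 0, 0),
    "c": (40, 45, 45, 45, 45),
}

def text_to_gestures(text):
    """Return one joint-angle tuple per letter; skip unknown characters."""
    return [HAND_POSES[ch] for ch in text.lower() if ch in HAND_POSES]

trajectory = text_to_gestures("ABBA")
# Each tuple would be sent to the robotic hand's servo controller in turn.
```

The deep-learning recognition component from the first part would feed this pipeline in reverse: camera frames to letters, letters to text.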

Qualifications

  • Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Mechanical Engineering, Mathematics or related fields.
  • Strong programming skills in Python
  • Experience with deep learning frameworks such as PyTorch or TensorFlow.
  • Experience working with robotics hardware
  • Knowledge of computer vision and image processing techniques
  • Good written and verbal communication skills in English.

Interested?

If this project sounds like fun to you, please contact Fotios Lygerakis by email at fotios.lygerakis@unileoben.ac.at or simply visit us at our chair in the Metallurgie building, 1st floor.

Integrated CPS Project or B.Sc. Thesis: Mobile Navigation via micro-ROS

Supervisors:

Start date: October 2022

 

Qualifications

  • Interest in controlling and simulating mobile robotics
  • Interest in Programming in Python and ROS or ROS2
 
Keywords: Mobile robot control, robot operating system (ROS), ESP32

Description

The goal of this project or thesis is to develop a control and sensing interface for our mobile robot “RMP220”. The RMP220 has two powerful brushless motors equipped with two magnetic encoders.

In this project, you will learn how to read the sensor values and how to control the motors via micro-ROS on an ESP32 controller.
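As a flavour of what the encoder readings enable, the sketch below turns the two tick counts into a dead-reckoned pose update for a differential-drive base. micro-ROS firmware itself is written in C on the ESP32; the geometry is shown here in Python, and the wheel radius, track width, and ticks-per-revolution values are illustrative, not the RMP220’s real parameters.

```python
import math

# Dead-reckoning odometry for a differential-drive base from encoder ticks.
# All constants are placeholder values for illustration.

TICKS_PER_REV = 4096
WHEEL_RADIUS = 0.20   # metres (illustrative)
TRACK_WIDTH = 0.55    # metres (illustrative)

def odom_update(x, y, theta, left_ticks, right_ticks):
    """Update the pose (x, y, theta) from per-wheel encoder tick counts."""
    dl = 2 * math.pi * WHEEL_RADIUS * left_ticks / TICKS_PER_REV
    dr = 2 * math.pi * WHEEL_RADIUS * right_ticks / TICKS_PER_REV
    d = (dl + dr) / 2                  # distance travelled by the base centre
    dtheta = (dr - dl) / TRACK_WIDTH   # change of heading
    return (x + d * math.cos(theta + dtheta / 2),
            y + d * math.sin(theta + dtheta / 2),
            theta + dtheta)

pose = odom_update(0.0, 0.0, 0.0, 4096, 4096)  # equal ticks: straight line
```

On the real system, this update would run on the host side, fed by tick counts published from the ESP32 over micro-ROS.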

Links:

 

Note: This project is also offered as an internship position.

https://www.youtube.com/watch?v=-MfNrxHXwow

Single Person Project or Team Work

You may work on the project alone or in teams of up to 4 persons.

For team work, the goals will be extended to controlling the robot via ROS 2 and simulating it in Gazebo or RViz.

Interested?

If this project sounds like fun to you, please contact Linus Nwankwo or Elmar Rueckert or simply visit us at our chair in the Metallurgie building, 1st floor.

Mixed Reality Robot Teleoperation with Hololens 2 [Thesis/Int. CPS Project ]

Description

A Mixed Reality (AR) interface based on Unity 3D for the intuitive programming of robotic manipulators (UR3). The interface will be implemented on the ROS 2 robotic framework.

Note: This project is also offered as an internship position.

https://www.youtube.com/watch?v=-MfNrxHXwow

Abstract

Robots will become a necessity for every business in the near future. Companies that rely heavily on the constant manipulation of objects, in particular, will need to repurpose their robots continually to meet ever-changing demands. Furthermore, with the rise of machine learning, human collaborators or “robot teachers” will need a more intuitive interface for communicating with robots, both when interacting with them and when teaching them.

In this project, we will develop a novel Mixed (Augmented) Reality interface for teleoperating the UR3 robotic manipulator. For this purpose, we will use AR glasses to augment the user’s reality with information about the robot and enable intuitive programming of the robot. The interface will be implemented on a ROS 2 framework for enhanced scalability and better integration potential with other devices.
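The core teleoperation mapping can be sketched as follows: a hand-pose displacement reported by the AR glasses is scaled and clamped into an end-effector target. The scale factor and workspace limits below are invented for illustration; the actual interface would be written in C# in Unity and bridged to ROS 2.

```python
# Illustrative hand-to-robot teleoperation mapping: scale the user's hand
# displacement and clamp the result to a (hypothetical) reachable workspace.

SCALE = 0.5                # metres of robot motion per metre of hand motion
WORKSPACE = (-0.4, 0.4)    # reachable range per axis, illustrative

def hand_to_target(ee_pos, hand_delta):
    """Map a hand displacement (x, y, z) onto a clamped end-effector target."""
    lo, hi = WORKSPACE
    return tuple(min(max(p + SCALE * d, lo), hi)
                 for p, d in zip(ee_pos, hand_delta))

target = hand_to_target((0.0, 0.2, 0.3), (0.1, 0.0, 0.5))
# The clamped target would then be published to the manipulator's controller.
```

Clamping keeps an exaggerated hand motion from commanding a pose outside the robot's reach, which is one of the safety considerations such an interface must handle.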

Outcomes

This thesis will result in an innovative graphical interface that enables non-experts to program a robotic manipulator.

The student will gain valuable experience with the Robot Operating System (ROS) framework and with developing graphical interfaces in Unity. The student will also gain a good understanding of robotic manipulators (such as the UR3) and complete a full engineering project.

Qualifications

  • Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Mechanical Engineering or related fields.
  • Good programming skills in C# and Unity 3D
  • Familiarity with ROS or other robotic frameworks
  • Basic knowledge of 3D modeling and animation
  • Good written and verbal communication skills in English.
  • (optional) Experience with mixed reality development and programming

Interested?

If this project sounds like fun to you, please contact Fotios Lygerakis by email at fotios.lygerakis@unileoben.ac.at or simply visit us at our chair in the Metallurgie building, 1st floor.

Integrated CPS Project or B.Sc./M.Sc. Thesis: Learning to Walk through Reinforcement Learning

Supervisor: 

Start date: ASAP, e.g., 1st of October 2022

Qualifications

  • Interest in controlling and simulating legged robots
  • Interest in Programming in Python and ROS or ROS2
 
Keywords: locomotion, robot control, robot operating system (ROS), ESP32

Introduction

For humans, walking and running are effortless, provided they are in good health. However, training bipedal or quadrupedal robots to do the same is still a challenging problem for roboticists and researchers. Quadrupedal robots exhibit complex nonlinear dynamics, which makes it nearly impossible for control engineers to design an effective controller for their locomotion or task-specific actions.

In recent years, reinforcement learning has produced some of the most exciting, state-of-the-art approaches to this problem. However, challenges remain, such as learning effective locomotion skills from scratch, traversing rough terrain, and walking on a narrow balance beam [3]. Several researchers have demonstrated that quadrupedal robots can be trained to walk or run through reinforcement learning. Nevertheless, how efficiently and effectively these walking and running skills can be achieved on real-time systems, compared to humans or quadrupedal animals, remains an open problem.

In the context of this thesis, we propose to study the concept of reinforcement learning and subsequently apply it to train our 3D-printed quadrupedal robot to walk and run. For this, we will leverage the work of [1, 2] to explore the robot’s capabilities in generating highly dynamic motions and task-specific locomotive actions through reinforcement learning.

Tentative Work Plan

The following concrete tasks will be focused on:

  • Study the concept of reinforcement learning and its application to quadruped robots for testing control and learning algorithms.
  • Apply reinforcement learning algorithms to train the robot to perform skill-specific tasks such as walking and running.
  • Carry out real-time experimentation, simulation (MATLAB, ROS & Gazebo, Rviz, C/C++, Python, etc.), and validation.
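The reinforcement-learning loop in the plan above can be illustrated with a minimal tabular Q-learning example: an agent on a five-cell line learns that stepping right ("walking forward") reaches the goal. This is a toy stand-in; the actual robot would learn in joint space inside a physics simulator such as Gazebo.

```python
import random

# Tabular Q-learning on a five-cell corridor: reward 1.0 for reaching the
# rightmost cell. Hyperparameters are illustrative.

random.seed(0)
N_STATES, ACTIONS = 5, (-1, +1)            # step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(300):                       # training episodes
    s = 0
    while s < N_STATES - 1:
        # Epsilon-greedy action selection.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy action in every non-goal state is "step right".
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
```

The same update rule scales, in spirit, to the deep-RL methods of [1, 2], where the Q-table is replaced by a neural network over continuous joint states and actions.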

References

[1]        Felix Grimminger, Avadesh Meduri, Majid Khadiv, Julian Viereck, Manuel Wuthrich, Maximilien Naveau, Vincent Berenz, Steve Heim, Felix Widmaier, Thomas Flayols, Jonathan Fiene, Alexander Badri-Sprowitz and Ludovic Righetti, “An Open Torque-Controlled Modular Robot Architecture for Legged Locomotion Research”, arXiv:1910.00093v2 [cs.RO], 23 Feb 2020.

[2]        Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker and Sergey Levine, “Learning to Walk via Deep Reinforcement Learning”, arXiv:1812.11103v3 [cs.LG], 19 Jun 2019.

[3]        Haojie Shi, Bo Zhou, Hongsheng Zeng, Fan Wang, Yueqiang Dong, Jiangyong Li, Kang Wang, Hao Tian, Max Q.-H. Meng, “Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion”, arXiv:2109.06409v1 [cs.RO], 14 Sep 2021.

Link: slides