Odor Source Localization (OSL) technology allows autonomous agents such as mobile robots to find an unknown odor source in a given environment. An effective navigation algorithm that guides the robot toward the odor source is the key to successfully locating it. The downside of traditional olfaction-only OSL methods is that they struggle to localize odor sources in real-world environments with complex airflow. Our proposed solution integrates vision and olfaction sensor modalities to localize odor sources even when olfaction sensing is disrupted by turbulent airflow or vision sensing is impaired by environmental complexities. The model leverages the zero-shot multi-modal reasoning capabilities of large language models (LLMs), eliminating the need for manual knowledge encoding or custom-trained supervised learning models. A key feature of the proposed algorithm is the `High-level Reasoning' module, which encodes the olfaction and vision sensor data into a multi-modal prompt and instructs the LLM to employ a hierarchical reasoning process to select an appropriate high-level navigation behavior. Subsequently, the `Low-level Action' module translates the selected high-level navigation behavior into low-level action commands that the mobile robot can execute. To validate our method, we implemented the proposed algorithm on a mobile robot in a complex, real-world search environment that presents challenges to both the olfaction and vision sensing modalities. We compared the performance of our proposed algorithm against single-sensory-modality olfaction-only and vision-only navigation algorithms, as well as a supervised learning-based vision and olfaction fusion navigation algorithm. Experimental results demonstrate that the multi-sensory navigation algorithms are statistically superior to the single-sensory ones, and that the proposed algorithm outperforms the other algorithms in both laminar and turbulent airflow environments.
'High-level Reasoning Module': The 'High-level Reasoning Module' is the core of the proposed OSL algorithm. The module first generates a multi-modal prompt from a system prompt, an olfaction description, and the robot's egocentric visual frame. This prompt is used to query a multi-modal LLM for a high-level navigation decision, which is then decoded and passed to the 'Low-level Action Module'.
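As a minimal sketch of how such a module could be realized, the code below builds a multi-modal prompt from an olfaction description and a camera frame and queries an OpenAI-style chat API. The model name, prompt wording, behavior labels, and helper names are illustrative assumptions, not the authors' implementation.

```python
import base64
from openai import OpenAI  # assumed client; any multi-modal LLM API could be substituted

client = OpenAI()

SYSTEM_PROMPT = (
    "You are the high-level reasoning module of an odor source localization robot. "
    "Given the current olfaction reading and camera view, reply with exactly one "
    "navigation behavior: 'Vision-based Navigation', 'Surge', or 'Casting'."
)

def build_olfaction_description(concentration_ppm, wind_speed_ms, wind_dir_deg):
    """Encode raw gas-sensor and anemometer readings as text for the prompt."""
    return (
        f"Odor concentration: {concentration_ppm:.1f} ppm. "
        f"Wind speed: {wind_speed_ms:.2f} m/s, wind direction: {wind_dir_deg:.0f} deg "
        f"relative to the robot heading."
    )

def high_level_reasoning(image_path, olfaction_description):
    """Query a multi-modal LLM with the system prompt, olfaction text, and camera frame."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": olfaction_description},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            },
        ],
    )
    # Decode the LLM's reply into one of the known behavior labels.
    reply = response.choices[0].message.content.strip()
    for behavior in ("Vision-based Navigation", "Surge", "Casting"):
        if behavior.lower() in reply.lower():
            return behavior
    return "Casting"  # conservative fallback when the reply is ambiguous
```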
'Low-level Action Module': The 'Low-level Action Module' has three primary navigation behaviors: 'Obstacle-avoid Navigation', 'Vision-based Navigation', and 'Olfaction-based Navigation'. The 'Obstacle-avoid Navigation' behavior is activated when the onboard Laser Distance Sensor detects a nearby obstacle; it directs the robot to navigate around the obstacle without deviating significantly from its current heading. 'Vision-based Navigation' is a class of behaviors selected and returned by the 'High-level Reasoning' module; its core strategy is to keep the detected target in the middle of the image. 'Olfaction-based Navigation' includes the moth-inspired 'Surge' behavior for following the odor and the moth-inspired 'Casting' behavior for finding it.
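The sketch below illustrates one way these behaviors could be translated into velocity commands, assuming a differential-drive robot controlled by a (linear, angular) velocity pair. The thresholds, gains, and casting pattern are illustrative placeholders rather than the authors' tuned parameters.

```python
import math

def low_level_action(behavior, lds_ranges, target_px=None, image_width=640,
                     wind_dir_rad=0.0, t=0.0):
    """Translate a high-level behavior into (linear, angular) velocity commands.

    behavior      -- label returned by the high-level reasoning module
    lds_ranges    -- laser distance readings in meters, index 0 = straight ahead
    target_px     -- horizontal pixel coordinate of the detected visual target, if any
    wind_dir_rad  -- upwind direction relative to the robot heading (rad)
    t             -- elapsed time (s), used to alternate the casting direction
    """
    OBSTACLE_DIST = 0.4   # illustrative safety threshold (m)
    V_FORWARD = 0.15      # illustrative cruise speed (m/s)
    K_TURN = 1.0          # illustrative steering gain

    # Obstacle-avoid Navigation: overrides other behaviors when the LDS detects a
    # nearby obstacle, steering around it while keeping roughly the same heading.
    if min(lds_ranges) < OBSTACLE_DIST:
        obstacle_on_right = lds_ranges.index(min(lds_ranges)) > len(lds_ranges) // 2
        return 0.05, (K_TURN if obstacle_on_right else -K_TURN)

    if behavior == "Vision-based Navigation" and target_px is not None:
        # Keep the detected target centered in the image: steer proportionally
        # to its horizontal offset from the image center.
        offset = (target_px - image_width / 2) / (image_width / 2)
        return V_FORWARD, -K_TURN * offset

    if behavior == "Surge":
        # Moth-inspired surge: drive upwind while the odor is being detected.
        return V_FORWARD, K_TURN * wind_dir_rad

    # Moth-inspired casting: sweep crosswind, alternating direction periodically,
    # to reacquire the odor.
    crosswind = wind_dir_rad + math.copysign(math.pi / 2, math.sin(0.5 * t))
    return V_FORWARD, K_TURN * crosswind
```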
Search area: The focus of the experiment is to test whether the proposed navigation algorithm can reason over vision and olfaction sensory inputs to determine the actions needed to localize an unknown odor source in an environment with obstacles, under both laminar and turbulent airflow setups.
To determine the effectiveness of olfaction and vision integration in OSL, we compared the OSL performance of the single-sensory-modality 'Olfaction-only' and 'Vision-only' navigation algorithms, the multi-sensory supervised learning-based 'Vision and Olfaction Fusion' algorithm, and the proposed LLM-based navigation algorithm.
The robotic platform used for this task utilizes a Raspberry Pi Camera for vision sensing, an MQ3 alcohol detector and a WindSonic anemometer for olfaction sensing, and an LDS-02 Laser Distance Sensor for obstacle detection.
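As a rough illustration, the readings from this sensor suite could be gathered into a single observation structure that feeds the two modules above. The driver calls below (camera capture, MQ3 analog read, anemometer parsing, LDS scan) are hypothetical placeholders, since the actual interfaces are hardware-specific.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One synchronized snapshot of the robot's sensor suite."""
    image_path: str           # latest Raspberry Pi Camera frame, saved to disk
    concentration_ppm: float  # MQ3 alcohol detector reading
    wind_speed_ms: float      # WindSonic anemometer wind speed
    wind_dir_deg: float       # wind direction relative to the robot heading
    lds_ranges: list          # LDS-02 laser distance scan (m)

def collect_observation(camera, mq3, anemometer, lds):
    """Poll each sensor once; the driver objects and methods are placeholders."""
    return Observation(
        image_path=camera.capture(),          # placeholder driver call
        concentration_ppm=mq3.read_ppm(),     # placeholder driver call
        wind_speed_ms=anemometer.speed(),     # placeholder driver call
        wind_dir_deg=anemometer.direction(),  # placeholder driver call
        lds_ranges=lds.scan(),                # placeholder driver call
    )
```

An observation of this form would supply the olfaction description and camera frame to the high-level reasoning sketch above, with the laser scan going to the obstacle-avoid behavior in the low-level action sketch.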
The results show that the multi-modal navigation algorithms outperform the single-modality navigation algorithms, and that the proposed LLM-based algorithm outperforms the supervised learning-based 'Vision and Olfaction Fusion' navigation algorithm.