Multimodal-XAD: Explainable Autonomous Driving Based on Multimodal Environment Descriptions

Research output: Journal Publications and Reviews (RGC 21 - Publication in refereed journal), peer-reviewed



Detail(s)

Original language: English
Pages (from-to): 19469-19481
Journal / Publication: IEEE Transactions on Intelligent Transportation Systems
Volume: 25
Issue number: 12
Online published: 7 Oct 2024
Publication status: Published - Dec 2024

Abstract

In recent years, deep learning-based end-to-end autonomous driving has become increasingly popular. However, deep neural networks act as black boxes: their outputs are generally not explainable, which makes them unreliable for deployment in real-world environments. To address this problem, we propose an explainable deep neural network that jointly predicts driving actions and multimodal environment descriptions of traffic scenes, including bird's-eye-view (BEV) maps and natural-language environment descriptions. In this network, both the context information from BEV perception and the local information from semantic perception are considered before producing the driving actions and natural-language environment descriptions. To evaluate our network, we build a new dataset with hand-labelled ground truth for driving actions and multimodal environment descriptions. Experimental results show that combining context information with local information enhances the prediction performance for both driving actions and environment descriptions, thereby improving the safety and explainability of our end-to-end autonomous driving network. © 2024 IEEE.
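
A minimal sketch (in PyTorch) of the kind of joint-prediction architecture the abstract describes is shown below. It fuses a BEV context feature with a local semantic feature and then drives two heads: one for the driving action and one for a natural-language description. The class name, layer sizes, and the GRU caption decoder are illustrative assumptions for exposition, not the paper's actual implementation.

import torch
import torch.nn as nn

class JointDrivingExplainer(nn.Module):
    """Hypothetical sketch: fuse BEV context features with local semantic
    features, then jointly predict a driving action and a token sequence
    describing the scene. All sizes and module choices are assumptions."""

    def __init__(self, bev_dim=256, local_dim=256, hidden_dim=512,
                 num_actions=4, vocab_size=1000):
        super().__init__()
        # Fuse the two feature streams by concatenation followed by an MLP.
        self.fusion = nn.Sequential(
            nn.Linear(bev_dim + local_dim, hidden_dim),
            nn.ReLU(),
        )
        # Head 1: discrete driving action (e.g. keep lane, turn, stop).
        self.action_head = nn.Linear(hidden_dim, num_actions)
        # Head 2: natural-language environment description via a GRU decoder.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.word_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, bev_feat, local_feat, caption_tokens):
        # bev_feat: (B, bev_dim), local_feat: (B, local_dim)
        fused = self.fusion(torch.cat([bev_feat, local_feat], dim=-1))
        action_logits = self.action_head(fused)           # (B, num_actions)
        # Teacher forcing: initialise the decoder with the fused scene feature.
        emb = self.embed(caption_tokens)                  # (B, T, hidden_dim)
        h0 = fused.unsqueeze(0)                           # (1, B, hidden_dim)
        out, _ = self.decoder(emb, h0)
        word_logits = self.word_head(out)                 # (B, T, vocab_size)
        return action_logits, word_logits

In such a setup the two heads would typically be trained jointly, e.g. with a cross-entropy loss on the action logits and a token-level cross-entropy loss on the description logits; the paper's exact losses and feature extractors are not reproduced here.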

Research Area(s)

  • Autonomous driving, BEV perception, decision making, explainable AI (XAI), multimodal explanations