Multimodal-XAD: Explainable Autonomous Driving Based on Multimodal Environment Descriptions

Yuchao Feng, Zhen Feng, Wei Hua, Yuxiang Sun*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

8 Citations (Scopus)

Abstract

In recent years, deep learning-based end-to-end autonomous driving has become increasingly popular. However, deep neural networks act like black boxes: their outputs are generally not explainable, making them unreliable for use in real-world environments. To address this problem, we propose an explainable deep neural network that jointly predicts driving actions and multimodal environment descriptions of traffic scenes, including bird's-eye-view (BEV) maps and natural-language environment descriptions. In this network, both the context information from BEV perception and the local information from semantic perception are considered before producing the driving actions and natural-language environment descriptions. To evaluate our network, we build a new dataset with hand-labelled ground truth for driving actions and multimodal environment descriptions. Experimental results show that the combination of context information and local information enhances the prediction performance for both driving actions and environment descriptions, thereby improving the safety and explainability of our end-to-end autonomous driving network. © 2024 IEEE.
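The abstract's central design, fusing global context from BEV perception with local features from semantic perception before joint prediction heads, can be sketched as follows. This is a minimal illustrative forward pass in NumPy, not the authors' implementation; all layer sizes, weight initialisations, and output dimensions are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    """Simple affine layer: x @ w + b."""
    return x @ w + b

# Assumed feature dimensions (illustrative only).
D_CTX, D_LOC, D_FUSED = 64, 32, 48
N_ACTIONS, N_DESC = 4, 10  # e.g. 4 driving actions, 10 description labels

# Randomly initialised weights stand in for trained parameters.
W_fuse = rng.normal(size=(D_CTX + D_LOC, D_FUSED)) * 0.1
b_fuse = np.zeros(D_FUSED)
W_act = rng.normal(size=(D_FUSED, N_ACTIONS)) * 0.1
b_act = np.zeros(N_ACTIONS)
W_desc = rng.normal(size=(D_FUSED, N_DESC)) * 0.1
b_desc = np.zeros(N_DESC)

def forward(ctx_feat, loc_feat):
    """Fuse BEV context and local semantic features, then jointly
    predict driving-action logits and environment-description logits."""
    fused = np.tanh(linear(np.concatenate([ctx_feat, loc_feat]), W_fuse, b_fuse))
    return linear(fused, W_act, b_act), linear(fused, W_desc, b_desc)

ctx = rng.normal(size=D_CTX)  # context features from BEV perception
loc = rng.normal(size=D_LOC)  # local features from semantic perception
action_logits, desc_logits = forward(ctx, loc)
print(action_logits.shape, desc_logits.shape)  # (4,) (10,)
```

The key point is that both prediction heads share the same fused representation, so the predicted environment description is grounded in the same evidence as the driving action, which is what makes the action explainable.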
Original language: English
Pages (from-to): 19469-19481
Journal: IEEE Transactions on Intelligent Transportation Systems
Volume: 25
Issue number: 12
Online published: 7 Oct 2024
Publication status: Published - Dec 2024

Funding

This work was supported in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515010116, in part by Zhejiang Laboratory under Grant 2021NL0AB01, and in part by the City University of Hong Kong under Grant 9610675.

Research Keywords

  • Autonomous driving
  • BEV perception
  • decision making
  • explainable AI (XAI)
  • multimodal explanations
