Abstract
Autonomous driving has the potential to revolutionize transportation and make a significant impact on society. To achieve high-level autonomous driving, autonomous vehicles must learn to interact with humans as well as human-controlled vehicles, bicycles, motorcycles, and other on-road agents that can influence the decisions made by autonomous vehicles and may in turn be affected by autonomous vehicles' behavior. Such multi-agent interactions must be safe and are ideally human-like, so that autonomous vehicles can seamlessly coexist with humans. A prerequisite for achieving safe and human-like interactions is understanding the behavior of on-road agents, for which the task of multi-agent trajectory prediction plays a central role. With the availability of large-scale driving data, deep learning offers promising solutions for capturing the underlying patterns of multi-agent interaction and improving the accuracy of trajectory prediction.

This thesis presents a unified learning framework for multi-agent trajectory prediction in autonomous driving. The framework focuses on two critical challenges of this task: 1) encoding the heterogeneous input describing traffic scenarios efficiently and effectively, and 2) decoding agents' future trajectories in consideration of uncertainty and social interactions. In tackling the first challenge, we identify that learning invariant representations fundamentally enables efficient and robust traffic scene encoding. We begin by incorporating spatial roto-translation invariance into the learning of scene representations and demonstrate that this spatial invariance improves the expressiveness, efficiency, and robustness of traffic scene modeling. We then leverage translation invariance in time to equip deep learning models with stronger invariance properties.
By constructing models with roto-translation invariance in space and translation invariance in time, our framework lays the foundation for accurate multi-agent motion forecasting without redundant computation. However, decoding the future trajectories of multiple agents remains challenging even given an invariant scene representation, and the greatest difficulty lies in the uncertainty of the future. To handle the uncertain nature of the world, we formulate the output space as a mixture model, where each mode of the mixture density represents one plausible future instantiation of agent trajectories. To capture this multimodal output distribution, we introduce a two-stage decoding pipeline in which a proposal module generates initial trajectory guesses and a refinement module further improves the quality of the predicted trajectories. We apply this pipeline to both independent and joint multi-agent trajectory prediction tasks and achieve first-place results on a wide range of public benchmarks for autonomous driving.
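The two-stage decoding idea can be illustrated with a minimal sketch. The function names, tensor shapes, and refinement rule below are illustrative assumptions rather than the thesis's actual architecture: a proposal stage emits K candidate trajectories together with softmax-normalized mixture weights, and a refinement stage applies a residual update to each candidate.

```python
import numpy as np

def propose(scene_feat, num_modes=3, horizon=5, rng=None):
    """Proposal stage (illustrative): generate K initial trajectory
    guesses and mixture weights over the K modes."""
    rng = rng or np.random.default_rng(0)
    # Each mode is a (horizon, 2) sequence of xy waypoints conditioned
    # (here, trivially) on the scene feature.
    proposals = scene_feat + rng.normal(scale=0.5, size=(num_modes, horizon, 2))
    scores = rng.normal(size=num_modes)
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax
    return proposals, weights

def refine(proposals, scene_feat, step=0.2):
    """Refinement stage (illustrative): a stand-in for a learned
    residual update that nudges each proposal toward the scene context."""
    return proposals + step * (scene_feat - proposals)

scene_feat = np.array([1.0, 2.0])      # toy 2-D scene encoding
proposals, weights = propose(scene_feat)
refined = refine(proposals, scene_feat)
```

In a learned model, `propose` and `refine` would be neural networks trained jointly, with the mixture weights supervised via a classification loss over modes; the key design point is that refinement operates on concrete trajectory hypotheses rather than on an abstract latent.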
This thesis substantially advances progress in multi-agent trajectory prediction, paving the way for next-generation autonomous driving systems. Many of the methodologies proposed in this thesis have been widely adopted both in the research community and by autonomous driving companies around the world.
| Date of Award | 26 Aug 2024 |
| --- | --- |
| Original language | English |
| Awarding Institution | |
| Supervisor | Jianping WANG (Supervisor) & Xiaonan Nancy YU (Co-supervisor) |