Fully-decentralized and Near-optimal Large-scale Multi-robot Collision Avoidance via Deep Learning

Project: Research


Description

Effective collision-avoidance technology that is robust to sensor noise and scalable to large systems is essential for enabling many robots to collaborate efficiently in cluttered and changing environments, with applications in automated warehouses, self-driving cars, and service robots operating in close proximity to humans. However, existing multi-agent navigation techniques have important limitations. Centralized approaches can generate optimal navigation policies but suffer in scalability, flexibility, and robustness. Decentralized systems claim to be scalable, yet most are not fully decentralized: they rely on additional channels such as Wi-Fi communication to share global knowledge among agents. More importantly, their performance, measured by navigation speed and quality, falls significantly short of their centralized counterparts.

In this project, we will show how a novel end-to-end deep learning framework can generate fully decentralized and near-optimal navigation policies for large-scale robot systems. Our method is among the first working systems to achieve fully distributed navigation applicable to large-scale physical robot systems with hundreds of robots, without any inter-agent knowledge sharing. More importantly, our approach greatly narrows the performance gap between decentralized and centralized navigation policies, making it competitive for high-throughput industrial applications.

This research comprises two parts. First, to learn a fully decentralized policy, we formulate each agent's end-to-end navigation strategy as a deep neural network mapping noisy sensor observations to the agent's steering commands. The policy network is trained in a supervised manner on a large set of navigation data. The learned policy generates reliable navigation behaviors that generalize well across various scenarios.
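The supervised part can be sketched as behavior cloning: a small network regresses expert steering commands from observation vectors. The sketch below is illustrative only; the dimensions, the tiny two-layer MLP, and the synthetic "expert" data are assumptions, not the project's actual architecture or dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a flattened sensor observation (e.g. laser-scan
# readings plus relative goal position) mapped to [linear v, angular w].
OBS_DIM, HIDDEN, ACT_DIM = 16, 32, 2

# Tiny two-layer MLP policy: observation -> hidden (tanh) -> steering command.
W1 = rng.normal(0, 0.1, (OBS_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, ACT_DIM))
b2 = np.zeros(ACT_DIM)

def policy(obs):
    """Map a batch of observations to steering commands."""
    h = np.tanh(obs @ W1 + b1)
    return h @ W2 + b2

def train_step(obs, target, lr=0.05):
    """One supervised (behavior-cloning) gradient step on MSE loss."""
    global W1, b1, W2, b2
    h = np.tanh(obs @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - target                     # dLoss/dpred, up to a constant
    dW2 = h.T @ err / len(obs)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h**2)          # backprop through tanh
    dW1 = obs.T @ dh / len(obs)
    db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return float((err**2).mean())

# Fit the policy to (observation, expert command) pairs, standing in for
# the large set of navigation data mentioned in the text.
obs = rng.normal(size=(256, OBS_DIM))
target = np.tanh(obs[:, :ACT_DIM])          # synthetic "expert" commands
losses = [train_step(obs, target) for _ in range(200)]
```

In the real system the observation would be a high-dimensional noisy sensor reading and the network far deeper, but the training loop has this same shape: forward pass, regression loss against the expert command, gradient update.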
Second, to push the performance of the learned decentralized policy toward the limit of centralized approaches, we refine it using a multi-scenario, multi-stage training framework built on a reinforcement learning algorithm with a robust policy gradient. The refined policy outperforms the original in both navigation speed and quality.

The proposed research is expected to advance multi-agent navigation by combining the scalability of fully decentralized methods with the optimality of centralized ones, and could transform how robot teams are used in industrial applications. Scientific merits will come from: (i) the development of robust, fully distributed multi-agent systems; (ii) research into near-optimal decentralized navigation policies via deep reinforcement learning; (iii) the realization of fully distributed, near-optimal multi-agent control on practical robotic systems with more than 10 physical robots and more than 100 simulated agents.
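One common way to make the policy-gradient refinement robust is a clipped surrogate objective (PPO-style), which bounds how far each update can move the policy from the one that collected the data. The description does not name the exact algorithm, so this is a hedged sketch of that idea; all names and numbers are illustrative.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """PPO-style clipped policy-gradient objective (to be maximized).

    Clipping the probability ratio to [1 - eps, 1 + eps] keeps each
    refinement step close to the current policy, one standard route to a
    "robust" policy gradient. The project's exact algorithm may differ.
    """
    ratio = np.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # Pessimistic minimum: never rewarded for moving the ratio past the clip.
    return float(np.minimum(unclipped, clipped).mean())

# Toy batch: action log-probabilities under the old and refined policies,
# plus advantages estimated from navigation rewards (progress toward the
# goal, penalties for collisions).
logp_old = np.log(np.array([0.2, 0.5, 0.3]))
logp_new = np.log(np.array([0.3, 0.4, 0.3]))
adv = np.array([1.0, -0.5, 0.2])

obj = clipped_surrogate(logp_new, logp_old, adv)
```

In the multi-scenario, multi-stage setting, batches like this would be collected from many agents across progressively harder environments, and the supervised policy from part one serves as the initialization that the surrogate objective then refines.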

Detail(s)

Project number: 9042646
Grant type: GRF
Status: Finished
Effective start/end date: 1/01/19 → 2/01/19