Metro Train Scheduling with Deep Reinforcement Learning Approaches


Student thesis: Doctoral Thesis




Awarding Institution
Award date: 30 Sep 2021


This thesis investigates deep reinforcement learning (DRL) approaches for integrated metro train scheduling and rolling stock deployment under dynamic and stochastic passenger demand. Metro systems are vital to the sustainable development of cities because of their high carrying capacity and eco-efficiency. To balance the trade-offs between passengers' and operators' perspectives, metro operations require an effective modeling and optimization framework that can derive adaptive service schedules and traffic resource deployment in response to time-varying passenger demand. Moreover, a robust metro system should respond promptly to uncertain changes in passenger demand during daily operations. It is therefore important to develop more advanced optimization techniques for scheduling train services in real time.

This thesis has three associated research objectives. First, we present a novel actor-critic DRL approach for metro train scheduling with limited rolling stock. The scheduling problem is modeled as a Markov decision process (MDP) driven by stochastic passenger demand. As in most dynamic optimization problems, the complexity of the scheduling problem grows exponentially with the number of state variables, decision variables, and uncertainties involved. This study addresses this 'curse of dimensionality' by adopting an actor-critic DRL framework, which uses artificial neural networks (ANNs) to simplify the evaluation of and search for potential optimal solutions. A deep deterministic policy gradient (DDPG) algorithm is developed to train the actor-critic scheduling agent iteratively so that it delivers an optimal train scheduling policy that minimizes passengers' and the operator's costs under real-time demand variations. The proposed approach is tested on a real-world scenario configured with data collected from the Victoria Line of the London Underground, UK. Experimental results illustrate the advantages of the proposed method over a range of established heuristics in terms of computing time, system efficiency, and robustness in a stochastic environment.
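For orientation, the generic DDPG updates can be written as below. This is the standard textbook form adapted to cost minimization, not the thesis's own formulation. All symbols are assumptions introduced here: state $s_t$, action $a_t$, per-step cost $c_t$ (standing in for the combined passenger and operator cost), discount factor $\gamma$, actor $\mu_\theta$, critic $Q_\phi$, target networks $\mu_{\theta'}$ and $Q_{\phi'}$, and soft-update rate $\tau$.

```latex
\begin{align}
  % Critic target built from the target networks
  y_t &= c_t + \gamma \, Q_{\phi'}\bigl(s_{t+1}, \mu_{\theta'}(s_{t+1})\bigr) \\
  % Critic is fitted by minimizing the squared temporal-difference error
  L(\phi) &= \mathbb{E}\Bigl[\bigl(Q_\phi(s_t, a_t) - y_t\bigr)^2\Bigr] \\
  % Deterministic policy gradient; the actor descends it because Q is a cost
  \nabla_\theta J(\theta) &\approx \mathbb{E}\Bigl[\nabla_a Q_\phi(s, a)\big|_{a = \mu_\theta(s)} \, \nabla_\theta \mu_\theta(s)\Bigr] \\
  % Soft update of the target networks
  \theta' &\leftarrow \tau \theta + (1 - \tau)\,\theta', \qquad
  \phi' \leftarrow \tau \phi + (1 - \tau)\,\phi'
\end{align}
```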

Second, we develop an integrated approach to metro service scheduling and train unit deployment that uses proximal policy optimization (PPO) within the DRL framework. The optimization problem is formulated as an MDP subject to a set of operational constraints. To address the computational complexity, the value function and control policy are parameterized by ANNs, and the operational constraints are incorporated through a purpose-built mask scheme. A PPO algorithm is developed to train the ANNs via successive transition simulations. The optimization framework is implemented and tested on a real-world scenario configured with the Victoria Line of the London Underground, UK. The results show that the proposed methodology outperforms a set of selected evolutionary heuristics in terms of both solution quality and computational efficiency. They also illustrate the advantages of flexible train composition in saving operational costs and reducing service irregularities.
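One common way to build operational constraints into a policy is action masking, which assigns zero probability to infeasible actions before sampling. The sketch below shows that general idea only. It is not the mask scheme devised in the thesis, and the function name, the action set ("dispatch k train units"), and the depot limit are all hypothetical.

```python
import math


def masked_softmax(logits, feasible):
    """Return action probabilities with infeasible actions forced to zero.

    logits   -- raw policy scores, one per discrete action
    feasible -- booleans, True where the action satisfies the constraints
    """
    if len(logits) != len(feasible):
        raise ValueError("logits and feasible must have the same length")
    if not any(feasible):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, feasible) if ok)
    weights = [math.exp(l - peak) if ok else 0.0
               for l, ok in zip(logits, feasible)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: action k means "dispatch a train with k units",
# k = 0..3, and only 2 units are available, so k = 3 is masked out.
logits = [0.1, 0.5, 1.2, 2.0]
units_available = 2
feasible = [k <= units_available for k in range(len(logits))]
probs = masked_softmax(logits, feasible)
```

Because the masked probabilities are renormalized over the feasible set, sampling from `probs` never selects a constraint-violating action, so no penalty term is needed for those constraints.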

Third, we present an adaptive control system for coordinated metro operations with flexible train composition, based on a multi-agent deep reinforcement learning (MADRL) approach. The control problem is formulated as a multi-agent extension of the MDP, in which multiple agents regulate the operations of different service lines over a dynamic metro network, with passenger dynamics and transfers taken into account. To ensure the overall stability of the multi-agent system, we adopt an actor-critic reinforcement learning framework with a centralized critic estimating future system states and decentralized actors deriving local operational policies. The critic and actors in the MADRL approach are represented by multi-layer ANNs. A multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed to train the actor and critic ANNs through successive simulated transitions. The framework is tested on a real-world scenario configured with the Bakerloo and Victoria Lines of the London Underground, UK. Experimental results demonstrate that the proposed method can outperform various centralized optimization methods in terms of solution quality and resource deployment. Further analysis also shows the merits of applying distributed control agents to different services and directions with flexible train composition.
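The split between a centralized critic and decentralized actors can be pictured with a minimal sketch. It follows the usual MADDPG layout, in which each actor sees only its own observation and a single critic scores the joint observation and joint action. Everything here is an assumption for illustration: the class names, the observation and action dimensions, and the single linear maps that stand in for the multi-layer ANNs. No training step is shown.

```python
import random


def make_linear(n_in, n_out, rng):
    """Return a random weight matrix (list of rows) for a linear map."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_out)]


def apply_linear(weights, x):
    """Multiply the weight matrix by the input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class Actor:
    """Decentralized actor: sees only its own line's observation."""

    def __init__(self, obs_dim, act_dim, rng):
        self.weights = make_linear(obs_dim, act_dim, rng)

    def act(self, local_obs):
        return apply_linear(self.weights, local_obs)


class CentralCritic:
    """Centralized critic: scores the joint observation and joint action."""

    def __init__(self, joint_dim, rng):
        self.weights = make_linear(joint_dim, 1, rng)

    def value(self, all_obs, all_actions):
        joint = [x for obs in all_obs for x in obs]
        joint += [a for act in all_actions for a in act]
        return apply_linear(self.weights, joint)[0]


rng = random.Random(0)
obs_dims = {"Bakerloo": 4, "Victoria": 3}  # hypothetical observation sizes
act_dim = 2                                # hypothetical action size per line

actors = {line: Actor(dim, act_dim, rng) for line, dim in obs_dims.items()}
critic = CentralCritic(sum(obs_dims.values()) + act_dim * len(obs_dims), rng)

# Each actor decides from local information only.
observations = {line: [rng.random() for _ in range(dim)]
                for line, dim in obs_dims.items()}
actions = {line: actors[line].act(observations[line]) for line in obs_dims}

# The critic evaluates all lines together.
q_value = critic.value(list(observations.values()), list(actions.values()))
```

In this layout the critic is needed only during training, so at run time each line's actor can act on local information alone.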

The major contributions of this thesis lie in three areas. First, it advances real-time urban transit operations by applying state-of-the-art computer science and dynamic optimization techniques. Second, it provides practical implications for efficient and robust train operation policies under uncertain passenger demand and limited rolling stock resources. Third, it contributes to real-time metro network service coordination with flexible train deployment and advanced optimization techniques.