Abstract
Dynamic resource allocation (DRA) in supply chain logistics typically refers to the transfer of production materials, information, and human resources between the supply and demand sides. Most DRA problems can be classified as optimization problems. As the variety of resources to be allocated increases and uncertainty on both the supply and demand sides grows, DRA problems face the same difficulties as optimization problems in complex, multi-layered, and dynamic operational scenarios. This thesis reviews and analyzes the related research and summarizes the following challenges in DRA: (i) constructing optimization models that faithfully represent complex real-world scenarios during the modeling stage, (ii) overcoming the 'curse of dimensionality' when designing solution algorithms, and (iii) testing the robustness of proposed solution approaches across diverse case scenarios during the validation stage. Existing studies addressing these challenges typically focus on single-resource configurations, single-layer static settings, or verification within simulated environments. Consequently, current research falls short of providing adaptive and robust solutions under high-dimensional, high-uncertainty, and dynamically changing decision-making requirements. Meanwhile, Reinforcement Learning (RL) offers a promising avenue for formulating problem models and designing decision frameworks for high-dimensional dynamic decision problems. By modeling complex, multi-layered, and dynamic optimization problems as Markov Decision Processes (MDPs) or partially observable MDPs (POMDPs), RL methods can learn adaptive optimization strategies in high-dimensional state spaces. Building on these challenges and opportunities, this thesis proposes a hierarchical RL-based decision system.
This system addresses the optimization of DRA, which includes production resources, transportation resources, and human resources, by formulating different types of resource allocation problems, developing adaptive hierarchical RL algorithms, and validating the approach with real-world data. Under conditions of supply-demand uncertainty, the proposed adaptive decision-making system is shown to enhance operating revenues, improve operational efficiency, and increase the fairness of human resource allocation. The specific research contributions and innovations are as follows:
(1) For the production-inventory-distribution problem under market uncertainty, this thesis proposes an adaptive hierarchical RL control system. The system constructs an MDP based on fluctuating market conditions and develops a value-based RL algorithm. By enabling the agent to dynamically adjust production quantities, the system aims to maximize overall system profit. Specifically, the upper-level agent of the hierarchical control system employs the value-based RL algorithm to adjust production decisions dynamically; the lower-level component serves as an interactive environment for the upper-level agent, with its distribution actions formulated as an Orienteering Problem (OP); finally, numerical simulations and validation with industry data from a local company demonstrate the robustness of the hierarchical control framework and its ability to adjust production decisions under fluctuating market conditions. The overall experiment results yield a 45.62% improvement in system profit compared with traditional methods.
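To make the upper-level design concrete, the following is a minimal, self-contained sketch of a value-based RL agent (tabular Q-learning) adjusting production quantities against a stochastic demand environment. The environment, prices, costs, and demand distribution are toy illustrations, not the thesis's actual model.

```python
import random

random.seed(0)

# Toy production-inventory MDP: state = current inventory, action = production
# quantity. All parameters below are illustrative assumptions.
MAX_INV = 10
ACTIONS = range(0, 6)
PRICE, PROD_COST, HOLD_COST = 5.0, 2.0, 0.5

def step(inv, produce):
    """One period: produce, observe random demand, sell, pay holding cost."""
    demand = random.randint(0, 5)            # fluctuating market demand
    stock = min(inv + produce, MAX_INV)
    sold = min(stock, demand)
    next_inv = stock - sold
    profit = PRICE * sold - PROD_COST * produce - HOLD_COST * next_inv
    return next_inv, profit

# Tabular Q-learning: the upper-level agent learns to adjust production.
Q = {(s, a): 0.0 for s in range(MAX_INV + 1) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.95, 0.1
inv = 0
for _ in range(20000):
    if random.random() < eps:                # epsilon-greedy exploration
        a = random.choice(list(ACTIONS))
    else:
        a = max(ACTIONS, key=lambda x: Q[(inv, x)])
    nxt, r = step(inv, a)
    best_next = max(Q[(nxt, b)] for b in ACTIONS)
    Q[(inv, a)] += alpha * (r + gamma * best_next - Q[(inv, a)])
    inv = nxt

# Greedy production policy per inventory level.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(MAX_INV + 1)}
```

With an empty warehouse the learned policy produces a positive quantity, since idling earns nothing; this mirrors the abstract's point that the agent adapts output to market conditions rather than following a fixed schedule.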
(2) For the distribution process, which considers a zero-inventory constraint under unexpected customer behaviors, this thesis proposes a dynamic pricing-inventory-reassignment problem and a dynamic pricing and reassignment method based on customer price preferences. Building on the adaptive hierarchical RL control system, a route-based MDP model is constructed under dynamic customer behavior, and a policy-based RL algorithm with a deep artificial neural network is developed to minimize resource waste while maximizing reassignment profits. Specifically, the upper-level agent employs a policy-based RL algorithm for dynamic pricing and for selecting reassignment customers; the lower-level distribution actions are formulated as a Vehicle Routing Problem (VRP); and numerical simulations, together with validations on industry data from two companies, show that the proposed method effectively reduces resource waste and enhances reassignment benefits. The systematic experiment results show that the proposed approach achieves a 32.98% improvement in resource waste reduction over benchmark methods.
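The policy-based component can be illustrated with a minimal REINFORCE sketch: a softmax policy over discrete price levels, trained against a toy customer price-preference model (acceptance probability falls with price). The price grid, acceptance curve, and learning rate are illustrative assumptions, not the thesis's actual pricing model or network.

```python
import math
import random

random.seed(1)

# Softmax policy over discrete reassignment prices, one logit per price level.
PRICES = [1.0, 2.0, 3.0, 4.0]
theta = [0.0] * len(PRICES)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def accept_prob(price):
    # Toy customer preference: acceptance falls linearly with price.
    return max(0.0, 1.0 - 0.25 * price)

lr = 0.05
for _ in range(5000):
    probs = softmax(theta)
    i = random.choices(range(len(PRICES)), weights=probs)[0]
    # Reward = revenue if the customer accepts the offered price, else 0.
    reward = PRICES[i] if random.random() < accept_prob(PRICES[i]) else 0.0
    # REINFORCE update: grad of log softmax is (1[j=i] - pi_j).
    for j in range(len(PRICES)):
        theta[j] += lr * ((1.0 if j == i else 0.0) - probs[j]) * reward

best = PRICES[max(range(len(PRICES)), key=lambda j: theta[j])]
```

Under this toy acceptance curve the expected revenues are 0.75, 1.0, 0.75, and 0.0 for the four price levels, so the policy concentrates on the middle price of 2.0 rather than the highest one, which is the trade-off dynamic pricing is meant to capture.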
(3) For the dynamic production-inventory-distribution problem that considers workforce fairness among retailers, this thesis proposes a dynamic production and retailer assignment-inventory-distribution problem along with a dynamic assignment method based on the distribution capabilities of personnel. Utilizing the proposed adaptive hierarchical RL control system together with a preference quantification approach, a POMDP model is constructed for dynamic retailer assignment, and a cooperative multi-agent RL algorithm with a communication mechanism and a deep ANN is developed. The approach aims to maximize fairness in human resource allocation while balancing operational profits. Specifically, two types of agents are designed within the hierarchical control system to represent distinct decision entities (i.e., production quantity and distribution assignment); a communication mechanism allows the agents to coordinate their decisions; and numerical simulations and industry data validations demonstrate that the method improves overall operational efficiency while ensuring fairness in human resource allocation. The final experiment results show a 25.80% improvement over baseline methods.
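The cooperative two-agent structure with communication can be sketched at its simplest: a production agent broadcasts its chosen quantity as a message, and an assignment agent conditions its decision on that message; both receive a shared reward. The reward shape, action sets, and learning rule below are toy assumptions used only to show the communication pattern, not the thesis's actual multi-agent algorithm.

```python
import random

random.seed(2)

# Two cooperating agents with a one-way message channel. The production agent
# picks a quantity and broadcasts it; the assignment agent observes the message
# and assigns a matching amount of distribution capacity. Shared reward is
# highest when assignment matches production (a toy efficiency proxy).
QTYS = [1, 2, 3]
Q_prod = {q: 0.0 for q in QTYS}                        # producer's value table
Q_assign = {(m, a): 0.0 for m in QTYS for a in QTYS}   # conditioned on message

alpha, eps = 0.1, 0.1

def pick(values):
    """Epsilon-greedy choice over a dict of action -> estimated value."""
    keys = list(values)
    if random.random() < eps:
        return random.choice(keys)
    return max(keys, key=values.get)

for _ in range(5000):
    q = pick(Q_prod)
    msg = q                                   # communication: broadcast intent
    a = pick({k: Q_assign[(msg, k)] for k in QTYS})
    reward = 3.0 * q - 2.0 * abs(q - a)       # shared cooperative reward
    Q_prod[q] += alpha * (reward - Q_prod[q])
    Q_assign[(msg, a)] += alpha * (reward - Q_assign[(msg, a)])
```

Because the assignment agent sees the producer's message, it can learn to match the intended quantity instead of guessing it from its own observations alone; this is the coordination benefit the communication mechanism provides in the full hierarchical system.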
In summary, this thesis addresses the challenges of dynamic resource allocation for various resource types by integrating multiple RL algorithms. A model-free, hierarchical RL decision system is developed to achieve adaptive decision optimization in high-dimensional and uncertain environments. Through its modeling and algorithm design, this thesis not only provides a comprehensive and robust solution framework and decision system for dynamic resource allocation problems, validated on real-world industrial data from several companies, but also offers a generalizable decision support system for companies in the supply chain and logistics industry tackling complex optimization problems in practical scenarios.
| Date of Award | 16 Jun 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Andy CHOW (Supervisor) & Zhili Zhou (External Supervisor) |
Keywords
- Dynamic resource allocation
- Hierarchical optimization
- Inventory management
- Markov decision processes
- Reinforcement learning