Project Details
Description
The wide area network (WAN) is crucial infrastructure for many cloud service providers, as it carries traffic for numerous online applications and services at a global scale. These WANs are typically software-defined, with a logically centralized controller in charge. Traffic engineering (TE) is a critical and difficult problem in these SD-WANs: it involves assigning traffic with various requirements to paths with different constraints. Recently, deep learning (DL) algorithms have been applied to TE, yet they all assume that the network is a black box, which limits them to model-free reinforcement learning (RL) algorithms with many issues. We therefore ask: why are we stuck with RL for TE? Can we find a way to bring the rapid advances in many other DL models to TE? This project presents a holistic effort to address this fundamental question. We show that the SD-WAN network environment can be sufficiently modeled for TE, and design a fully differentiable network environment, ∂NE, that can be directly integrated into any DL model. With ∂NE, we can differentiate with respect to control parameters and directly evaluate gradients between actions and states, which facilitates gradient-descent-based training. We show with a proof-of-concept prototype that ∂NE accelerates DL training for TE by 228× and achieves higher scalability than existing network simulators. Based on ∂NE, we also propose to develop novel DL-based TE algorithms beyond RL. As a concrete example, we examine a new TE scheme based on LSTM (Long Short-Term Memory); preliminary experiments on our ∂NE prototype show that it achieves 7.0% higher throughput and 89.9% lower congestion loss than state-of-the-art RL-based TE. Finally, we plan to investigate the explainability of DL-based TE to better understand, in a principled way, how DL helps TE. TE in SD-WANs is an important research area with great practical value, and DL holds huge potential for online control problems like TE.
Our project opens up the possibility of applying arbitrary DL models to TE beyond RL. To maximize the impact, we plan to implement ∂NE and our DL-based TE algorithms in popular machine learning frameworks such as PyTorch and TensorFlow, and to open-source them. We are working with Tencent to experiment with ∂NE using their production network and traffic information, and we plan to collaborate with other cloud operators to advance the transfer of our research results to production systems.
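
The idea of differentiating a TE objective with respect to control parameters can be illustrated with a minimal sketch. This is not ∂NE itself: the topology, capacities, demand, the softmax parameterization of split ratios, and the squared-utilization congestion proxy are all assumptions chosen for illustration, and every function name is hypothetical. The gradient is written out by hand in plain Python; in a framework such as PyTorch, automatic differentiation would supply it instead.

```python
import math

# Hypothetical toy topology: rows are links, columns are candidate paths.
# INCIDENCE[e][p] == 1 means path p traverses link e.
INCIDENCE = [
    [1, 0],  # link 0 is used by path 0
    [1, 0],  # link 1 is used by path 0
    [0, 1],  # link 2 is used by path 1
]
CAPACITY = [10.0, 20.0, 30.0]  # assumed link capacities
DEMAND = 20.0                  # assumed traffic demand to split across paths


def softmax(logits):
    """Map unconstrained control parameters to split ratios that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def utilization(split):
    """Per-link utilization: load placed on the link divided by its capacity."""
    return [
        sum(row[p] * DEMAND * split[p] for p in range(len(split))) / cap
        for row, cap in zip(INCIDENCE, CAPACITY)
    ]


def congestion_loss(logits):
    """Smooth congestion proxy (an assumption): sum of squared utilizations."""
    return sum(u * u for u in utilization(softmax(logits)))


def loss_gradient(logits):
    """Analytic gradient of congestion_loss with respect to the logits."""
    split = softmax(logits)
    util = utilization(split)
    # dL/dsplit_p = sum_e 2 * u_e * A[e][p] * DEMAND / c_e
    g = [
        sum(2.0 * u * row[p] * DEMAND / cap
            for u, row, cap in zip(util, INCIDENCE, CAPACITY))
        for p in range(len(split))
    ]
    # Chain rule through softmax: dL/dlogit_j = w_j * (g_j - sum_p g_p * w_p)
    mean_g = sum(gp * wp for gp, wp in zip(g, split))
    return [wp * (gp - mean_g) for gp, wp in zip(g, split)]


def train(logits, steps=500, lr=0.5):
    """Plain gradient descent on the control parameters."""
    logits = list(logits)
    for _ in range(steps):
        grad = loss_gradient(logits)
        logits = [x - lr * gx for x, gx in zip(logits, grad)]
    return logits
```

Calling `train([0.0, 0.0])` starts from an even split and moves traffic toward the less congested path. Because the environment is an explicit differentiable function, no trial-and-error exploration is needed, which is the contrast with model-free RL drawn in the description.
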
| Project number | 9042974 |
|---|---|
| Grant type | GRF |
| Status | Finished |
| Effective start/end date | 1/01/21 → 1/01/21 |
Research output
- An Online Auction Approach to UAV Scheduling and Trajectory Planning
  Mo, K., Li, X., Xue, C. J., Li, Z. & Xu, H., 2024, ICC 2024 - IEEE International Conference on Communications. Valenti, M., Reed, D. & Torres, M. (eds.). IEEE, p. 1011-1016 (IEEE International Conference on Communications).
  Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
  Citations (Scopus): 1
- Efficient Time-Series Data Delivery in IoT with Xender
  Liu, L., Li, J., Niu, Z., Zhang, W., Xue, J. C. & Xu, H., May 2024, In: IEEE Transactions on Mobile Computing. 23, 5, p. 4777-4792
  Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
  Citations (Scopus): 4
- Palantir: Hierarchical Similarity Detection for Post-Deduplication Delta Compression
  Huang, H., Wang, P., Su, Q., Xu, H., Xue, C. J. & Brinkmann, A., 2024, ASPLOS '24 - Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY: Association for Computing Machinery, Vol. 2. p. 830–845
  Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
  Citations (Scopus): 7