Efficiency and Resilience of Resource Allocation for Next-Generation Data Centers
Student thesis: Doctoral Thesis
Details
Award date | 28 Apr 2023 |
---|---|
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(7e26576c-1461-402b-85a3-110a4bd31ebf).html |
Abstract
Data centers (DCs) play an increasingly critical role in providing internet services for data processing, storage, and exchange. Driven by the exponential growth of workloads, next-generation DCs require resource allocation methods that provide customers with high-quality services while ensuring high resource efficiency and low costs. In this thesis, we examine several transformative trends in next-generation DCs and propose corresponding resource allocation methods that target both resource efficiency and resilience.
We first observe that next-generation DCs are evolving toward a fully virtualized architecture in which computing, storage, and networking resources are all virtualized. Server virtualization is a mature technique that is widely adopted in today's DCs. It slices the computing and storage resources of each server into multiple independent, isolated virtual instances, e.g., virtual machines (VMs) or containers, which are then allocated to different users as if they were "real" servers. Network virtualization is still emerging but has already been introduced in some DCs. Like server virtualization, it abstracts a physical network into multiple isolated virtual networks. Combining server and network virtualization leads to a new service form: the virtual data center (VDC), also known as virtual infrastructure or virtual private cloud. Each VDC consists of multiple correlated VMs that guarantee the storage and computing resources needed to deploy applications. In addition, each VDC provides bandwidth guarantees for inter-VM communication in the form of virtual links that interconnect its VMs. The problem of allocating physical resources to VDCs is called VDC embedding: it determines how to map each VDC's VMs onto servers and its virtual links onto physical paths so that all resource demands are met. Although VDC embedding has been studied in many publications, this work has overlooked the issue of hot spots in DCs. Excessive heat dissipation and hot air generated by IT equipment create hot spots, which adversely affect hardware resilience. Motivated by this, we propose a temperature-aware VDC embedding scheme that avoids hot spots by minimizing the maximum rack inlet temperature, i.e., the highest temperature of the air drawn into any rack. The scheme also aims to reduce the power consumption of the IT equipment (servers and switches).
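To make the embedding constraints concrete, the Python sketch below checks whether a given mapping of one VDC onto a physical DC is feasible. It uses a deliberately simplified, hypothetical model that is not taken from the thesis: each VM and server carries a single scalar resource amount, physical links are undirected with one bandwidth capacity each, and temperature and power are ignored. The names `VDC`, `Substrate`, `link_key`, and `is_feasible` are illustrative only.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical, simplified model (not from the thesis): one scalar resource
# per VM/server and one bandwidth value per virtual/physical link.

@dataclass
class VDC:
    vm_demand: dict[str, float]                   # VM name -> resource units
    vlinks: dict[tuple[str, str], float]          # (vm_a, vm_b) -> bandwidth demand

@dataclass
class Substrate:
    server_capacity: dict[str, float]             # server name -> resource units
    link_capacity: dict[tuple[str, str], float]   # physical link (u, v) -> bandwidth

def link_key(u, v):
    # Treat physical links as undirected by sorting their endpoints.
    return (u, v) if u <= v else (v, u)

def is_feasible(vdc, dc, vm_map, path_map):
    """Return True if mapping VMs to servers (vm_map) and virtual links to
    physical node paths (path_map) respects all server and link capacities."""
    used = {s: 0.0 for s in dc.server_capacity}
    for vm, demand in vdc.vm_demand.items():
        used[vm_map[vm]] += demand
    if any(used[s] > dc.server_capacity[s] for s in used):
        return False

    cap = {link_key(u, v): c for (u, v), c in dc.link_capacity.items()}
    bw = dict.fromkeys(cap, 0.0)
    for (a, b), demand in vdc.vlinks.items():
        path = path_map[(a, b)]
        # The path must start at a's server and end at b's server.
        if path[0] != vm_map[a] or path[-1] != vm_map[b]:
            return False
        # Co-located VMs (single-node path) consume no physical bandwidth.
        for u, v in zip(path, path[1:]):
            key = link_key(u, v)
            if key not in bw:
                return False  # the path uses a link that does not exist
            bw[key] += demand
    return all(bw[k] <= cap[k] for k in cap)

# Two servers behind one top-of-rack switch "tor".
dc = Substrate({"s1": 4, "s2": 4}, {("s1", "tor"): 10, ("tor", "s2"): 10})
vdc = VDC({"vm1": 3, "vm2": 3}, {("vm1", "vm2"): 5})
print(is_feasible(vdc, dc, {"vm1": "s1", "vm2": "s2"},
                  {("vm1", "vm2"): ["s1", "tor", "s2"]}))  # True
```

An embedding algorithm, whether an MILP solver or a heuristic, searches over `vm_map` and `path_map`; a check like this captures only the capacity constraints that every candidate must satisfy.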
We develop a Mixed-Integer Linear Programming (MILP) formulation and a heuristic algorithm to implement the proposed VDC embedding scheme, and we evaluate its performance through numerical experiments. The results show that the proposed scheme effectively avoids hot spots in DCs while keeping the power consumption of IT equipment low.
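One way to see how a heuristic can trade off hot spots against power is a greedy placement under a linear heat-recirculation model. The Python sketch below is not the thesis's heuristic, and its modeling choices are assumptions: rack inlet temperatures follow `T_in[r] = t_supply + sum_k D[r][k] * P_rack[k]` for a given heat-distribution matrix `D`, idle servers are switched off, the power constants are made up, and network placement (virtual links, switch power) is omitted. Each VM goes to the server that minimizes the resulting maximum inlet temperature, with total server power as the tie-breaker.

```python
# Hypothetical parameters (not from the thesis).
P_IDLE = 0.1   # kW drawn by a powered-on but idle server
P_UNIT = 0.05  # kW per allocated resource unit

def server_power(load):
    # Assumption: servers with no load are switched off and draw nothing.
    return P_IDLE + P_UNIT * load if load > 0 else 0.0

def max_inlet_temp(loads, server_rack, D, t_supply):
    # Assumed linear heat-recirculation model:
    # T_in[r] = t_supply + sum_k D[r][k] * P_rack[k]
    rack_power = [0.0] * len(D)
    for s, load in enumerate(loads):
        rack_power[server_rack[s]] += server_power(load)
    return max(t_supply + sum(D[r][k] * rack_power[k] for k in range(len(D)))
               for r in range(len(D)))

def greedy_place(vm_demands, server_rack, capacity, D, t_supply):
    """Place VMs one by one, each on the server that minimizes the resulting
    maximum rack inlet temperature, breaking ties by total server power.
    Returns {vm: server_index}, or None if some VM does not fit."""
    loads = [0.0] * len(server_rack)
    placement = {}
    # Place larger VMs first, a common bin-packing ordering.
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        best, best_key = None, None
        for s in range(len(loads)):
            if loads[s] + demand > capacity[s]:
                continue
            trial = loads.copy()
            trial[s] += demand
            key = (max_inlet_temp(trial, server_rack, D, t_supply),
                   sum(server_power(x) for x in trial))
            if best_key is None or key < best_key:
                best, best_key = s, key
        if best is None:
            return None  # reject: no server has enough spare capacity
        loads[best] += demand
        placement[vm] = best
    return placement

# Two racks with one server each and no cross-rack recirculation.
print(greedy_place({"a": 4, "b": 4}, [0, 1], [10, 10],
                   [[1.0, 0.0], [0.0, 1.0]], 18.0))  # {'a': 0, 'b': 1}
```

In this toy run the two VMs end up in different racks: consolidating them on one server would save one server's idle power but would raise that rack's inlet temperature further. Ordering the objectives the other way round (power first) would consolidate instead, which is the tension a combined thermal-and-power objective has to resolve.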
In addition to virtualization, DCs are also undergoing a shift toward resource disaggregation. Conventional DCs are server-based: different resources are tightly coupled on integrated motherboards. This architecture causes serious resource stranding (for example, spare memory on a server whose CPUs are fully allocated cannot be used by any other server) and makes resource upgrades and expansion costly. Resource disaggregation addresses these problems by decoupling resources from integrated servers and reassembling them into separate resource pools interconnected by advanced networking techniques. This new hardware design leads to a disaggregated DC (DDC) architecture. Besides improving resource efficiency, resource disaggregation may also improve service reliability. First, it makes resource allocation more flexible, giving more freedom to select an appropriate group of resource components to compose a system (e.g., a VM). Second, decoupled resources fail independently: when a CPU module fails, its associated memory module can still be used by other CPU modules. Motivated by this observation, we study a reliable resource allocation problem for DDCs that aims to maximize the number of carried service requests while guaranteeing their reliability requirements. In this initial study, we treat each resource allocation request as a single VM and leave the extension to VDC embedding for future work. We provide an integer linear programming (ILP) formulation and a heuristic algorithm for this problem. Numerical results show that the proposed methods significantly increase the number of carried requests with guaranteed reliability requirements, with improvements of up to 97% when the hardware is fully disaggregated from integrated servers.
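The reliability benefit of allocation flexibility can be illustrated with a toy admission experiment. The Python sketch below is a hypothetical model, not the thesis's ILP or heuristic: each request needs one CPU module and one memory module, each module serves at most one request, module failures are independent so a request's reliability is the product of its two modules' reliabilities, and the interconnect is assumed perfect. A best-fit rule (choose the feasible combination with the lowest reliability that still meets the requirement) keeps highly reliable modules free for stringent requests.

```python
from itertools import product

def carried_server_based(servers, requests):
    """servers: (cpu_rel, mem_rel) pairs fixed on one motherboard.
    Each server hosts at most one request; returns the number admitted."""
    free = list(servers)
    carried = 0
    for req in requests:
        fits = [sv for sv in free if sv[0] * sv[1] >= req]
        if fits:
            # Best fit: the least reliable server that still meets req.
            free.remove(min(fits, key=lambda sv: sv[0] * sv[1]))
            carried += 1
    return carried

def carried_disaggregated(cpus, mems, requests):
    """Any free CPU module may be paired with any free memory module."""
    free_cpu, free_mem = list(cpus), list(mems)
    carried = 0
    for req in requests:
        fits = [(c, m) for c, m in product(free_cpu, free_mem) if c * m >= req]
        if fits:
            # Best fit over all CPU-memory combinations.
            c, m = min(fits, key=lambda pair: pair[0] * pair[1])
            free_cpu.remove(c)
            free_mem.remove(m)
            carried += 1
    return carried

# The same hardware, viewed as two servers or as two disaggregated pools.
servers = [(0.99, 0.90), (0.90, 0.99)]
cpus, mems = [0.99, 0.90], [0.90, 0.99]
requests = [0.80, 0.97]  # required reliability of each request
print(carried_server_based(servers, requests))      # 1
print(carried_disaggregated(cpus, mems, requests))  # 2
```

With the server-based pairing, no server reaches 0.97 (each offers 0.891), so the second request is rejected; the disaggregated pools can pair the two 0.99 modules (0.9801) for it and the two 0.90 modules (0.81) for the first. These toy numbers are illustrative only and unrelated to the 97% figure reported above.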
Previous reliable resource allocation methods for DDCs focus on the benefits of resource disaggregation but overlook its network impacts. Because a DDC directly exposes resource modules to shared networks, a failure in a shared network may render many resource modules unavailable. In addition, because inter-resource communication, especially CPU-memory communication, has strict latency requirements, resource disaggregation may not be applicable across an entire DC. Instead, a DC may perform resource disaggregation at rack or pod scale, where a pod comprises multiple racks. In a rack- or pod-scale DDC, servers within each rack or pod are completely disaggregated, whereas different racks or pods remain isolated. Building on this observation, we further consider network effects and different disaggregation scales in the reliable resource allocation problem for DDCs. For this problem, we provide an MILP formulation and a resource allocation framework named Radar. Numerical results show that the benefits of hardware disaggregation may be undermined by an imperfect network, and that both hardware backup and the proposed migration-based restoration can be applied to overcome this potential adverse effect.
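The network effect can be seen with a simple series-parallel reliability calculation. Assume, purely for illustration and not as the thesis's model, independent failures, a request composed of one CPU module (reliability $r_c$) and one memory module ($r_m$) that communicate through a shared rack- or pod-level network ($r_n$), and an optional backup CPU module with the same reliability as the primary and reached through the same network:

$$
\begin{aligned}
R_{\text{no backup}} &= r_c \, r_m \, r_n, \\
R_{\text{CPU backup}} &= \bigl[1 - (1 - r_c)^2\bigr] \, r_m \, r_n \;\le\; r_n .
\end{aligned}
$$

With $r_c = r_m = 0.99$ and $r_n = 0.98$, this gives $R_{\text{no backup}} = 0.99 \cdot 0.99 \cdot 0.98 = 0.960498$ and $R_{\text{CPU backup}} = 0.9999 \cdot 0.99 \cdot 0.98 \approx 0.9701$. Under these assumptions, backup modules that share the same network cannot raise reliability above $r_n$, which is one way an imperfect network can erode the benefit of disaggregation. The numbers are illustrative, not results from the thesis.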