Abstract
In modern society, networking has undergone a significant transformation driven by technological advancements. This transformation necessitates the development of resilient network designs capable of addressing the challenges posed by various types of failure events, including single-node failures, multiple-node failures, and regional failures. Networks characterized by high resilience play a pivotal role in attaining diverse objectives such as enhancing the Quantity of Service (QoS) and ensuring real-time operations.To cater to the demands of highly resilient networks, Internet Service Providers (ISPs) must possess comprehensive insights into the internal dynamics of every network component. The conventional method, which involves monitoring the performance of individual network nodes without regard for network topologies, often proves impractical due to the significant overhead and other associated factors. Network tomography, a technique that infers the states of internal network elements from end-to-end measurement paths' states or delays, presents a promising solution for monitoring each node in the network and enabling the detection and localization of single or multiple node failures. Nevertheless, the efficiency of this technique critically relies on the underlying network designs. While network design typically prioritizes scalability and reconfigurability, constructing an easily monitored network topology and deploying a detailed measurement plan that accurately pinpoints failures within the framework of network tomography remain challenging tasks.
Moreover, real-world networks often span different regions and countries, involving large-scale infrastructures and significant distances between computing resources (such as data centers) and end-users. In such scenarios, the network may cover risk zones prone to natural disasters, resulting in a large number of simultaneous failures. Consequently, achieving timely recovery for all these failures becomes infeasible. The quest for an efficient and economical network topology that maintains uninterrupted communication remains a pressing challenge that requires resolution.
To enhance network resilience against the aforementioned failure events, this thesis addresses three optimization problems related to network design. These problems include the progressive construction of k-identifiable networks, the progressive construction of measurement paths, and the design of a risk zone-diversified network. Each problem has been proven to be at least NP-hard.
The first problem focuses on constructing a network topology with minimal cost while ensuring the ability to monitor every internal node through network tomography in the face of failure events. This objective is based on the assumption that the number of simultaneous node failures must not exceed a specific threshold k. To formally define this problem, we adopt the concept of "k-identifiability", a quantitative measure of network monitorability within the context of network tomography. A k-identifiable network denotes the capacity for accurately pinpointing all concurrently occurring node failures within the stipulated threshold k. We derive a series of properties for k-identifiable networks that narrow down the search space for network topology construction, particularly when the number of monitors is limited. Based on these findings, we propose an integrated algorithm that utilizes a novel dual-heuristic approach to generate well-formed network topologies.
Building upon the first problem, the second problem aims to achieve a detailed deployment of measurement paths in a given network for localizing single or multiple-node failures. To accomplish this, we conduct a systematic analysis of topological features and divide the network into specific sub-networks. By prioritizing multiple coverage of each node, we design a heuristic to select measurement paths in each sub-network. These selections are then combined to yield the final measurement set.
In the third problem, we focus on establishing a network consisting of diverse paths between source-destination pairs that are risk zone-disjoint. This design prevents any single disaster from disrupting overall network connectivity. To achieve this goal, we identify key geographical features and introduce an innovative cost framework that considers geographically overlapping links and long-term maintenance costs. By incorporating these considerations, we aim to reduce the cost of network design. We propose a proactive approach that incrementally constructs a risk zone-diversified network topology.
To validate the efficiency of the proposed methodologies, extensive simulations and experiments are conducted. These evaluations employ both small-scale randomly generated networks and large-scale real-world networks. In small-scale networks, our methods demonstrate near-optimality compared to theoretical optimal solutions obtained through brute-force search. Furthermore, we showcase the effectiveness of our approaches in large-scale networks, emphasizing their practical significance in real-world scenarios.
In summary, this thesis formally defines three optimization problems related to network design and resilience against node failures or regional disasters. We prove their intractabilities and propose corresponding methods to address the distinct challenges posed by each problem. Through performance evaluations in both small and large-scale networks, we validate the efficiency and effectiveness of our proposed methods.
| Date of Award | 20 Sept 2024 |
|---|---|
| Original language | English |
| Awarding Institution |
|
| Supervisor | Jianping WANG (Supervisor) |