Abstract
Topologically associating domains (TADs) are fundamental units of chromosome organization, inferred from high-throughput chromosome conformation capture (Hi-C) contact maps. Within a TAD, genomic sequences interact more frequently with each other than with sequences in adjacent domains. Importantly, TADs exhibit diverse configurations: hierarchically nested domains merge into larger structures, overlapped domains share boundary regions, disjoint domains are non-overlapping and distinct from one another, and gapped configurations contain regions that lack defined TAD structures. TAD boundaries play crucial roles in genome regulation by restricting transcriptional activity and accommodating structural elements such as CCCTC-binding factor (CTCF), cohesin complexes, and housekeeping gene transcription start sites (TSSs). Disruptions in TAD organization have been implicated in diseases, including cancer.
Despite advances in Hi-C technology, computational identification of TADs remains challenging. Some early TAD-calling methods relied on single-layer partitioning assumptions, which limited their ability to capture complex TAD architectures. More recent methods can model intricate architectures but often incur high computational costs. Additionally, while some algorithms attempt to integrate hierarchical and overlapped TAD configurations, they frequently misidentify locally relevant boundaries, particularly in noisy or multilayered datasets. Thus, a critical need remains for methods that can detect diverse TAD architectures (disjoint, hierarchical, overlapped, and gapped) while achieving high structural concordance, boundary precision, and stability across varying data conditions.
To address these challenges, this dissertation is organized as follows:
Chapter 1 provides a comprehensive overview of DNA structural organization, with an emphasis on Hi-C data and computational methods for TAD identification.
Chapter 2 introduces SuperTAD-Fast, an approximation algorithm that accelerates hierarchical TAD detection based on structural information theory. By leveraging a discretized structural entropy model, SuperTAD-Fast reduces the search space for constructing the optimal encoding tree. It maintains accuracy while outperforming SuperTAD in execution time and resource consumption on both simulated and real Hi-C datasets.
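For orientation, the structural entropy that Chapter 2 builds on can be sketched for a flat partition of Hi-C bins. The snippet below is a minimal illustration of two-dimensional structural entropy as defined in structural information theory; it is not the discretized model or the encoding-tree search used by SuperTAD-Fast. The function name `structural_entropy`, the toy contact matrix, and the zero-diagonal assumption are hypothetical choices made for this example only.

```python
import numpy as np

def structural_entropy(contacts, labels):
    """Two-dimensional structural entropy of a weighted graph under a flat partition.

    contacts: symmetric non-negative matrix (assumed here to have a zero diagonal).
    labels:   one module label per bin; each module plays the role of a candidate TAD.
    """
    a = np.asarray(contacts, dtype=float)
    labels = np.asarray(labels)
    degree = a.sum(axis=1)          # weighted degree of every bin
    vol_g = degree.sum()            # volume of the whole graph
    entropy = 0.0
    for module in np.unique(labels):
        inside = labels == module
        vol_m = degree[inside].sum()            # volume of this module
        cut = a[inside][:, ~inside].sum()       # contact weight leaving the module
        d = degree[inside]
        d = d[d > 0]                            # skip isolated bins (0 * log 0 := 0)
        # Cost of locating a bin inside its module.
        entropy -= np.sum(d / vol_g * np.log2(d / vol_m))
        # Cost of locating the module inside the whole graph.
        if vol_m > 0:
            entropy -= cut / vol_g * np.log2(vol_m / vol_g)
    return entropy

# Toy contact map (illustrative values): two 3-bin domains with strong
# intra-domain contacts (10) and weak inter-domain contacts (1).
block = np.full((3, 3), 10.0)
toy = np.block([[block, np.ones((3, 3))], [np.ones((3, 3)), block]])
np.fill_diagonal(toy, 0.0)

for name, labels in [("two domains", [0, 0, 0, 1, 1, 1]),
                     ("shifted boundary", [0, 0, 1, 1, 1, 1]),
                     ("single domain", [0, 0, 0, 0, 0, 0])]:
    print(name, structural_entropy(toy, labels))
```

On this toy matrix the partition that matches the two blocks scores lower than either the shifted boundary or the single domain, which is the sense in which entropy minimization favors partitions aligned with contact-enriched blocks.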
Chapter 3 presents TADClam, an efficient tool for detecting hierarchical and overlapped TAD structures. TADClam applies a weighted community affiliation model to generate candidate TADs, followed by an entropy-based screening process to refine the final structures. Unlike many existing methods, TADClam does not assume that TADs are strictly disjoint or purely hierarchical, enabling the identification of more comprehensive chromatin architectures.
Chapter 4 introduces DeTAD, a computational framework designed to capture the full spectrum of TAD architectures. Based on a generalized symmetric matrix factorization approach with distance-aware regularization, DeTAD can identify disjoint, hierarchical, overlapped, and gapped domain structures. Evaluation on both simulated and real Hi-C datasets indicates that DeTAD performs favorably compared with existing methods and identifies domain boundaries consistent with known biological patterns.
Chapter 5 systematically analyzes twelve TAD-calling methods from an algorithmic perspective. This chapter evaluates the optimality of different formulations, particularly in terms of computational complexity and solvability in polynomial time. Benchmarking studies highlight the trade-offs between different approaches, providing operational guidance for researchers in life sciences.
Chapter 6 concludes the dissertation by summarizing key findings. This dissertation provides a progressive exploration of TAD detection, from foundational concepts to advanced computational frameworks, ultimately contributing to a deeper understanding of genome organization and its regulatory implications.
| Date of Award | 5 Sept 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Shuaicheng LI (Supervisor) |