Horizontal Gene Transfer in Human Gut Microbiota


Student thesis: Doctoral Thesis



Related Research Unit(s)


Awarding Institution
Award date: 21 Oct 2020


Horizontal gene transfer (HGT) refers to the transfer of genetic material between organisms through mechanisms other than parent-offspring inheritance. It plays a major role in microbial evolution, and its common occurrence in the human gut microbiota implies that HGT is closely related to host status. During HGT, fragments transferred from donors are integrated into the recipients' genomes, which leads to complex structural variation. In the human gut in particular, numerous HGT events connect microbial species to form HGT networks. In this thesis, we propose a graph-based method named LEMON to reconstruct local strains at HGT sites from gut metagenomic data. Simulation results show that LEMON can recover local strains with complicated structural variation. Furthermore, HGT also causes gene fusion events. We found many fusion genes involved in HGT that encode proteins facilitating the horizontal transfer of genetic material, such as recombinase family protein, site-specific integrase, and conjugal transfer protein Tra. These findings support the HGT events detected by LEMON.

Based on the HGT sites detected by LEMON, we propose a deep residual network named DeepHGT to recognize HGT sites by learning sequence features. To train DeepHGT, we use LEMON to extract about 1.55 million sequence segments as training instances from 262 metagenomic samples, with a ratio of positive to negative instances of about 1:1. To further evaluate how well DeepHGT generalizes, we constructed an independent test set containing 689,312 sequence segments from another 147 gut metagenomic samples. DeepHGT achieved an AUC of 0.8428. Furthermore, DeepHGT can learn discriminative sequence features; for example, it learned a pattern of palindromic subsequences as a statistically significant (P-value = 0.0182) local feature. Hence, DeepHGT is a reliable model for recognizing HGT insertion sites. DeepHGT is the first machine learning method that can directly recognize HGT insertion sites on genomes from sequence patterns.
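
For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how an AUC can be computed from binary labels and classifier scores using the rank-sum (Mann-Whitney U) identity. It is not taken from the thesis: the function name `auc_from_scores` and the toy labels and scores are assumptions for illustration, and the thesis may have used a library routine instead.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values (1 = positive, e.g. a hypothetical HGT site).
    scores: sequence of classifier scores, higher meaning "more positive".
    Tied scores receive their average rank.
    """
    n = len(labels)
    if n != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # U statistic of the positives, normalized by the number of pos/neg pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy data (illustrative only, not from the thesis).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

Read this way, an AUC is the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, so 0.5 corresponds to random guessing and 1.0 to perfect ranking.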

We apply LEMON to two typical sets of gut microbial samples to construct HGT networks, and then perform network analysis to characterize them. Our findings include: 1) the HGT networks are scale-free; 2) the networks grow in complexity, size, and number of edges during the early stages of life, and the microbiota established in children are highly similar to those of their mothers (p-value = 0.0138), supporting the transmission of microbiota from mother to child; 3) different groups harbour group-specific network edges and network communities, which can potentially serve as biomarkers.
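
As a rough illustration of the kind of network quantities discussed above, the sketch below computes a degree distribution (the quantity a scale-free claim rests on) and a Jaccard similarity between the edge sets of two networks. This is not the thesis's analysis: the function names, the placeholder species labels, and the choice of Jaccard similarity as the comparison measure are all assumptions made for illustration.

```python
from collections import Counter


def normalize_edges(edges):
    """Treat edges as undirected and drop duplicates."""
    return {tuple(sorted(edge)) for edge in edges}


def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected network."""
    degree = Counter()
    for u, v in normalize_edges(edges):
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity of two undirected edge sets (0.0 if both are empty)."""
    a = normalize_edges(edges_a)
    b = normalize_edges(edges_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical HGT networks: nodes are placeholder species labels,
# edges are detected HGT events between two species.
mother_edges = [("sp_A", "sp_B"), ("sp_A", "sp_C"), ("sp_A", "sp_D")]
child_edges = [("sp_A", "sp_B"), ("sp_C", "sp_A")]

mother_degrees = degree_distribution(mother_edges)
similarity = edge_jaccard(mother_edges, child_edges)
```

In a scale-free network the degree distribution follows an approximate power law, with a few highly connected hub nodes and many nodes of low degree. A shared-edge measure such as the one above is one simple way to quantify how much two networks overlap.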

In this thesis, we investigate HGT in the human gut, from detection methods to network models. This gives us a thorough overall understanding of the mechanism of HGT and highlights its important role in the gut microbiota. HGT offers a powerful perspective for analyzing the human gut microbiota in an ever-changing environment.

    Research areas

  • HGT, metagenomics, gut, deep learning, network