Microbiome Big Data Research towards Species Diversity and Heterogeneity Analysis


Student thesis: Doctoral Thesis



Award date: 21 Jun 2021


Microbes are indispensable components of biogeochemical cycles and play important ecological roles: they balance ecosystems, regulate the metabolism of animals and plants, and help prevent disease. The rapid development of sequencing technology has produced a large amount of microbial whole-genome and metagenome data. These data provide abundant resources for studying microbial genetic diversity, dynamic changes, and environmental adaptability, and they have moved microbial genomics from individual genomes to species-wide genomes and community-based metagenomes. A microbial pan-genome is the collection of all genes in a given microbial group. Pan-genomic analysis can reveal genetic diversity, phylogeny, and genetic variation at the microbial species level. A metagenome is the collection of the genes of all microorganisms in a specific environment. Metagenomic analysis can capture the dynamic behavior, functional metabolic profiles, and interactions of microbes in that environment. Combining microbial pan-genomics and metagenomics can reveal the relationships that link a single microbial genome to multiple genomes and communities, thereby extending the genomic profile of an individual microbe to entire species and communities. This study therefore combined pan-genomic and metagenomic methods to profile microbial diversity, heterogeneity, and evolutionary patterns at different taxonomic and community levels.

In this study, the pan-genomes of Shewanella, Thermococcales, and Aeromonas were reconstructed at the genus or order level to profile their genomic composition, metabolic pathways, and evolutionary patterns. The analysis revealed the genetic diversity, heterogeneity, and evolutionary patterns of key genes involved in the metal-reducing pathway of Shewanella, the thermophily of Thermococcales, and the pathogenicity of Aeromonas. The study also extended pan-genomics to metagenomics by linking pan-genome and metagenome analyses of Aeromonas and human gut microbes, revealing the diversity of Aeromonas virulence genes at the genus and community levels and the global diversity and uniqueness patterns of the human gut microbiota. The main results are as follows:

1) The pan-genome study of Shewanella at the genus level revealed a high degree of genomic plasticity. Frequent gene gains and losses have driven changes in the Shewanella gene pool. The composition and structure of the gene cluster involved in the metal-reducing pathway differed among Shewanella strains, and this gene cluster underwent strong purifying selection. This study clarified the genetic diversity and evolutionary differences of Shewanella and revealed the important role of purifying selection in stabilizing the Shewanella metal-reducing pathway, which should support further study of Shewanella applications in bioremediation and metabolic engineering.

2) The pan-genome study of Thermococcales at the order level revealed an open pan-genome. The core genome, accessory genome, and strain-specific genome of Thermococcales showed different functional enrichments and evolutionary constraints. Different degrees of purifying selection restricted changes in key genes involved in motility, the secretion system, and the defense system. In addition, genes encoding heat shock proteins such as HSP20 and HSP60 were highly conserved under strong purifying selection, which restricted their variation and maintained stable heat resistance and pressure tolerance. This study revealed the genetic diversity and evolutionary differences of Thermococcales at the genus and order levels, providing a theoretical reference for research on the adaptability of thermophilic archaea to extreme environments.

3) Linking the pan-genome to the metagenome revealed the widespread presence of genetic factors related to the pathogenicity of Aeromonas. The Aeromonas genomes harbored a core set of virulence factors, indicating a potential to cause disease. Gene contraction and expansion, together with horizontal gene transfer, drove differences in the specific virulence gene pool related to the pathogenicity of this genus. Extending the virulence-factor analysis to the community level showed that Aeromonas and its virulence factors responded dynamically to environmental changes, and that their abundance and diversity increased during chicken storage, which may increase the risk of Aeromonas infection. This study revealed the genetic diversity, evolutionary relationships, and virulence gene patterns of Aeromonas and extended the distribution pattern of virulence genes from a single individual to the pan-genome and the microbial community, which deepens the understanding of Aeromonas virulence diversity and should help detect and prevent Aeromonas infection.

4) The study of the pan-microbiome and the community-based pan-genome profiled the diversity and specificity of the global human gut microbiota at the community and pan-genomic levels. The reconstructed pan-microbiome consisted of 434 bacterial genera, of which Subdoligranulum, Faecalibacterium, and Blautia were found to be the core microbiota of the global human gut. The meta-pangenome studies of the three bacterial genera related to the enterotypes of these populations revealed the open pan-genomes and genetic diversity of Bacteroides, Prevotella, and Bifidobacterium in the human gut, which deepens the understanding of the global distribution pattern of the human gut microbiome.
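
The terms used in the results above (core, accessory, and strain-specific genome; open pan-genome) can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis, which the abstract does not describe. The gene-family identifiers, the strain names, and the strict thresholds (core = present in every genome, strain-specific = present in exactly one) are assumptions made for illustration only; published analyses often use softer cutoffs and fitted models to judge openness.

```python
from typing import Dict, List, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene families into (core, accessory, strain_specific).

    Assumed definitions (illustrative, strict thresholds):
      core            - family present in every genome
      strain_specific - family present in exactly one genome
      accessory       - everything in between
    """
    if not genomes:
        raise ValueError("at least one genome is required")

    n = len(genomes)
    # Count in how many genomes each gene family occurs.
    counts: Dict[str, int] = {}
    for families in genomes.values():
        for fam in families:
            counts[fam] = counts.get(fam, 0) + 1

    core = {f for f, c in counts.items() if c == n}
    # With a single genome every family is core, so nothing is strain-specific.
    strain_specific = {f for f, c in counts.items() if c == 1 and n > 1}
    accessory = set(counts) - core - strain_specific
    return core, accessory, strain_specific


def pangenome_growth(genomes: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Cumulative pan-genome size as genomes are added in the given order.

    A curve that keeps rising as genomes are added is the informal
    signature of an "open" pan-genome.
    """
    seen: Set[str] = set()
    sizes: List[int] = []
    for name in order:
        seen |= genomes[name]
        sizes.append(len(seen))
    return sizes


# Hypothetical toy data: three strains and five placeholder gene families.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3"},
    "strain_C": {"fam1", "fam2", "fam5"},
}

core, accessory, strain_specific = partition_pangenome(toy_genomes)
growth = pangenome_growth(toy_genomes, ["strain_A", "strain_B", "strain_C"])
```

In this toy input, fam1 and fam2 occur in all three strains (core), fam3 occurs in two (accessory), and fam4 and fam5 occur in one strain each (strain-specific). The same counting idea applies one level up to a pan-microbiome, with genera in place of gene families and samples in place of genomes.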