Towards A Better Understanding of rDNA in Caenorhabditis Species Using Long-Read Sequencing
Description
Protein translation, one of the most essential biological processes in cells, is carried out by ribosomes. Ribosomes are large complexes of ribosomal RNAs (rRNAs) and proteins that perform protein synthesis. rRNA accounts for the majority of RNA molecules in any given cell, largely because the genes encoding rRNA (rDNA) are present in unusually high copy numbers and are arranged in tandem arrays in the genome. The highly repetitive nature of rDNA and its tandem arrangement have prevented the precise determination of rDNA and its surrounding sequences. The major obstacle to resolving rDNA and its flanking sequences lies in the technologies currently used to generate genome sequences. For example, next-generation sequencing (NGS) has played a dominant role over the past decade in producing the draft genomes of various species. However, although NGS works relatively well for small genomes with little repetitive sequence, most eukaryotes have relatively large genomes with numerous repetitive sequences. Some of these, such as rDNA, are arranged in tandem arrays and cannot be assigned to their respective chromosomes using NGS alone, because short reads are far shorter than the arrays and cannot span them into unique flanking sequence. This inability to precisely locate rDNA and its sequence variants significantly hinders the functional characterisation of rDNA and its regulatory mechanisms.
We have developed a pipeline for variant calling and assembly of tandem repeat regions using nanopore long reads. Using this pipeline, we have reconstructed almost the entire rDNA cluster of the C. elegans genome. We have located the variable region by using several transgenic strains that provide 'anchor' sequences for the tandem arrays of rDNA clusters. Comparative analysis of C. elegans wild isolates has revealed a novel 5S rDNA unit, characterised by 30-bp deletions relative to the 'reference' unit, which is related to copy number variation in 5S rDNA.
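The anchoring idea can be illustrated with a toy sketch. This is not the project's pipeline. It assumes a single long read that begins with a unique anchor sequence immediately upstream of a 5S rDNA array, units that abut exactly, and a read free of insertions and deletions. Real nanopore reads do contain indels, so a practical tool would align reads rather than step through them at fixed offsets. The functions `find_anchor_end` and `classify_units`, the synthetic 120-bp unit, the placement of its 30-bp deletion and the `max_dist` cut-off are all hypothetical. Each unit downstream of the anchor is labelled by whichever template it matches most closely by length-normalised edit distance, so counting labels gives a per-read copy number for each unit type.

```python
# Hypothetical toy example, not the project's pipeline. Assumes a unique anchor
# directly upstream of the array, exactly abutting units and indel-free reads.
import random
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def find_anchor_end(read: str, anchor: str) -> int:
    """Index just past the window of `read` that best matches `anchor`."""
    k = len(anchor)
    best_start = min(range(len(read) - k + 1),
                     key=lambda s: edit_distance(read[s:s + k], anchor))
    return best_start + k


def classify_units(read: str, anchor: str, templates: dict[str, str],
                   max_dist: float = 0.25) -> list[str]:
    """Label consecutive repeat units downstream of the anchor.

    Each window is compared with every template using edit distance divided
    by template length; the walk stops when even the best template scores
    above `max_dist` (assumed to mean the array has ended) or when fewer
    bases remain than the shortest template.
    """
    pos = find_anchor_end(read, anchor)
    shortest = min(len(t) for t in templates.values())
    labels = []
    while len(read) - pos >= shortest:
        scores = {name: edit_distance(read[pos:pos + len(t)], t) / len(t)
                  for name, t in templates.items()}
        best = min(scores, key=scores.get)
        if scores[best] > max_dist:
            break
        labels.append(best)
        pos += len(templates[best])  # fixed step: valid only without indels
    return labels


def random_seq(n: int) -> str:
    return "".join(random.choice("ACGT") for _ in range(n))


# Synthetic data: a 120-bp "reference" unit and a variant lacking 30 bp.
random.seed(0)
anchor = random_seq(40)
ref_unit = random_seq(120)
del_unit = ref_unit[:45] + ref_unit[75:]
templates = {"reference": ref_unit, "deletion": del_unit}

read = anchor + ref_unit + del_unit + ref_unit + del_unit + del_unit
labels = classify_units(read, anchor, templates)
print(labels)           # ['reference', 'deletion', 'reference', 'deletion', 'deletion']
print(Counter(labels))  # per-read copy number of each unit type
```

Because every label is tied to a position downstream of the same anchor, the order of reference and variant units along the array is preserved. That positional information is exactly what short reads, which cannot reach from the anchor into the array, fail to provide.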
These findings will enable the study of the rDNA replication mechanism in C. elegans. We hypothesise that the sequence variations in the rDNA cluster reveal replication origins and pause sites of different efficiencies, thus linking them to rDNA replication. This hypothesis will be tested by capturing nascent intermediates of rDNA replication using nanopore sequencing of both wild isolates and transgenic strains containing different rDNA units.
The proposed research will provide insights into the structure and function of rDNA clusters in eukaryotic species. The methods developed to assemble the rDNA clusters will also be used to study rDNA repeats in other species.
Effective start/end date: 1/01/22 → …