TY - JOUR
T1 - Mining bacterial NGS data vastly expands the complete genomes of temperate phages
AU - Zhang, Xianglilan
AU - Wang, Ruohan
AU - Xie, Xiangcheng
AU - Hu, Yunjia
AU - Wang, Jianping
AU - Sun, Qiang
AU - Feng, Xikang
AU - Lin, Wei
AU - Tong, Shanwei
AU - Yan, Wei
AU - Wen, Huiqi
AU - Wang, Mengyao
AU - Zhai, Shixiang
AU - Sun, Cheng
AU - Wang, Fangyi
AU - Niu, Qi
AU - Kropinski, Andrew M.
AU - Cui, Yujun
AU - Jiang, Xiaofang
AU - Peng, Shaoliang
AU - Li, Shuaicheng
AU - Tong, Yigang
PY - 2022/9
Y1 - 2022/9
AB - Temperate phages (active prophages induced from bacteria) help control pathogenicity, modulate community structure, and maintain gut homeostasis. Complete phage genome sequences are indispensable for understanding phage biology. Traditional plaque techniques are inapplicable to temperate phages due to their lysogenicity, curbing their identification and characterization. Existing bioinformatics tools for prophage prediction usually fail to detect accurate and complete temperate phage genomes. This study proposes a novel computational temperate phage detection method (TemPhD) mining both the integrated active prophages and their spontaneously induced forms (temperate phages) from next-generation sequencing raw data. Applying the method to the available dataset resulted in 192 326 complete temperate phage genomes with different host species, expanding the existing number of complete temperate phage genomes by more than 100-fold. The wet-lab experiments demonstrated that TemPhD can accurately determine the complete genome sequences of the temperate phages, with exact flanking sites, outperforming other state-of-the-art prophage prediction methods. Our analysis indicates that temperate phages are likely to function in the microbial evolution by (i) cross-infecting different bacterial host species; (ii) transferring antibiotic resistance and virulence genes and (iii) interacting with hosts through restriction-modification and CRISPR/anti-CRISPR systems. This work provides a comprehensively complete temperate phage genome database and relevant information, which can serve as a valuable resource for phage research.
UR - http://www.scopus.com/inward/record.url?scp=85135697286&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85135697286&origin=recordpage
U2 - 10.1093/nargab/lqac057
DO - 10.1093/nargab/lqac057
M3 - RGC 21 - Publication in refereed journal
C2 - 35937545
SN - 2631-9268
VL - 4
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 3
M1 - lqac057
ER -