Algorithms and Models for Local Genomics Map of Oncogenic Virus Integration

Project: Research

Description

Viruses cause up to 15% of human cancers. Viruses persist in hosts by integrating their viral genome into the host cell genome, or by expressing proteins that segregate their genome into cells during cell partitioning. We are mainly interested in the integration of viral genomes into the host genome. Knowing exactly how an oncogenic virus transforms a host genome is important for understanding how the virus compromises the host genome's functions. The general availability of next-generation sequencing (NGS) methods has made possible several large-scale studies to identify the integration profiles of oncogenic viruses in cancer samples. This paves the way for a natural next step: investigating the exact mechanism by which an oncogenic virus transforms its host genome. Knowing this will help us understand the molecular and biological processes of virus integration, and the virus-host interaction in disease development.

In this proposal, we develop algorithms and tools for the automatic reconstruction of integration sites in virus-infected genomes from NGS data. We refer to the local portion of a host genome containing the integrated virus as the virus integration local genomic map (LGM). Our algorithms and tools will accept as input NGS data from a host, and output probabilistically significant LGMs. An LGM consists of interleaved genome segments from the host and the virus (integrant), modeled as a graph structure in which the vertices represent short DNA segments. Compared to a linear structure, a graph structure is able to capture more details of the virus integration mechanism, and it reflects the current consensus among computational bioinformaticians on how to reference human genomes. Our approach will attempt to address the heterozygosity problem in cancer tissues from NGS data, through probabilistic evaluation and the incorporation of additional information from other data sources such as PacBio, single-cell, or RNA-seq data.

Generalizing the LGMs will allow us to decode the core structure and pan-structures of the integration mechanism of a specific virus across different hosts. We propose a model called the consensus graph to capture this information, and develop algorithms and tools to construct such consensus graphs from multiple LGMs. Finally, we create tools that generate visualizations of the consensus graphs. Our tools will be made available to the research community through a web-based interface. We plan to study four oncogenic viruses with the tools. The team has implemented a proof-of-concept system and performed preliminary investigations on the database.
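
To make the graph model concrete, the following is a minimal Python sketch of how an LGM and a consensus graph might be represented. It is not the project's implementation. The names `Segment`, `LGM`, `consensus_graph`, and `core_edges`, the contig labels, and the coordinates are all hypothetical. Plain counting across samples stands in for the probabilistic evaluation described above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Hypothetical vertex: a short DNA segment from the host or the virus."""
    source: str  # "host" or "virus"
    contig: str  # illustrative contig label, not a real reference name
    start: int
    end: int


@dataclass
class LGM:
    """Hypothetical local genomic map: directed adjacencies between segments."""
    edges: set = field(default_factory=set)

    def add_adjacency(self, a, b):
        # Edge (a, b): segment a is directly followed by segment b.
        self.edges.add((a, b))

    def junctions(self):
        # Host-virus junctions are edges whose endpoints differ in source.
        return {(a, b) for a, b in self.edges if a.source != b.source}


def consensus_graph(lgms):
    """Count, for each adjacency, the number of LGMs that contain it."""
    support = Counter()
    for lgm in lgms:
        support.update(lgm.edges)  # edges is a set: one count per LGM
    return support


def core_edges(support, n_samples):
    """Adjacencies shared by every LGM (a stand-in for the 'core structure')."""
    return {edge for edge, count in support.items() if count == n_samples}


# Illustrative data only: two samples sharing one host-to-virus junction.
h1 = Segment("host", "chrA", 100, 200)
h2 = Segment("host", "chrA", 200, 300)
h3 = Segment("host", "chrB", 500, 600)
v1 = Segment("virus", "virus_genome", 1, 500)

sample_a = LGM()
sample_a.add_adjacency(h1, v1)
sample_a.add_adjacency(v1, h2)

sample_b = LGM()
sample_b.add_adjacency(h1, v1)
sample_b.add_adjacency(v1, h3)

support = consensus_graph([sample_a, sample_b])
core = core_edges(support, n_samples=2)
```

In this toy example the adjacency from `h1` to `v1` appears in both samples and so forms the shared core, while the two virus-to-host exits differ between samples and would belong to the variable part of the consensus graph.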

Detail(s)

Project number: 9042348
Grant type: GRF
Status: Finished
Effective start/end date: 1/10/16 to 8/09/20

    Research areas

  • bioinformatics, computational biology, algorithms, genome sequence, virus