Coding genomes with gapped pattern graph convolutional network
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | btae188 |
Journal / Publication | Bioinformatics |
Volume | 40 |
Issue number | 4 |
Online published | 11 Apr 2024 |
Publication status | Published - Apr 2024 |
Link(s)
DOI | DOI |
---|---|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85191103227&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(03bedb53-4005-4279-a3a8-2efade242127).html |
Abstract
Motivation: Genome sequencing technologies reveal a huge amount of genomic sequences. Neural network-based methods can be prime candidates for retrieving insights from these sequences because of their applicability to large and diverse datasets. However, the highly variable lengths of genome sequences severely impair the presentation of sequences as input to the neural network. Genetic variations further complicate tasks that involve sequence comparison or alignment.

Results: Inspired by the theory and applications of “spaced seeds,” we propose a graph representation of genome sequences called “gapped pattern graph.” These graphs can be transformed through a Graph Convolutional Network to form lower-dimensional embeddings for downstream tasks. On the basis of the gapped pattern graphs, we implemented a neural network model and demonstrated its performance on diverse tasks involving microbe and mammalian genome data. Our method consistently outperformed all the other state-of-the-art methods across various metrics on all tasks, especially for the sequences with limited homology to the training data. In addition, our model was able to identify distinct gapped pattern signatures from the sequences.

© The Author(s) 2024. Published by Oxford University Press.
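The “spaced seeds” idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gapped_patterns` and `pattern_graph`, the mask `1101`, and the choice to link patterns from adjacent windows are all assumptions made for illustration. A spaced-seed mask marks which positions of a window are read (`1`) and which are ignored (`0`), so a substitution at an ignored position leaves the extracted pattern unchanged.

```python
from collections import Counter


def gapped_patterns(seq, mask):
    """Slide a spaced-seed mask over seq and return one gapped pattern per window.

    mask is a string of '1' (position is read) and '0' (position is ignored).
    """
    keep = [i for i, m in enumerate(mask) if m == "1"]
    span = len(mask)
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    ]


def pattern_graph(seq, mask):
    """Count directed edges between patterns taken from adjacent windows.

    Linking adjacent windows is an illustrative assumption, not a rule
    taken from the paper.
    """
    pats = gapped_patterns(seq, mask)
    return Counter(zip(pats, pats[1:]))


mask = "1101"  # hypothetical mask: read positions 0, 1 and 3 of each 4-base window
patterns = gapped_patterns("ACGTACGA", mask)
edges = pattern_graph("ACGTACGA", mask)

# A substitution at the ignored third position does not change the pattern.
same = gapped_patterns("ACGT", mask) == gapped_patterns("ACTT", mask)
```

Under these assumptions, a sequence of any length maps to a set of weighted edges over a fixed pattern vocabulary, which is the kind of fixed-vocabulary input a graph convolutional network can consume.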
Bibliographic Note
Research Unit(s) information for this publication is provided by the author(s) concerned.
Citation Format(s)
Coding genomes with gapped pattern graph convolutional network. / Wang, Ruo Han; Ng, Yen Kaow; Zhang, Xianglilan et al.
In: Bioinformatics, Vol. 40, No. 4, btae188, 04.2024.