TY - JOUR
T1 - Prediction of biomarker-disease associations based on graph attention network and text representation
AU - Yang, Minghao
AU - Huang, Zhi-An
AU - Gu, Wenhao
AU - Han, Kun
AU - Pan, Wenying
AU - Yang, Xiao
AU - Zhu, Zexuan
PY - 2022
Y1 - 2022/09//
N2 - Motivation: The associations between biomarkers and human diseases play a key role in understanding complex pathology and developing targeted therapies. Wet lab experiments for biomarker discovery are costly, laborious and time-consuming. Computational prediction methods can greatly expedite the identification of candidate biomarkers. Results: Here, we present a novel computational model named GTGenie for predicting biomarker-disease associations based on graph and text features. In GTGenie, a graph attention network is used to characterize diverse similarities of biomarkers and diseases from heterogeneous information resources. Meanwhile, a pretrained BERT-based model is applied to learn text-based representations of biomarker-disease relations from biomedical literature. The captured graph and text features are then integrated in a bimodal fusion network to model the hybrid entity representation. Finally, inductive matrix completion is adopted to infer the missing entries and reconstruct the relation matrix, from which the unknown biomarker-disease associations are predicted. Experimental results on the HMDD, HMDAD and LncRNADisease data sets showed that GTGenie can achieve prediction performance competitive with other state-of-the-art methods. Availability: The source code of GTGenie and the test data are available at: https://github.com/Wolverinerine/GTGenie.
KW - miRNA-disease associations
KW - microbe-disease associations
KW - lncRNA-disease associations
KW - graph attention network
KW - text-based relation representation
KW - bimodal fusion network
KW - heterogeneous network
KW - random walk
KW - database
KW - target
UR - http://www.scopus.com/inward/record.url?scp=85138492008&partnerID=8YFLogxK
UR - http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=LinksAMR&SrcApp=PARTNER_APP&DestLinkType=FullRecord&DestApp=WOS&KeyUT=000834318700001
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85138492008&origin=recordpage
U2 - 10.1093/bib/bbac298
DO - 10.1093/bib/bbac298
M3 - RGC 21 - Publication in refereed journal
SN - 1467-5463
VL - 23
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 5
M1 - bbac298
ER -