Causal Inference in Transcriptome-Wide Association Studies with Invalid Instruments and GWAS Summary Data
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Xue, Haoran; Shen, Xiaotong; Pan, Wei
Detail(s)
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1525–1537 |
Journal / Publication | Journal of the American Statistical Association |
Volume | 118 |
Issue number | 543 |
Online published | 17 Mar 2023 |
Publication status | Published - 2023 |
Externally published | Yes |
Abstract
Transcriptome-Wide Association Studies (TWAS) have recently emerged as a popular tool to discover (putative) causal genes by integrating an outcome GWAS dataset with another gene expression/transcriptome GWAS (called eQTL) dataset. In our motivating and target application, we’d like to identify causal genes for Low-Density Lipoprotein cholesterol (LDL), which is crucial for developing new treatments for hyperlipidemia and cardiovascular diseases. The statistical principle underlying TWAS is (two-sample) two-stage least squares (2SLS) using multiple correlated SNPs as instrumental variables (IVs); it is closely related to typical (two-sample) Mendelian randomization (MR) using independent SNPs as IVs, which is expected to be impractical and lower-powered for TWAS (and some other) applications. However, often some of the SNPs used may not be valid IVs, for example, due to the widespread pleiotropy of their direct effects on the outcome not mediated through the gene of interest, leading to false conclusions by TWAS (or MR). Building on recent advances in sparse regression, we propose a robust and efficient inferential method to account for both hidden confounding and some invalid IVs via two-stage constrained maximum likelihood (2ScML), an extension of 2SLS. We first develop the proposed method with individual-level data, then extend it both theoretically and computationally to GWAS summary data for the most popular two-sample TWAS design, to which almost all existing robust IV regression methods are however not applicable. We show that the proposed method achieves asymptotically valid statistical inference on causal effects, demonstrating its wider applicability and superior finite-sample performance over the standard 2SLS/TWAS (and MR). We apply the methods to identify putative causal genes for LDL by integrating large-scale lipid GWAS summary data with eQTL data. Supplementary materials for this article are available online. © 2023 American Statistical Association.
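As an illustration of the two-sample 2SLS principle mentioned in the abstract, the following is a minimal sketch on simulated data. It is not the authors' 2ScML method and does not use GWAS summary statistics. It assumes individual-level data, independent SNPs, all SNPs being valid IVs, and a single hidden confounder. The function names (`two_sample_2sls`, `simulate`), sample sizes, and effect sizes are hypothetical choices made for this example.

```python
import numpy as np


def two_sample_2sls(Z1, x1, Z2, y2):
    """Two-sample 2SLS sketch.

    Stage 1 uses the expression (eQTL) sample (Z1, x1) to estimate SNP weights.
    Stage 2 uses the outcome (GWAS) sample (Z2, y2) to regress the outcome
    on the genetically predicted expression.
    """
    # Center all variables so that no explicit intercept is needed.
    Z1c = Z1 - Z1.mean(axis=0)
    Z2c = Z2 - Z2.mean(axis=0)
    x1c = x1 - x1.mean()
    y2c = y2 - y2.mean()

    # Stage 1: least-squares SNP weights for gene expression.
    w_hat, *_ = np.linalg.lstsq(Z1c, x1c, rcond=None)

    # Stage 2: regress the outcome on the imputed expression.
    x2_hat = Z2c @ w_hat
    return float(x2_hat @ y2c / (x2_hat @ x2_hat))


def simulate(n, w, beta, rng, maf=0.3):
    """Simulate genotypes Z, expression x and outcome y (illustrative model).

    u is a hidden confounder that affects both x and y; the SNPs affect y
    only through x, so all of them are valid instruments here.
    """
    Z = rng.binomial(2, maf, size=(n, w.size)).astype(float)
    u = rng.normal(size=n)
    x = Z @ w + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)
    return Z, x, y


rng = np.random.default_rng(0)
w = np.full(10, 0.3)   # assumed SNP effects on expression
beta = 0.5             # assumed true causal effect of expression on outcome

Z1, x1, _ = simulate(20000, w, beta, rng)    # eQTL sample (outcome unused)
Z2, x2, y2 = simulate(20000, w, beta, rng)   # outcome GWAS sample

beta_2sls = two_sample_2sls(Z1, x1, Z2, y2)

# Naive regression of y on observed x in the outcome sample, for comparison.
x2c = x2 - x2.mean()
beta_ols = float(x2c @ (y2 - y2.mean()) / (x2c @ x2c))

print(f"true beta = {beta}, 2SLS = {beta_2sls:.3f}, naive OLS = {beta_ols:.3f}")
```

In this setup the naive regression is pulled away from the true effect by the hidden confounder, while the 2SLS estimate should land close to it. If some SNPs also had direct effects on the outcome (invalid IVs), this plain 2SLS estimate would generally be biased as well, which is the situation the abstract's proposed method is designed to address.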
Research Area(s)
- 2SLS, Causal inference, Genome-wide association studies, Mendelian randomization (MR), Reference panel, SNP, Truncated L1-constraint (TLC)
Citation Format(s)
Causal Inference in Transcriptome-Wide Association Studies with Invalid Instruments and GWAS Summary Data. / Xue, Haoran; Shen, Xiaotong; Pan, Wei.
In: Journal of the American Statistical Association, Vol. 118, No. 543, 2023, p. 1525–1537.