TY - JOUR
T1 - Canary
T2 - an automated tool for the conversion of MaCH imputed dosage files to PLINK files
AU - Bennett, Adam N.
AU - Rainford, Jethro
AU - Huang, Xiaotai
AU - He, Qian
AU - Chan, Kei Hang Katie
PY - 2022
Y1 - 2022
N2 - Background: Previous studies have demonstrated the value of re-analysing publicly available genetics data with recent analytical approaches. Publicly available datasets, such as the Women’s Health Initiative (WHI) offered by the database of Genotypes and Phenotypes (dbGaP), provide a rich resource for researchers to perform multiple analyses, including genome-wide association studies (GWAS). Often, the genetic information of individuals in these datasets is stored in imputed dosage files output by MaCH, namely mldose and mlinfo files. For researchers to perform GWAS with these data, the files must first be converted to a format compatible with their tool of choice, e.g., PLINK. Currently, there is no published tool that easily converts datasets provided as MaCH dosage files into PLINK-ready files. Results: Herein, we present Canary, a Singularity-based tool that converts MaCH dosage files into PLINK-compatible files with a single line of user input at the command line. Further, we provide a detailed tutorial on the preparation of phenotype files. Moreover, Canary comes with preinstalled software often used in GWAS, further increasing the ease of use of HPC systems for researchers. Conclusions: Until now, conversion of imputed data in the form of MaCH mldose and mlinfo files needed to be completed manually. Canary uses Singularity container technology to allow users to automatically convert these MaCH files into PLINK-compatible files. Additionally, Canary provides researchers with a platform to conduct GWAS more easily, as it contains essential software for GWAS, such as PLINK and Bioconductor. We hope that this tool will greatly increase the ease with which researchers can perform GWAS with imputed data, particularly in HPC environments.
AB - Background: Previous studies have demonstrated the value of re-analysing publicly available genetics data with recent analytical approaches. Publicly available datasets, such as the Women’s Health Initiative (WHI) offered by the database of Genotypes and Phenotypes (dbGaP), provide a rich resource for researchers to perform multiple analyses, including genome-wide association studies (GWAS). Often, the genetic information of individuals in these datasets is stored in imputed dosage files output by MaCH, namely mldose and mlinfo files. For researchers to perform GWAS with these data, the files must first be converted to a format compatible with their tool of choice, e.g., PLINK. Currently, there is no published tool that easily converts datasets provided as MaCH dosage files into PLINK-ready files. Results: Herein, we present Canary, a Singularity-based tool that converts MaCH dosage files into PLINK-compatible files with a single line of user input at the command line. Further, we provide a detailed tutorial on the preparation of phenotype files. Moreover, Canary comes with preinstalled software often used in GWAS, further increasing the ease of use of HPC systems for researchers. Conclusions: Until now, conversion of imputed data in the form of MaCH mldose and mlinfo files needed to be completed manually. Canary uses Singularity container technology to allow users to automatically convert these MaCH files into PLINK-compatible files. Additionally, Canary provides researchers with a platform to conduct GWAS more easily, as it contains essential software for GWAS, such as PLINK and Bioconductor. We hope that this tool will greatly increase the ease with which researchers can perform GWAS with imputed data, particularly in HPC environments.
KW - MaCH
KW - Imputed data
KW - GWAS
KW - Dosage file
KW - PLINK
KW - GENOME-WIDE ASSOCIATION
KW - GENOTYPE IMPUTATION
KW - QUALITY-CONTROL
KW - DISCOVERY
UR - http://www.scopus.com/inward/record.url?scp=85134878861&partnerID=8YFLogxK
UR - http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=LinksAMR&SrcApp=PARTNER_APP&DestLinkType=FullRecord&DestApp=WOS&KeyUT=000831227200002
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85134878861&origin=recordpage
U2 - 10.1186/s12859-022-04822-8
DO - 10.1186/s12859-022-04822-8
M3 - RGC 21 - Publication in refereed journal
C2 - 35896971
SN - 1471-2105
VL - 23
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 304
ER -