Canary: an automated tool for the conversion of MaCH imputed dosage files to PLINK files

Adam N. Bennett, Jethro Rainford, Xiaotai Huang, Qian He, Kei Hang Katie Chan*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Background: Previous studies have demonstrated the value of re-analysing publicly available genetics data with recent analytical approaches. Publicly available datasets, such as the Women’s Health Initiative (WHI) offered by the database of Genotypes and Phenotypes (dbGaP), provide a rich resource for researchers to perform multiple analyses, including genome-wide association studies (GWAS). Often, the genetic information of individuals in these datasets is stored in imputed dosage files output by MaCH, namely mldose and mlinfo files. For researchers to perform GWAS with these data, the files must first be converted to a format compatible with their tool of choice, e.g., PLINK. Currently, there is no published tool that easily converts datasets provided as MaCH dosage files into PLINK-ready files.

Results: Herein, we present Canary, a Singularity-based tool that converts MaCH dosage files into PLINK-compatible files with a single line of user input at the command line. Further, we provide a detailed tutorial on the preparation of phenotype files. Moreover, Canary comes with preinstalled software often used in GWAS, to further increase the ease of use of high-performance computing (HPC) systems for researchers.

Conclusions: Until now, conversion of imputed data in the form of MaCH mldose and mlinfo files needed to be completed manually. Canary uses Singularity container technology to allow users to automatically convert these MaCH files into PLINK-compatible files. Additionally, Canary provides researchers with a platform to conduct GWAS more easily, as it contains essential software for such studies, such as PLINK and Bioconductor. We hope that this tool will greatly increase the ease with which researchers can perform GWAS with imputed data, particularly in HPC environments.
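To illustrate the kind of reshaping such a conversion involves, the following is a minimal Python sketch. It is not Canary's implementation. It assumes an mlinfo table with a header containing the columns SNP, Al1 and Al2, and mldose rows of the form "FAMILY->INDIVIDUAL", a type label, then one dosage per marker. The function names are hypothetical, and the output is a generic marker-major layout (one line per SNP); the exact header and column conventions a given PLINK version expects should be checked against its documentation.

```python
import io


def parse_mlinfo(handle):
    """Return a list of (snp, allele1, allele2) from an mlinfo-style table.

    Assumes a header row that names at least the columns SNP, Al1 and Al2.
    """
    header = handle.readline().split()
    idx = {name: i for i, name in enumerate(header)}
    markers = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        markers.append((fields[idx["SNP"]], fields[idx["Al1"]], fields[idx["Al2"]]))
    return markers


def parse_mldose(handle, n_markers):
    """Return (sample_ids, rows); each row holds one sample's dosages.

    Assumed layout per line: "FAMILY->INDIVIDUAL", a type label,
    then one dosage value per marker, in mlinfo order.
    """
    sample_ids, rows = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        family, _, individual = fields[0].partition("->")
        dosages = [float(x) for x in fields[2:]]
        if len(dosages) != n_markers:
            raise ValueError(
                f"{fields[0]}: expected {n_markers} dosages, found {len(dosages)}"
            )
        sample_ids.append((family, individual))
        rows.append(dosages)
    return sample_ids, rows


def to_marker_major(markers, rows):
    """Transpose sample-major dosages into one text line per marker."""
    lines = []
    for j, (snp, a1, a2) in enumerate(markers):
        doses = " ".join(f"{row[j]:.3f}" for row in rows)
        lines.append(f"{snp} {a1} {a2} {doses}")
    return lines


if __name__ == "__main__":
    # Tiny made-up inputs, for illustration only.
    mlinfo = io.StringIO(
        "SNP Al1 Al2 Freq1 MAF Quality Rsq\n"
        "rs1 A G 0.9 0.1 0.99 0.98\n"
        "rs2 C T 0.6 0.4 0.95 0.90\n"
    )
    mldose = io.StringIO(
        "F1->I1 ML_DOSE 1.900 0.250\n"
        "F2->I2 ML_DOSE 2.000 1.000\n"
    )
    markers = parse_mlinfo(mlinfo)
    samples, rows = parse_mldose(mldose, len(markers))
    for line in to_marker_major(markers, rows):
        print(line)
```

The sketch holds all dosages in memory, which is only practical for small inputs; real imputed datasets are large enough that a streaming or chunked transpose would be needed.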
Original language: English
Article number: 304
Journal: BMC Bioinformatics
Volume: 23
Online published: 27 Jul 2022
DOIs
Publication status: Published - 2022

Research Keywords

  • MaCH
  • Imputed data
  • GWAS
  • Dosage file
  • PLINK
  • GENOME-WIDE ASSOCIATION
  • GENOTYPE IMPUTATION
  • QUALITY-CONTROL
  • DISCOVERY

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
