Reducing healthcare disparities using multiple multiethnic data distributions with fine-tuning of transfer learning

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

3 Scopus Citations

Original language: English
Article number: bbac078
Journal / Publication: Briefings in Bioinformatics
Issue number: 3
Online published: 21 Mar 2022
Publication status: Published - May 2022


Healthcare disparities in multiethnic medical data are a major challenge, and the main cause is the unequal distribution of ethnic groups across data cohorts. Biomedical data collected from different cancer genome research projects may consist mainly of one ethnic group, such as people of European ancestry. In contrast, other ethnic groups, such as African, Asian, Hispanic, and Native American populations, are far less represented. This data inequality is an important research problem in the biomedical field: it causes machine learning models to perform unevenly across groups, which in turn creates healthcare disparities. Previous studies have reduced healthcare disparities using only limited data distributions. In our study, we fine-tune deep learning and transfer learning models with different multiethnic data distributions for the prognosis of 33 cancer types. In previous studies, only a single ethnic cohort was used as the target domain, paired with one major source domain. In contrast, we focus on multiple ethnic cohorts as the target domain in transfer learning, using the TCGA and MMRF CoMMpass study datasets. In a performance comparison across experiments with the new data distributions, our proposed model shows promising performance for transfer learning schemes compared to the baseline approach on both the old and new data distribution experiments.
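
The general idea of pretraining on one large source domain and then fine-tuning separately on several smaller target cohorts can be illustrated with a toy sketch. This is not the authors' model or data: it swaps the deep network for a plain logistic regression, uses synthetic data in place of TCGA or MMRF CoMMpass, and the cohort names (`cohort_A`, `cohort_B`), weights, sample sizes, and helper functions are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_cohort(n, true_w, rng):
    """Synthetic cohort (assumption): features in [-1, 1], labels from a logistic model."""
    X, y = [], []
    for _ in range(n):
        # The last entry is a constant 1.0 acting as the bias term.
        x = [rng.uniform(-1, 1) for _ in range(len(true_w) - 1)] + [1.0]
        p = sigmoid(sum(w * v for w, v in zip(true_w, x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


def log_loss(w, X, y):
    """Mean binary cross-entropy of weights w on (X, y)."""
    total = 0.0
    for x, t in zip(X, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(X)


def train(w_init, X, y, steps, lr=0.5):
    """Full-batch gradient descent; returns new weights and leaves w_init unchanged."""
    w = list(w_init)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


rng = random.Random(0)

# Large source domain (stands in for the majority cohort).
source_X, source_y = make_cohort(2000, [2.0, -1.0, 0.5, 0.0], rng)

# Several small target cohorts, each with a slightly shifted distribution.
targets = {
    "cohort_A": make_cohort(60, [1.5, -1.5, 0.5, 0.3], rng),
    "cohort_B": make_cohort(60, [2.0, -0.5, 1.0, -0.4], rng),
}

# Step 1: pretrain on the source domain from zero weights.
pretrained = train([0.0] * 4, source_X, source_y, steps=200)

# Step 2: fine-tune one copy of the pretrained weights per target cohort.
fine_tuned = {
    name: train(pretrained, X, y, steps=50) for name, (X, y) in targets.items()
}

for name, (X, y) in targets.items():
    before = log_loss(pretrained, X, y)
    after = log_loss(fine_tuned[name], X, y)
    print(f"{name}: loss before fine-tuning {before:.3f}, after {after:.3f}")
```

The losses printed here are measured on the same target samples used for fine-tuning, so they only show that the adaptation step runs; a real comparison would evaluate on held-out data for each cohort.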

Research Area(s)

  • ethnic disparities, transfer learning, domain adaptation, deep learning, NETWORK