Abstract
Motivation: Early detection of cancer through accessible blood tests can enable earlier patient intervention. Although cancer detection from cell-free DNA (cfDNA) has advanced, its accuracy remains uncertain. Given the central importance and broad impact of this problem, we aim to address it.
Methods: A bagging Ensemble Meta Classifier (CancerEMC) is proposed for early cancer detection based on circulating protein biomarkers and mutations in cfDNA from the blood. CancerEMC is designed for both binary cancer detection and multi-class cancer type localization. It addresses the class imbalance problem in multi-analyte blood test data using robust oversampling and adaptive synthesis techniques.
Results: On clinical blood test data, the proposed CancerEMC outperforms other algorithms and state-of-the-art studies (including CancerSEEK, published in Science, 2018) for cancer detection. CancerEMC achieves the best performance for both binary cancer classification, with 99.1748% accuracy (AUC = 0.999), and localized multiple cancer detection, with 74.1214% accuracy (AUC = 0.938). When the data imbalance is addressed with oversampling techniques, the accuracy increases to 91.4966% (AUC = 0.992), whereas the state-of-the-art method is estimated at only 69.64% (AUC = 0.921). Similar results are also observed on independent and isolated testing data.
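The general idea described in the Methods, balancing the classes and then training a bagging ensemble, can be illustrated with a minimal sketch. This is not the CancerEMC implementation: plain random oversampling stands in for the oversampling and adaptive synthesis techniques the abstract mentions, a nearest-centroid rule is assumed as the base learner, the toy two-feature data is synthetic, and every function name below is hypothetical.

```python
import random
from collections import Counter


def oversample(X, y, rng):
    """Random oversampling (assumed stand-in): duplicate rows of smaller
    classes until every class is as large as the largest one."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroids(model, x):
    """Return the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(model[c], x))
    return min(model, key=dist2)


def fit_bagging(X, y, n_estimators=15, seed=0):
    """Balance the data once, then fit each base learner on a bootstrap sample."""
    rng = random.Random(seed)
    X_bal, y_bal = oversample(X, y, rng)
    n = len(X_bal)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_centroids([X_bal[i] for i in idx],
                                    [y_bal[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_toy(seed=1):
    """Synthetic, imbalanced two-class data (40 vs 8 samples)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(8)]
    y = ["normal"] * 40 + ["cancer"] * 8
    return X, y


if __name__ == "__main__":
    X, y = make_toy()
    models = fit_bagging(X, y)
    print(predict_bagging(models, [4.8, 5.3]))
```

Balancing before bootstrapping keeps each bootstrap sample roughly class-balanced, so no single base learner is dominated by the majority class; a synthesis-based oversampler could replace `oversample` without changing the rest of the sketch.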
| Original language | English |
|---|---|
| Pages (from-to) | 3319–3327 |
| Journal | Bioinformatics |
| Volume | 37 |
| Issue number | 19 |
| Online published | 30 Jan 2021 |
| DOIs | |
| Publication status | Published - 1 Oct 2021 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'CancerEMC: frontline non-invasive cancer screening from circulating protein biomarkers and mutations in cell-free DNA'. Together they form a unique fingerprint.
Projects
- 2 Finished
- HMRF: Development of Big Data Tools for High-Throughput Sequencing Data with Applications to Colorectal Cancer Genomes
  WONG, K. C. (Principal Investigator / Project Coordinator) & WANG, X. (Co-Investigator)
  1/09/20 → 13/11/23
  Project: Research
- GRF: Heterodimeric DNA Motif Synthesis and Validations
  WONG, K. C. (Principal Investigator / Project Coordinator) & SONG, Y. Q. (Co-Investigator)
  1/12/18 → 29/11/22
  Project: Research