Research on Generalized Three-Way Decision Classification Models for Breast Cancer Auxiliary Diagnosis

Student thesis: Doctoral Thesis

Abstract

Breast cancer is the foremost threat to women's health, and accurate early diagnosis plays a pivotal role in reducing its mortality rate. Disease diagnosis is inherently a classification task. The three-way decision (3WD) theory proposed by Yao [1] has been widely applied in classification. This theory not only partitions samples into positive and negative regions but also allows a decision to be deferred when information is insufficient, assigning such samples to a boundary region until more information is gathered. In practical diagnosis, physicians often defer a diagnosis in cases of ambiguity or uncertainty, gathering additional data through further observation, examination, or experimentation to avoid misdiagnosis. Thus, the 3WD theory is well suited to breast cancer diagnosis. Throughout the breast cancer diagnosis process, diverse forms of diagnostic data emerge, including numerical, image-based, and textual data, each holding unique diagnostic value. At the same time, these data are typically high-dimensional, which strongly affects the performance of classification models. To improve the accuracy of early breast cancer diagnosis, this study proposes a feature selection algorithm for eliminating redundant features and constructs three generalized 3WD classification models tailored to different data forms, providing dependable "second opinions" for breast cancer auxiliary diagnosis. The specific research contents are as follows:

(1) A novel feature selection algorithm based on robust fuzzy rough set theory is proposed, aiming to eliminate redundant features and enhance robustness. First, a radial basis kernel similarity measure based on a mixed distance formula is defined, which enhances the discriminability of feature subset significance in mixed data environments. Second, a relative classification uncertainty measure based on posterior probability differences is proposed and used to construct a robust fuzzy rough set model and the corresponding feature selection algorithm, reducing sensitivity to noise. This work establishes a technical foundation for optimizing the input data of the subsequent classification models.

(2) A generalized 3WD classification model is constructed for auxiliary diagnosis based on breast cancer numerical data, enhancing the adaptability of model parameters and the objectivity of classification quality evaluation. First, an adaptive D-Gap neighborhood relation for conditional probability calculation is proposed, eliminating the need for predefined neighborhood granularity parameters. Second, the chi-square statistic is redefined to establish an evaluation function, improving the objectivity of classification quality evaluation for the 3WD model. In parallel, an adaptive genetic algorithm is employed to optimize this evaluation function and determine the optimal decision thresholds. Together, these methods extend the scope of the special 3WD and are collectively termed the generalized 3WD classification model. Finally, validation on multiple breast cancer datasets demonstrates the effectiveness of the proposed generalized 3WD classification model in supporting the diagnosis of breast tumor categories.

(3) A sequential three-way decision (S3WD) classification model is developed to aid in the diagnosis of breast X-ray images, aiming to improve image interpretability and offer multi-level dynamic diagnostic support. First, region of interest segmentation and radiomics feature extraction are introduced to extract interpretable semantic features from breast X-ray images. Second, a combination of KNN and Bayes' rule is proposed to compute the conditional probability for the S3WD, providing a novel perspective for quantifying the representativeness of neighboring sample labels. Additionally, intuitiveness issues in the existing definitions of information gain and regret value are addressed, and the revised definitions are used to construct the classification quality evaluation function for the S3WD. This method integrates data-driven objectivity with physicians' subjective risk-aversion preferences, further enhancing the practical utility of the S3WD classification model. Finally, validation on breast X-ray image datasets demonstrates the significant benefits of this model.

(4) A two-stage multi-classification model that integrates a 3WD model with machine learning is constructed for breast disease diagnosis based on electronic medical record (EMR) data. First, a BERT_BiLSTM_CRF model is designed to identify key diagnostic symptoms or features within EMRs. Second, the 3WD multi-classification model performs the multi-classification task in the first stage and transfers boundary region samples to an ensemble machine learning model in the second stage, which refines the 3WD classification results and completes the final multi-classification task. This two-stage model preserves the interpretability of model operation while improving accuracy, providing a new perspective for the design of auxiliary diagnostic models in complex text settings. Finally, the model is validated using inpatient EMRs, confirming its effectiveness in assisting the diagnosis of various subtypes of breast diseases.
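
The deferral idea that runs through the abstract (accept, reject, or assign to the boundary region, then pass boundary samples to a second stage) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's models: the function names, the binary setting, the fixed thresholds `alpha = 0.7` and `beta = 0.3`, and the `second_stage` callback are all hypothetical. The thesis instead estimates conditional probabilities with its own neighborhood or KNN-based methods and determines thresholds by optimization.

```python
from typing import Callable, List, Sequence

POSITIVE, NEGATIVE, BOUNDARY = "positive", "negative", "boundary"


def three_way_decide(prob: float, alpha: float = 0.7, beta: float = 0.3) -> str:
    """Assign a sample to a region from its conditional probability.

    alpha and beta are illustrative thresholds (assumptions), with
    0 <= beta < alpha <= 1.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if prob >= alpha:
        return POSITIVE   # enough evidence to accept
    if prob <= beta:
        return NEGATIVE   # enough evidence to reject
    return BOUNDARY       # insufficient evidence: defer the decision


def two_stage_classify(
    probs: Sequence[float],
    second_stage: Callable[[int], str],
    alpha: float = 0.7,
    beta: float = 0.3,
) -> List[str]:
    """Label each sample; boundary samples are handed to a second-stage model.

    second_stage is a hypothetical callback that takes the sample index
    and returns a final label.
    """
    labels = []
    for index, prob in enumerate(probs):
        region = three_way_decide(prob, alpha, beta)
        if region == BOUNDARY:
            region = second_stage(index)
        labels.append(region)
    return labels
```

In this sketch, a sample with probability 0.5 falls between the two thresholds and is deferred, which mirrors a physician ordering further examination rather than committing to a diagnosis.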
Date of Award: 28 Nov 2024
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Hoi Shou Alan CHAN (Supervisor), Junhua HU (External Supervisor) & Kwai Sang CHIN (Co-supervisor)

Keywords

  • Three-way decisions
  • Classification
  • Auxiliary Diagnosis
  • Breast Cancer
