Deep Learning Methods for Computer-Aided Image-Based Medical Diagnosis


Student thesis: Doctoral Thesis

Award date: 14 Aug 2023

Abstract

Computer-aided diagnosis (CAD) has been a major field of research for the past few decades, and medical image analysis is one of the key areas where CAD is applied. The advent of deep learning methods has greatly improved the accuracy of computer-aided diagnosis and its ability to handle large-scale data. However, beyond problems shared with natural image processing, such as improving classification and segmentation accuracy, an important challenge in medical image processing is the scarcity of well-annotated data. The feasibility of clinical applications and scenarios is a further constraint that must be considered. In addition, generalization is a key concern: medical data may come from different institutions, and the instruments used may be biased by specific parameters and operators.

This dissertation focuses on developing more practical CAD techniques by analyzing medical imaging data with deep learning methods, and investigates two significant issues. One is the pathological diagnosis of lung adenocarcinoma based on computed tomography (CT) images. The other is lung function estimation and the corresponding lung disease diagnosis based on CT scans and metadata. To address these issues, four research works are conducted in this thesis, presented as follows.

Firstly, for the pathological classification of lung adenocarcinoma, a bilateral-branch network with a knowledge distillation procedure (KDBBN) based on CT images is developed. KDBBN automatically identifies adenocarcinoma categories and detects the lesion area that most likely contributes to the identification of specific types of adenocarcinoma from lung CT images. In addition, a knowledge distillation process is established for the proposed framework to ensure that the developed models can be applied to different datasets. The results of our comprehensive computational study confirm that our method achieves an AUC of 96.8% ± 1.9%, providing a reliable basis for adenocarcinoma diagnosis supplementary to pathological examination.
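The knowledge distillation idea underlying this transfer step can be illustrated with a minimal sketch: a student network's softened predictions are pulled toward a teacher's softened outputs via a temperature-scaled KL divergence. The temperature value and the NumPy formulation below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between teacher and student softened outputs,
    scaled by T**2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student soft predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss vanishes when student and teacher agree and grows as their softened distributions diverge, which is what drives the student toward the teacher's behavior on a new dataset.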

Secondly, extending the scope of the first study, the entire pathological report of lung adenocarcinoma is inferred. A self-distillation-trained multi-task dense-attention network (SD-MdaNet) is proposed for histopathological diagnosis of lung adenocarcinoma based on CT images. Inferring the pathological report is divided into two tasks: predicting the invasiveness of the lung tumor and inferring the growth patterns of tumor cells in a comprehensive histopathological subtyping manner. In the proposed method, a dense-attention module is introduced into the main branch of the multi-task dense-attention network (MdaNet) to better extract features from a small dataset. Task-specific attention modules are then utilized in the different branches and finally integrated into a multi-task model. Because the second task blends classification and regression, a specialized loss function is developed for it. In the proposed knowledge distillation process, the MdaNet, together with its main branch trained separately on each single task, is treated as multiple teachers that jointly produce a student model. A novel knowledge distillation loss function is developed to take advantage of all the models as well as both labeled and unlabeled data. Experimental results demonstrate that the proposed SD-MdaNet significantly improves the performance of lung adenocarcinoma pathological diagnosis using only CT scans, achieving an AUC of 98.7% ± 0.4% on invasiveness prediction and 91.6% ± 1.0% on predominant growth pattern prediction on our dataset.
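The multi-teacher arrangement can be sketched as follows: the distillation target is a weighted mixture of several teachers' softened outputs, combined with a hard-label cross-entropy term when a label is available and used alone otherwise, so unlabeled data still contributes. The uniform teacher weights, temperature, and mixing coefficient alpha are illustrative assumptions, not the thesis's actual loss.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_teacher_loss(student_logits, teacher_logits_list, label=None,
                       T=4.0, alpha=0.5, weights=None):
    """Distillation target = weighted mix of several teachers' soft outputs.
    For unlabeled samples (label=None) only the distillation term is used."""
    if weights is None:  # default: weight all teachers equally
        weights = [1.0 / len(teacher_logits_list)] * len(teacher_logits_list)
    target = sum(w * softmax(t, T) for w, t in zip(weights, teacher_logits_list))
    q = softmax(student_logits, T)
    kd = float(T * T * np.sum(target * (np.log(target) - np.log(q))))
    if label is None:
        return kd                                        # unlabeled sample
    ce = -float(np.log(softmax(student_logits)[label]))  # hard-label term
    return alpha * kd + (1 - alpha) * ce
```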

Thirdly, lung function estimation based on CT images across all kinds of subjects is studied. The problem is formulated as image regression, and a transformer-based encoder-decoder structure is presented. First, a projection method is proposed to flatten the 3D image onto a 2D plane while retaining the position information. Next, the MBConv Transformer is designed so that the transformer maintains strong inference capability on small datasets. In computational experiments, the proposed regression model outperforms all state-of-the-art image regression models: the mean absolute error (MAE) for FVC prediction reaches 0.0498, while that for FEV1 reaches 0.0489.
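One plausible way to flatten a 3D volume onto a 2D plane while keeping position information is to tile evenly sampled axial slices into a montage, so that position along the z-axis maps deterministically to grid position. The grid layout and sampling scheme below are assumptions for illustration, not the thesis's actual projection method.

```python
import numpy as np

def project_volume(vol, grid=(4, 4)):
    """Tile axial slices of a (depth, H, W) volume into a 2D montage so that
    slice order along the z-axis maps to a fixed grid position."""
    d, h, w = vol.shape
    rows, cols = grid
    # sample rows*cols slices evenly along the z-axis
    idx = np.linspace(0, d - 1, rows * cols).round().astype(int)
    tiles = vol[idx].reshape(rows, cols, h, w)
    # stitch tiles into a single (rows*h, cols*w) image
    return tiles.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)
```

Because each slice lands in a known cell of the montage, a 2D model can still exploit where along the lung a feature occurs, which is the point of retaining position information.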

Finally, building on the third study, the bidirectional relationship between lung function and CT features is explored, and an integrated invertible deep learning algorithm for lung function estimation based on CT images is developed. First, the projection method is used to flatten the 3D image onto a 2D plane while preserving the position information, and the encoder-decoder presented in the third study is adopted to extract the CT feature maps. Next, an invertible normalizing flow model is formulated to infer lung function from the extracted features, with two loss functions designed for the two directions. The proposed bidirectional framework can estimate lung function from CT images and metadata, and can also generate the corresponding simulated CT image from a given lung function. In computational experiments, the proposed framework infers lung function with excellent accuracy. In addition, the simulated images produced by the generative model can be mixed with real images for regression model training to improve performance. A comprehensive comparative analysis further demonstrates the effectiveness of using generated images and confirms the superiority of the model: the MAE for FVC prediction without generated images reaches 0.0490 (0.0484 for FEV1), while the framework trained with 500 generated images achieves an MAE of 0.0370.
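The invertibility that makes both directions possible comes from the building blocks of normalizing flows. A common such block is the affine coupling layer, where one half of the input parameterizes an exactly invertible affine transform of the other half. The sketch below uses toy `scale`/`shift` functions standing in for learned networks; it illustrates the bijection, not the thesis's specific flow architecture.

```python
import numpy as np

def coupling_forward(x, scale, shift):
    """Affine coupling: the first half conditions the transform of the second.
    scale/shift stand in for learned networks of the first half."""
    x1, x2 = np.split(x, 2)
    y2 = x2 * np.exp(scale(x1)) + shift(x1)
    return np.concatenate([x1, y2])

def coupling_inverse(y, scale, shift):
    """Exact inverse: x1 passes through unchanged, so the same scale/shift
    can be recomputed to undo the affine map on the second half."""
    y1, y2 = np.split(y, 2)
    x2 = (y2 - shift(y1)) * np.exp(-scale(y1))
    return np.concatenate([y1, x2])
```

Because the inverse is exact, the same trained model can map CT features to lung function and, run backwards, map a lung function value back to a feature representation for image generation.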