TY - JOUR
T1 - FAFuse: A Four-Axis Fusion framework of CNN and Transformer for medical image segmentation
AU - Xu, Shoukun
AU - Xiao, Dehao
AU - Yuan, Baohua
AU - Liu, Yi
AU - Wang, Xueyuan
AU - Li, Ning
AU - Shi, Lin
AU - Chen, Jialu
AU - Zhang, Ju-Xiao
AU - Wang, Yanhao
AU - Cao, Jianfeng
AU - Shao, Yeqin
AU - Jiang, Mingjie
PY - 2023/11
Y1 - 2023/11
N2 - Medical image segmentation is crucial for accurate diagnosis and treatment in the medical field. In recent years, convolutional neural networks (CNNs) and Transformers have been frequently adopted as network architectures in medical image segmentation. The convolution operation is limited in modeling long-range dependencies because it can only extract local information through the limited receptive field. In comparison, Transformers demonstrate excellent capability in modeling long-range dependencies but are less effective in capturing local information. Hence, effectively modeling long-range dependencies while preserving local information is essential for accurate medical image segmentation. In this paper, we propose a four-axis fusion framework called FAFuse, which can exploit the advantages of CNN and Transformer. As the core component of our FAFuse, a Four-Axis Fusion module (FAF) is proposed to efficiently fuse global and local information. FAF combines Four-Axis attention (height, width, main diagonal, and counter diagonal axial attention), a multi-scale convolution, and a residual structure with a depth-separable convolution and a Hadamard product. Furthermore, we also introduce deep supervision to enhance gradient flow and improve overall performance. Our approach achieves state-of-the-art segmentation accuracy on three publicly available medical image segmentation datasets. The code is available at https://github.com/cczu-xiao/FAFuse.
KW - CNN
KW - Feature fusion
KW - Four-Axis attention
KW - Medical image segmentation
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85174063510&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2023.107567
DO - 10.1016/j.compbiomed.2023.107567
M3 - RGC 21 - Publication in refereed journal
SN - 0010-4825
VL - 166
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 107567
ER -