Cross-Modality Interaction Network for Medical Image Fusion
Multi-modal medical image fusion maximizes the complementary information from diverse modality images by integrating source images. The fused medical image could offer enhanced richness and improved accuracy compared to the source images. Unfortunately, existing deep learning-based medical image fusion methods generally rely on convolutional operations, which may not effectively capture global information such as spatial relationships or shape features within and across image modalities. To address this problem, we propose a unified AI-Generated Content (AIGC)-based medical image fusion method, termed Cross-Modal Interactive Network (CMINet). CMINet integrates a recursive transformer with an interactive Convolutional Neural Network (CNN). Specifically, the recursive transformer is designed to capture extended spatial and temporal dependencies within modalities, while the interactive CNN aims to extract and merge local features across modalities. Benefiting from cross-modality interaction learning, the proposed method can generate fused images with rich structural and functional information. Additionally, the architecture of the recursive network is structured to reduce parameter count, which could be beneficial for deployment on resource-constrained devices. Comprehensive experiments on multi-modal medical images (MRI and CT, MRI and PET, and MRI and SPECT) demonstrate that the proposed method outperforms state-of-the-art fusion methods subjectively and objectively.
A cascaded framework with cross-modality transfer learning for whole heart segmentation
Automatic and accurate segmentation of the whole heart structure from 3D cardiac images plays an important role in helping physicians diagnose and treat cardiovascular disease. However, manual labeling of cardiac images is time-consuming and laborious, so the existing CT or MRI data cannot be used efficiently to train deep learning networks, which decreases the accuracy of whole heart segmentation. Multi-modality data, in contrast, contains multi-level information about cardiac images owing to the different imaging mechanisms, which helps improve segmentation accuracy. Therefore, this paper proposes a cascaded framework with cross-modality transfer learning for whole heart segmentation (CM-TranCaF), which consists of three key modules: a modality transfer network (MTN), a U-shaped multi-attention network (MAUNet) and a spatial configuration network (SCN). In the MTN, MRI images are transferred from the MRI domain to the CT domain by adopting the idea of adversarial training, which increases the data volume. The MAUNet is designed based on UNet, with attention gates (AGs) integrated into the skip connections to reduce the weight of background pixels. Moreover, to solve the problem of boundary blur, a position attention block (PAB) is integrated into the bottom layer to aggregate similar features. Finally, the SCN is used to fine-tune the segmentation results by utilizing the anatomical information between different cardiac substructures. Evaluated on the dataset of the MM-WHS challenge, CM-TranCaF achieves a Dice score of 91.1% on the testing dataset. The extensive experimental results demonstrate the effectiveness of the proposed method compared to other state-of-the-art methods.
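The Dice score quoted above measures the overlap between a predicted mask and a reference mask, Dice = 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch of that formula for a single label over flattened masks; the function name, the toy masks, and the convention for two empty masks are assumptions for illustration, and the abstract does not state how its 91.1% is averaged over cardiac substructures.

```python
def dice_score(pred, target, label=1):
    """Dice overlap for one label over two flattened masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    denom = sum(p) + sum(t)
    # Assumed convention: if the label is absent from both masks, treat it as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


# Toy masks (hypothetical data): 2 overlapping voxels, 3 predicted, 3 in the reference.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3)
```

For a multi-class task such as whole heart segmentation, one common choice is to call this once per substructure label and average the results.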
Brain tumor segmentation based on the dual-path network of multi-modal MRI images
Because gliomas grow infiltratively, the tumor boundary usually merges with the surrounding brain tissue, so the brain tumor structure cannot be segmented accurately from single-modal images. Multi-modal images complement one another with respect to the tumor's inherent heterogeneity and external boundary, providing complementary features and outlines. They also retain the structural characteristics of brain diseases from multiple angles. However, owing to the particularities of multi-modal medical image sampling, which increase uneven data density, and to dense structural vascular tumor mitosis, the glioma may show atypical, fuzzy boundaries and more noise. To solve this problem, this paper proposes a dual-path network based on multi-modal feature fusion (MFF-DNet). Firstly, the proposed network uses different kernel multiplexing methods to combine the large-scale perceptual domain with non-linear mapping features, which effectively enhances the coherence of information flow. Then, the overlapping frequency and the vanishing gradient phenomenon are reduced by residual connections and dense connections, which alleviates the mutual influence of multi-modal channels. Finally, a dual-path model based on the DenseNet network and feature pyramid networks (FPN) is established to fuse low-level, middle-level, and high-level features. It also increases the diversity of glioma non-linear structural features and improves the segmentation precision. A large number of ablation experiments show the effectiveness of the proposed model. The precision for the whole brain tumor and the core tumor reaches 0.92 and 0.90, respectively.
FPL+: Filtered Pseudo Label-Based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation
Adapting a medical image segmentation model to a new domain is important for improving its cross-domain transferability, and due to the expensive annotation process, Unsupervised Domain Adaptation (UDA) is appealing where only unlabeled images are needed for the adaptation. Existing UDA methods are mainly based on image or feature alignment with adversarial training for regularization, and they are limited by insufficient supervision in the target domain. In this paper, we propose an enhanced Filtered Pseudo Label (FPL+)-based UDA method for 3D medical image segmentation. It first uses cross-domain data augmentation to translate labeled images in the source domain to a dual-domain training set consisting of a pseudo source-domain set and a pseudo target-domain set. To leverage the dual-domain augmented images to train a pseudo label generator, domain-specific batch normalization layers are used to deal with the domain shift while learning the domain-invariant structure features, generating high-quality pseudo labels for target-domain images. We then combine labeled source-domain images and target-domain images with pseudo labels to train a final segmentor, where image-level weighting based on uncertainty estimation and pixel-level weighting based on dual-domain consensus are proposed to mitigate the adverse effect of noisy pseudo labels. Experiments on three public multi-modal datasets for Vestibular Schwannoma, brain tumor and whole heart segmentation show that our method surpassed ten state-of-the-art UDA methods, and it even achieved better results than fully supervised learning in the target domain in some cases.
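The idea of down-weighting noisy pseudo labels at both the image level and the pixel level can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the formulation used in the FPL+ paper: the image weight is taken as one minus the normalized mean binary entropy of the foreground probabilities, and the pixel weight is 1 where two predictions of the same image agree on the hard label and 0 elsewhere. All function names are hypothetical.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli distribution with foreground probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def image_weight(probs):
    """Assumed image-level weight: 1 minus mean entropy, normalized to [0, 1]."""
    mean_h = sum(binary_entropy(p) for p in probs) / len(probs)
    return 1.0 - mean_h / math.log(2.0)


def pixel_weights(probs_a, probs_b, threshold=0.5):
    """Assumed pixel-level weight: 1 where both predictions give the same hard label."""
    return [1.0 if (a >= threshold) == (b >= threshold) else 0.0
            for a, b in zip(probs_a, probs_b)]


def weighted_bce(probs, pseudo_labels, pix_w, img_w, eps=1e-12):
    """Binary cross-entropy against pseudo labels, scaled by both kinds of weight."""
    total = 0.0
    for p, y, w in zip(probs, pseudo_labels, pix_w):
        p = min(max(p, eps), 1.0 - eps)
        total += w * -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    norm = sum(pix_w)
    return 0.0 if norm == 0 else img_w * total / norm


# Hypothetical foreground probabilities for one 4-pixel image from two predictors.
probs_a = [0.9, 0.8, 0.4, 0.1]
probs_b = [0.95, 0.3, 0.2, 0.05]
pseudo = [1 if p >= 0.5 else 0 for p in probs_a]
w_pix = pixel_weights(probs_a, probs_b)   # the second pixel is masked out (disagreement)
w_img = image_weight(probs_a)
print(w_pix, w_img, weighted_bce(probs_a, pseudo, w_pix, w_img))
```

In this sketch a pixel on which the two predictions disagree contributes nothing to the loss, and an image whose predictions are uncertain overall contributes less than a confidently predicted one.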
Flexible Fusion Network for Multi-Modal Brain Tumor Segmentation
Automated brain tumor segmentation is crucial for aiding brain disease diagnosis and evaluating disease progression. Currently, magnetic resonance imaging (MRI) is routinely adopted in the field of brain tumor segmentation and can provide images of different modalities. It is critical to leverage multi-modal images to boost brain tumor segmentation performance. Existing works commonly concentrate on generating a shared representation by fusing multi-modal data, while few methods take into account modality-specific characteristics. Besides, how to efficiently fuse arbitrary numbers of modalities is still a difficult task. In this study, we present a flexible fusion network (termed F2Net) for multi-modal brain tumor segmentation, which can flexibly fuse arbitrary numbers of modalities to explore complementary information while maintaining the specific characteristics of each modality. Our F2Net is based on the encoder-decoder structure, which utilizes two Transformer-based feature learning streams and a cross-modal shared learning network to extract individual and shared feature representations. To effectively integrate the knowledge from the multi-modality data, we propose a cross-modal feature enhanced module (CFM) and a multi-modal collaboration module (MCM), which aim at fusing the multi-modal features into the shared learning network and incorporating the features from the encoders into the shared decoder, respectively. Extensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our F2Net over other state-of-the-art segmentation methods.
MACTFusion: Lightweight Cross Transformer for Adaptive Multimodal Medical Image Fusion
Multimodal medical image fusion aims to integrate complementary information from different modalities of medical images. Deep learning methods, especially recent vision Transformers, have effectively improved image fusion performance. However, Transformers have limitations in image fusion, such as a lack of local feature extraction and of cross-modal feature interaction, resulting in insufficient multimodal feature extraction and integration. In addition, the computational cost of Transformers is high. To address these challenges, in this work, we develop an adaptive cross-modal fusion strategy for unsupervised multimodal medical image fusion. Specifically, we propose a novel lightweight cross Transformer based on a cross multi-axis attention mechanism. It includes cross-window attention and cross-grid attention to mine and integrate both local and global interactions of multimodal features. The cross Transformer is further guided by a spatial adaptation fusion module, which allows the model to focus on the most relevant information. Moreover, we design a special feature extraction module that combines multiple gradient residual dense convolutional and Transformer layers to obtain local features from coarse to fine and capture global features. The proposed strategy significantly boosts the fusion performance while minimizing computational costs. Extensive experiments, including clinical brain tumor image fusion, have shown that our model can achieve clearer texture details and better visual quality than other state-of-the-art fusion methods.
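Cross-modal attention in general lets tokens from one modality query tokens from another. The sketch below shows plain scaled dot-product cross-attention on tiny hand-written token lists; it is a deliberately simplified illustration and not the MACTFusion module. It has no learned projections, no multiple heads, and no window or grid partitioning, and the function names and toy values are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Queries come from modality A; keys and values come from modality B."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output token is the attention-weighted sum of modality-B values.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Hypothetical tokens: one token from modality A attends to two tokens from modality B.
tokens_a = [[1.0, 1.0]]
keys_b = [[1.0, 0.0], [1.0, 0.0]]      # identical keys give uniform attention weights
values_b = [[0.0, 2.0], [4.0, 6.0]]
fused = cross_attention(tokens_a, keys_b, values_b)
print(fused)  # with uniform weights, the output is the mean of the two value tokens
```

Restricting the set of keys each query sees to a local window, or to a sparse grid, is one way such attention can be made cheaper than attending over all tokens.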
MLFuse: Multi-Scenario Feature Joint Learning for Multi-Modality Image Fusion
Multi-modality image fusion (MMIF) entails synthesizing images with detailed textures and prominent objects. Existing methods tend to use general feature extraction to handle different fusion tasks. However, these methods have difficulty breaking fusion barriers across various modalities owing to the lack of targeted learning routes. In this work, we propose a multi-scenario feature joint learning architecture, MLFuse, that employs the commonalities of multi-modality images to deconstruct the fusion process. Specifically, we construct a cross-modal knowledge reinforcing network that adopts a multipath calibration strategy to promote information communication between different images. In addition, two specialized networks are developed to maintain the salient and textural information of fusion results. The spatial-spectral domain optimizing network can learn the vital relationship of the source image context with the help of spatial attention and spectral attention. The edge-guided learning network utilizes convolution operations with various receptive fields to capture image texture information. The desired fusion results are obtained by aggregating the outputs from the three networks. Extensive experiments demonstrate the superiority of MLFuse for infrared-visible image fusion and medical image fusion. The excellent results on downstream tasks (i.e., object detection and semantic segmentation) further verify the high-quality fusion performance of our method. The code is publicly available at https://github.com/jialei-sc/MLFuse
A nested self-supervised learning framework for 3-D semantic segmentation-driven multi-modal medical image fusion
The successful fusion of 3-D multi-modal medical images depends on both the specific characteristics unique to each imaging mode and the consistent spatial semantic features shared among all modes. However, the inherent variability in the appearance of these images poses a significant challenge to reliable learning of semantic information. To address this issue, this paper proposes a nested self-supervised learning framework for 3-D semantic segmentation-driven multi-modal medical image fusion. The proposed approach utilizes contrastive learning to effectively extract specified multi-scale features from each mode using U-Net (CU-Net). Subsequently, it employs geometric spatial consistency learning through a fusion convolutional decoder (FCD) and a geometric matching network (GMN) to ensure consistent acquisition of semantic representation within the same 3-D regions across multiple modalities. Additionally, a hybrid multi-level loss is introduced to facilitate the learning process of fused images. Ultimately, we leverage optimally specified multi-modal features for fusion and brain tumor lesion segmentation. The proposed approach enables cooperative learning between 3-D fusion and segmentation tasks by employing a nested self-supervised strategy, thereby striking a balance between semantic consistency and visual specificity during the extraction of multi-modal features. The fusion results demonstrated a mean classification SSIM, PSNR, NMI, and SFR of 0.9310, 27.8861, 1.5403, and 1.0896, respectively. The segmentation results revealed a mean classification Dice, sensitivity (Sen), specificity (Spe), and accuracy (Acc) of 0.8643, 0.8736, 0.9915, and 0.9911, respectively. The experimental findings demonstrate that our approach outperforms 11 other state-of-the-art fusion methods and 5 classical U-Net-based segmentation methods in terms of 4 objective metrics and qualitative evaluation.
The code of the proposed method is available at https://github.com/ImZhangyYing/NLSF.
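Of the fusion metrics listed in this abstract, PSNR has a simple closed form, PSNR = 10 log10(MAX^2 / MSE). The sketch below computes it for two flattened images; the function name, the 8-bit peak value of 255, and the toy pixel values are assumptions, and the abstract does not say which reference image its PSNR is computed against.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise to measure
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2-pixel example: each pixel is off by 10 grey levels, so MSE = 100.
print(psnr([100, 100], [110, 90]))
```

Higher values indicate that the test image deviates less from the reference; when a fused image has several source images, one option is to compute the value against each source and average.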