Louaq - Paper Reading Notes

Image-level weakly supervised semantic segmentation has received increasing attention due to its low annotation cost. Existing methods mainly rely on Class Activation Mapping (CAM) to obtain pseudo-labels for training semantic segmentation models. In this work, we are the first to demonstrate that a long-tailed distribution in the training data can cause the CAM calculated through classifier weights to be over-activated for head classes and under-activated for tail classes, because head and tail classes share features. This degrades pseudo-label quality and, in turn, final semantic segmentation performance. To address this issue, we propose a Shared Feature Calibration (SFC) method for CAM generation. Specifically, we leverage the class prototypes that carry positive shared features and propose a Multi-Scaled Distribution-Weighted (MSDW) consistency loss that narrows the gap between the CAMs generated through classifier weights and those generated through class prototypes during training. The MSDW loss counterbalances over-activation and under-activation by calibrating the shared features in head-/tail-class classifier weights. Experimental results show that our SFC significantly improves CAM boundaries and achieves new state-of-the-art performance. The project is available at https://github.com/Barrett-python/SFC.
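The two kinds of CAM contrasted in this abstract can be illustrated with a small, dependency-free sketch. It is a generic illustration, not the SFC implementation: the function names `cam_from_weights` and `cam_from_prototype`, the cosine-similarity form of the prototype map, and the toy feature values are all assumptions made for this example.

```python
import math

def cam_from_weights(features, class_weights):
    """CAM for one class: ReLU of the weighted channel sum, scaled to [0, 1].

    features: C x H x W nested lists (feature map before global pooling).
    class_weights: length-C list (classifier weights of one class).
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[max(0.0, sum(w * features[c][y][x] for c, w in enumerate(class_weights)))
            for x in range(width)] for y in range(height)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

def cam_from_prototype(features, prototype):
    """Prototype-based map: ReLU of the cosine similarity between each
    pixel feature and a class prototype (assumed form, for illustration)."""
    height, width = len(features[0]), len(features[0][0])
    proto_norm = math.sqrt(sum(p * p for p in prototype))
    cam = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = [features[c][y][x] for c in range(len(prototype))]
            pixel_norm = math.sqrt(sum(v * v for v in pixel))
            dot = sum(p * v for p, v in zip(prototype, pixel))
            denom = proto_norm * pixel_norm
            row.append(max(0.0, dot / denom) if denom > 0 else 0.0)
        cam.append(row)
    return cam

# Toy 2-channel, 2x2 feature map (made-up numbers).
features = [
    [[1.0, 0.0], [2.0, 0.0]],
    [[0.0, 1.0], [0.0, 3.0]],
]
weight_cam = cam_from_weights(features, [1.0, -1.0])
proto_cam = cam_from_prototype(features, [1.0, 0.0])
```

A consistency loss of the kind the abstract describes would then penalize the difference between two such maps; the exact multi-scale, distribution-weighted form is defined in the paper and is not reproduced here.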

#SFC #semantic segmentation

WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation

2025-04-09

Uncategorized

2841 words, 14 minutes

Huazhong University of Science and Technology, Northwestern Polytechnical University

#CLIP #Weakly Supervised Semantic Segmentation

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

2025-04-07

Uncategorized

2723 words, 14 minutes

Australian National University, OpenNLPLab, Shanghai AI Laboratory, Xiamen University, OPPO Research Institute

#Weakly Supervised Semantic Segmentation

Self-supervised vision transformers for semantic segmentation

2025-04-03

Uncategorized

2233 words, 11 minutes

Semantic segmentation is a fundamental task in computer vision and a building block of many other vision applications. Nevertheless, semantic segmentation annotations are extremely expensive to collect, so using pre-training to reduce the need for large numbers of labeled samples is appealing. Recently, self-supervised learning (SSL) has proven effective at extracting strong representations and has been widely applied to a variety of downstream tasks. However, most works perform sub-optimally in semantic segmentation because they ignore the specific properties of segmentation: (i) the need for fine-grained, pixel-level understanding; (ii) the need for global context understanding to support it; and (iii) the need for a dense self-supervisory signal to achieve both. Based on these key factors, we introduce a systematic self-supervised pre-training framework for semantic segmentation, which consists of a hierarchical encoder–decoder architecture, MEVT, for generating high-resolution features with global contextual information propagation, and a self-supervised training strategy for learning fine-grained semantic features. In our study, our framework shows competitive performance compared with other mainstream self-supervised pre-training methods for semantic segmentation on the COCO-Stuff, ADE20K, PASCAL VOC, and Cityscapes datasets. For example, MEVT achieves an advantage of +1.3 mIoU in linear probing on PASCAL VOC.
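For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how it is commonly computed over flattened label maps. The function name `mean_iou` and the toy labels are assumptions for illustration; real benchmarks typically accumulate counts over the whole dataset and handle an ignore label, which this sketch omits.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that appear in pred or target.

    pred, target: equal-length flat lists of integer class ids, one per pixel.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)

# Toy example with 4 pixels and 2 classes (made-up labels).
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2
```

Reported results such as "+1.3 mIoU" are differences in this score expressed in percentage points.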

#ViT #semantic segmentation

A Transformer-based Adaptive Prototype Matching Network for Few-Shot Semantic Segmentation

2025-04-02

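Uncategorized

2519 words, 13 minutes

In recent years, the rapid development of deep learning in computer vision has driven fast progress in conventional semantic segmentation. Against this background, few-shot segmentation (FSS) was proposed to model real-world scenarios with limited data and many classes.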

#Transformer #Few-Shot Semantic Segmentation

DSMF-Net: Dual Semantic Metric Learning Fusion Network for Few-Shot Aerial Image Semantic Segmentation

2025-04-02

Uncategorized

2515 words, 13 minutes

Semantic segmentation of aerial images is crucial yet resource-intensive. Inspired by the human ability to learn rapidly, few-shot semantic segmentation offers a promising solution by utilizing limited labeled data for efficient model training and generalization. However, the intrinsic complexities of aerial images, compounded by scarce samples, often result in inadequate feature representation and semantic ambiguity, detracting from the model's performance. In this article, we propose to tackle these challenging problems via dual semantic metric learning and multisemantic feature fusion, and introduce a novel few-shot segmentation network (DSMF-Net). On the one hand, we consider the inherent semantic gap between the features of graph and grid structures and the metric learning of few-shot segmentation. To exploit multiscale global semantic context, we construct scale-aware graph prototypes from different stages of the feature layers based on graph convolutional networks (GCNs), while also incorporating prior-guided metric learning to further enhance context at the high-level convolution features. On the other hand, we design a pyramid-based fusion and condensation mechanism to adaptively merge and couple the multisemantic information from support and query images. The indication and fusion of different semantic features can effectively emphasize the representation and coupling abilities of the network. We have conducted extensive experiments on the challenging iSAID-5i and DLRSD benchmarks. The experiments demonstrate our network's effectiveness and efficiency, yielding performance on par with state-of-the-art methods.

#DSMF-Net #Semantic Segmentation

Kill Two Birds with One Stone: Domain Generalization for Semantic Segmentation via Network Pruning

2025-04-02

Uncategorized

2577 words, 13 minutes

#Domain Generalization

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation (DGSS)

2025-04-01

Uncategorized

1525 words, 8 minutes

https://github.com/w1oves/Rein.git