Skip to content

Code Repository for the ICCV 2025 paper: "Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation".

Notifications You must be signed in to change notification settings

HSG-AIML/TokenMasking-WSSS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Know Your Attention Maps: Class-Specific Token Masking for Weakly Supervised Semantic Segmentation

This repository provides the official implementation of our ICCV 2025 paper: "Know Your Attention Maps: Class-Specific Token Masking for Weakly Supervised Semantic Segmentation". Our approach introduces a simple yet powerful modification of Vision Transformers for Weakly Supervised Semantic Segmentation (WSSS). By assigning one [CLS] token per class, enforcing class-specific masking, and leveraging attention-based class activation, we generate high-resolution pseudo-masks directly from transformer attention—without CAMs or post-processing.

The paper can be found here: ICCV proceedings | arxiv.


Summary

We revisit the role of [CLS] tokens in multi-label classification and show that:

  • A transformer with one [CLS] token per class can learn structured, interpretable attention.
  • Introducing random token masking encourages each class token to specialize.
  • Class-specific attention maps can be converted into dense pseudo-masks, suitable for training segmentation models.
  • Optional attention head pruning (via Hard Concrete gates) further sharpens attention and improves pseudo-mask quality.

The result is a clean, single-stage WSSS pipeline that achieves competitive pseudo-mask quality across diverse domains.


Code Structure

.
├── run.py                        # Main training script (classification + token masking)
├── generate_pseudomasks.py       # Produce pseudo-masks from class tokens + attention
├── model.py                      # ViTWithTokenDropout and supporting modules
├── recorder_tokendropout.py      # Extracts attention maps from ViT layers
├── datasets/
│   ├── dfc.py                    # DFC2020 dataset loader
│   ├── ade.py                    # ADE20K dataset loader
│   └── ...                       # Add your own dataset here
├── checkpoints/                  # Saved checkpoints
├── assets/                      
└── README.md

Training

Training is performed using the unified script run.py.

Example Training Command

A typical configuration for the DFC2020 dataset:

python run.py \
    --dataset dfc \
    --train_batch_size 4 \
    --eval_batch_size 4 \
    --learning_rate 0.000001 \
    --patch_size 16 \
    --opt adam \
    --lr_scheduler \
    --imgsize 224 224 \
    --num_channels 13 \
    --num_classes 8 \
    --num_epochs 500 \
    --arch tokendropout \
    --diversify \
    --exp_name dropout_token \
    --dp_rate 0.0

This launches training with:

  • multi-class token ViT architecture,
  • random token masking,
  • optional attention-head pruning,
  • logging and checkpointing

Generating Pseudo-Masks

Once training is completed, pseudo-masks can be generated with:

python generate_pseudomasks.py --dataset dfc --checkpoint <path_to_ckpt>

This script:

  1. Loads the trained ViTWithTokenDropout model
  2. Extracts class-specific attention maps
  3. Converts attention into dense pseudo-masks
  4. Saves:
pms_<dataset>.npy    # pseudo-masks
imgs_<dataset>.npy   # raw images
masks_<dataset>.npy  # ground-truth masks (if provided)

Reference

If you use this repository in your research, please cite:

@InProceedings{Hanna_2025_ICCV,
    author    = {Hanna, Jo\"elle and Borth, Damian},
    title     = {Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {23763-23772}
}

Contact

For questions, issues, or discussions:

Joëlle Hanna University of St. Gallen [email protected]

Code

This repository incorporates code from the following sources:

About

Code Repository for the ICCV 2025 paper: "Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation".

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages