This repository provides the official implementation of our ICCV 2025 paper: "Know Your Attention Maps: Class-Specific Token Masking for Weakly Supervised Semantic Segmentation". Our approach introduces a simple yet powerful modification of Vision Transformers for Weakly Supervised Semantic Segmentation (WSSS). By assigning one [CLS] token per class, enforcing class-specific masking, and leveraging attention-based class activation, we generate high-resolution pseudo-masks directly from transformer attention—without CAMs or post-processing.
The paper is available here: ICCV proceedings | arXiv.
We revisit the role of [CLS] tokens in multi-label classification and show that:
- A transformer with one [CLS] token per class can learn structured, interpretable attention.
- Introducing random token masking encourages each class token to specialize.
- Class-specific attention maps can be converted into dense pseudo-masks, suitable for training segmentation models.
- Optional attention head pruning (via Hard Concrete gates) further sharpens attention and improves pseudo-mask quality.
The result is a clean, single-stage WSSS pipeline that achieves competitive pseudo-mask quality across diverse domains.
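The core idea, turning each class token's attention over image patches into a dense label map, can be sketched as follows. This is an illustrative reconstruction, not the repository's implementation: the function name, the `(num_classes, num_patches)` attention layout, and the background threshold are all assumptions.

```python
import numpy as np

def attention_to_pseudomask(attn, image_hw, patch_size=16, bg_thresh=0.5):
    """Convert per-class [CLS]-token attention into a dense pseudo-mask.

    attn: (num_classes, num_patches) attention from each class token to the
          image patches (hypothetical shape; see the paper for the exact
          extraction). Returns an (H, W) label map where 0 is background.
    """
    H, W = image_hw
    ph, pw = H // patch_size, W // patch_size
    num_classes = attn.shape[0]

    # Normalise each class map to [0, 1] so a single threshold is comparable.
    attn = attn / (attn.max(axis=1, keepdims=True) + 1e-8)
    grid = attn.reshape(num_classes, ph, pw)

    labels = grid.argmax(axis=0) + 1          # classes 1..C; 0 is background
    labels[grid.max(axis=0) < bg_thresh] = 0  # low-confidence patches -> bg

    # Upsample patch labels to pixel resolution (nearest neighbour).
    return np.kron(labels, np.ones((patch_size, patch_size), dtype=labels.dtype))
```

Because each class has its own [CLS] token, no CAM computation or post-processing step is needed: the argmax over class-token attention maps is already a segmentation.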
```
.
├── run.py                     # Main training script (classification + token masking)
├── generate_pseudomasks.py    # Produce pseudo-masks from class tokens + attention
├── model.py                   # ViTWithTokenDropout and supporting modules
├── recorder_tokendropout.py   # Extracts attention maps from ViT layers
├── datasets/
│   ├── dfc.py                 # DFC2020 dataset loader
│   ├── ade.py                 # ADE20K dataset loader
│   └── ...                    # Add your own dataset here
├── checkpoints/               # Saved checkpoints
├── assets/
└── README.md
```
Training is performed with the unified script `run.py`.
A typical configuration for the DFC2020 dataset:
```shell
python run.py \
    --dataset dfc \
    --train_batch_size 4 \
    --eval_batch_size 4 \
    --learning_rate 0.000001 \
    --patch_size 16 \
    --opt adam \
    --lr_scheduler \
    --imgsize 224 224 \
    --num_channels 13 \
    --num_classes 8 \
    --num_epochs 500 \
    --arch tokendropout \
    --diversify \
    --exp_name dropout_token \
    --dp_rate 0.0
```

This launches training with:
- a multi-class-token ViT architecture,
- random token masking,
- optional attention-head pruning,
- logging and checkpointing.
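The random token masking can be illustrated with a short sketch. The function name, masking policy, and `drop_rate` semantics below are assumptions for illustration, not the repository's exact implementation:

```python
import torch

def mask_class_tokens(cls_tokens, labels, drop_rate=0.5):
    """Sketch of class-specific token masking during training.

    cls_tokens: (B, num_classes, D) one [CLS] token per class
    labels:     (B, num_classes) multi-hot ground-truth labels

    Tokens of absent classes are always zeroed; tokens of present classes
    are kept with probability (1 - drop_rate), so each surviving token must
    classify its class without relying on the others.
    """
    keep = labels.float()
    if drop_rate > 0:
        rand_keep = (torch.rand_like(keep) > drop_rate).float()
        keep = keep * rand_keep
    return cls_tokens * keep.unsqueeze(-1)
```

Dropping tokens at random prevents the class tokens from sharing evidence through attention, which is what pushes each one to specialize on its own class.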
Once training is completed, pseudo-masks can be generated with:

```shell
python generate_pseudomasks.py --dataset dfc --checkpoint <path_to_ckpt>
```

This script:
- Loads the trained ViTWithTokenDropout model
- Extracts class-specific attention maps
- Converts attention into dense pseudo-masks
- Saves:
```
pms_<dataset>.npy      # pseudo-masks
imgs_<dataset>.npy     # raw images
masks_<dataset>.npy    # ground-truth masks (if provided)
```
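When ground-truth masks are available, the saved arrays can be compared directly, e.g. with a mean-IoU score. This helper is a simple sketch (the function, the `ignore_index` convention, and the file paths in the usage comment are illustrative assumptions):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean per-class IoU between pseudo-masks and ground truth."""
    ious = []
    valid = gt != ignore_index
    for c in range(num_classes):
        p, g = (pred == c) & valid, (gt == c) & valid
        union = (p | g).sum()
        if union == 0:
            continue  # class absent from both prediction and GT: skip
        ious.append((p & g).sum() / union)
    return float(np.mean(ious))

# Hypothetical usage with the files written above (paths are illustrative):
# pms = np.load("pms_dfc.npy")    # (N, H, W) pseudo-masks
# gts = np.load("masks_dfc.npy")  # (N, H, W) ground-truth masks
# print(mean_iou(pms, gts, num_classes=8))
```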
If you use this repository in your research, please cite:
```bibtex
@InProceedings{Hanna_2025_ICCV,
    author    = {Hanna, Jo\"elle and Borth, Damian},
    title     = {Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {23763-23772}
}
```

For questions, issues, or discussions:
Joëlle Hanna, University of St. Gallen ([email protected])
This repository incorporates code from the following sources:
