# [MambaOut: Do We Really Need Mamba for Vision?](https://arxiv.org/abs/2405.xxxxx)

<p align="left">
<a href="https://arxiv.org/abs/2405.xxxxx" alt="arXiv">
  <img src="https://img.shields.io/badge/arXiv-2405.xxxxx-b31b1b.svg?style=flat" /></a>
<a href="https://colab.research.google.com/drive/" alt="Colab">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
</p>

<p align="center"><em>In memory of Kobe Bryant</em></p>

> "What can I say, Mamba out."
>
> — *Kobe Bryant, NBA farewell speech, 2016*

This is a PyTorch implementation of MambaOut, proposed in our paper "[MambaOut: Do We Really Need Mamba for Vision?](https://arxiv.org/abs/2405.xxxxx)".

## Requirements
PyTorch and timm 0.6.11 (`pip install timm==0.6.11`).

Data preparation: ImageNet with the following folder structure; you can extract ImageNet using this [script](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).

```
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
```
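
For reference, this layout can be loaded directly with `torchvision.datasets.ImageFolder`. A minimal sketch (the transforms below are illustrative, not the exact training recipe):

```python
# Minimal sketch: load the ImageNet folders above with torchvision.
# The transforms are illustrative only, not the paper's training recipe.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder('/path/to/imagenet/train', transform=transform)
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=transform)
print(len(train_set.classes))  # 1000 ImageNet classes
```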

## Models
### MambaOut trained on ImageNet
| Model | Resolution | Params | MACs | Top-1 Acc (%) |
| :--- | :---: | :---: | :---: | :---: |
| [mambaout_femto](https://github.com/yuweihao/MambaOut/releases/download/model/mambaout_femto.pth) | 224 | 7.3M | 1.2G | 78.9 |
| [mambaout_tiny](https://github.com/yuweihao/MambaOut/releases/download/model/mambaout_tiny.pth) | 224 | 26.5M | 4.5G | 82.7 |
| [mambaout_small](https://github.com/yuweihao/MambaOut/releases/download/model/mambaout_small.pth) | 224 | 48.5M | 9.0G | 84.1 |
| [mambaout_base](https://github.com/yuweihao/MambaOut/releases/download/model/mambaout_base.pth) | 224 | 84.8M | 15.8G | 84.2 |

#### Usage
We also provide a Colab notebook which runs the steps to perform inference with MambaOut: [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/)
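
To run inference locally instead, a minimal sketch is below. It assumes this repo's model definitions are importable and registered with timm (so that `pretrained=True` can fetch the released weights); the module name `models` is a guess, so adjust the import to the actual file name.

```python
# Minimal inference sketch. Assumes the mambaout_* architectures are
# registered with timm on import; `models` is a hypothetical module name.
import torch
import timm
import models  # noqa: F401  (registers mambaout_* models with timm)

model = timm.create_model('mambaout_tiny', pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed 224x224 image
with torch.no_grad():
    logits = model(x)
print(logits.argmax(dim=1))  # predicted ImageNet class index
```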

## Validation

To evaluate models, run:

```bash
MODEL=mambaout_tiny
python3 validate.py /path/to/imagenet --model $MODEL -b 128 \
  --pretrained
```
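
Here `-b 128` is the evaluation batch size and `--pretrained` downloads the released weights automatically. If you have already downloaded a checkpoint from the table above, timm's validation script also accepts `--checkpoint /path/to/checkpoint.pth` in place of `--pretrained`; treat that flag as an assumption if this repo's `validate.py` diverges from timm's.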

## Train
We use a batch size of 4096 by default and show how to train models with 8 GPUs. For multi-node training, adjust `--grad-accum-steps` according to your situation.

```bash
DATA_PATH=/path/to/imagenet
CODE_PATH=/path/to/code/MambaOut # modify code path here

ALL_BATCH_SIZE=4096
NUM_GPU=8
GRAD_ACCUM_STEPS=4 # Adjust according to your GPU numbers and memory size.
let BATCH_SIZE=ALL_BATCH_SIZE/NUM_GPU/GRAD_ACCUM_STEPS

MODEL=mambaout_tiny
DROP_PATH=0.2

cd $CODE_PATH && sh distributed_train.sh $NUM_GPU $DATA_PATH \
--model $MODEL --opt adamw --lr 4e-3 --warmup-epochs 20 \
-b $BATCH_SIZE --grad-accum-steps $GRAD_ACCUM_STEPS \
--drop-path $DROP_PATH
```
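
With the defaults above, the per-GPU batch size works out to `4096 / 8 / 4 = 128`, and the effective batch size is recovered as `NUM_GPU × BATCH_SIZE × GRAD_ACCUM_STEPS = 8 × 128 × 4 = 4096`. If you change the number of GPUs, scale `GRAD_ACCUM_STEPS` (or the per-GPU batch size) so that this product stays at 4096.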
Training scripts for other models are provided in [scripts](/scripts/).

## Bibtex
```
@article{yu2024mambaout,
  title={MambaOut: Do We Really Need Mamba for Vision?},
  author={Yu, Weihao and Wang, Xinchao},
  journal={arXiv preprint arXiv:2405.xxxxx},
  year={2024}
}
```

## Acknowledgment
Weihao was partly supported by a Snap Research Fellowship, Google TPU Research Cloud (TRC), and the Google Cloud Research Credits program. We thank Dongze Lian, Qiuhong Shen, Xingyi Yang, and Gongfan Fang for valuable discussions.

Our implementation is based on [pytorch-image-models](https://github.com/huggingface/pytorch-image-models), [poolformer](https://github.com/sail-sg/poolformer), [ConvNeXt](https://github.com/facebookresearch/ConvNeXt), [metaformer](https://github.com/sail-sg/metaformer) and [inceptionnext](https://github.com/sail-sg/inceptionnext).