Unavailable dataset #13

Open
friedrice231 opened this issue Apr 11, 2024 · 8 comments

Comments

@friedrice231

friedrice231 commented Apr 11, 2024

Hello! I tried run_office_home.sh for CDAN_MCC_SDAT and ran into the error below. Is it possible that the download list is outdated? I've tried manually downloading OfficeHome and setting the download boolean in utils.py to False, but that doesn't seem to work with the raw OfficeHome download. Could you provide more details on the dataset layout the code expects?

Downloading image_list
Fail to download image_list.zip from url link https://cloud.tsinghua.edu.cn/f/ca3a3b6a8d554905b4cd/?dl=1
Please check you internet connection.Simply trying again may be fine.

@rangwani-harsh
Contributor

rangwani-harsh commented Apr 11, 2024

Hi, I have updated the office-home dataset links. Can you pull the latest changes from the library and check again?

@friedrice231
Author

Great, the links work now. However, I run into the error below. How do I fix this?

[INFORMATION] The bottleneck dim is 256
lr_bbone: 0.0002
lr_btlnck: 0.002
Traceback (most recent call last):
  File "/home/lyeekai/SDAT/examples/cdan_mcc_sdat.py", line 352, in <module>
    main(args)
  File "/home/lyeekai/SDAT/examples/cdan_mcc_sdat.py", line 158, in main
    train(train_source_iter, train_target_iter, classifier, domain_adv, mcc_loss, optimizer, ad_optimizer,
  File "/home/lyeekai/SDAT/examples/cdan_mcc_sdat.py", line 219, in train
    y, f = model(x)
  File "/home/lyeekai/.conda/envs/clip/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lyeekai/SDAT/examples/common/modules/classifier.py", line 80, in forward
    f = self.bottleneck(f)
  File "/home/lyeekai/.conda/envs/clip/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lyeekai/.conda/envs/clip/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/lyeekai/.conda/envs/clip/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lyeekai/.conda/envs/clip/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/home/lyeekai/.conda/envs/clip/lib/python3.10/site-packages/torch/nn/functional.py", line 2450, in batch_norm
    return torch.batch_norm(
RuntimeError: running_mean should contain 197 elements not 256
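
One possible reading of this error, not confirmed by the maintainers: 197 is the token count of a ViT with 16x16 patches on a 224x224 input (196 patches plus one class token). If the backbone output reaches the bottleneck without being pooled, a `BatchNorm1d(256)` layer would see a 3D tensor of shape (batch, 197, 256) and treat the token axis as the channel axis. The plain-Python sketch below only reproduces that shape arithmetic. The ViT configuration, the batch size of 32, and both helper functions are assumptions made for illustration and are not taken from the SDAT code.

```python
def vit_token_count(image_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Tokens produced by a ViT: one per patch plus extra tokens such as [CLS]."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + extra_tokens


def batchnorm1d_channels(input_shape: tuple) -> int:
    """BatchNorm1d accepts (N, C) or (N, C, L) inputs and normalizes over dim 1."""
    if len(input_shape) not in (2, 3):
        raise ValueError("BatchNorm1d expects a 2D or 3D input")
    return input_shape[1]


bottleneck_dim = 256                      # from the log: "The bottleneck dim is 256"
tokens = vit_token_count(224, 16)         # assumed ViT with 16x16 patches at 224x224
unpooled = (32, tokens, bottleneck_dim)   # hypothetical batch of 32, token axis kept
pooled = (32, bottleneck_dim)             # one feature vector per image

print(tokens)                                            # 197
print(batchnorm1d_channels(unpooled) == bottleneck_dim)  # False: dim 1 is 197
print(batchnorm1d_channels(pooled) == bottleneck_dim)    # True: dim 1 is 256
```

If this reading is right, the fix would be to reduce the token axis (for example by taking the class token or averaging over tokens) before the bottleneck, but that has to be checked against the actual backbone code.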

@Ambuj-Choudha

Ambuj-Choudha commented Jan 26, 2025

Hey, I am also facing the same error!

The only thing I changed to get it running was in the file "resnet.py" (SDAT/common/vision/models/resnet.py), lines 4 and 5.
Change note:
(original)
from torchvision.models.utils import load_state_dict_from_url
#from torch.hub import load_state_dict_from_url

(changed version)
#from torchvision.models.utils import load_state_dict_from_url
from torch.hub import load_state_dict_from_url

@rangwani-harsh did you ever face this during development?
@friedrice231 were you able to resolve this error?
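
The import swap described above can also be written so that the same file works in both old and new environments. The sketch below is one way to do that with the standard library's `importlib`. The helper name `import_first` is hypothetical and not part of SDAT or tllib, and the commented usage assumes that `torch.hub` provides `load_state_dict_from_url`, as the changed import line above relies on.

```python
import importlib


def import_first(attr_name, module_names):
    """Return attr_name from the first module in module_names that provides it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module is missing in this environment, try the next one
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError(f"{attr_name} not found in any of {module_names}")


# Possible usage in resnet.py (needs torch installed, so it is left commented out):
# load_state_dict_from_url = import_first(
#     "load_state_dict_from_url",
#     ["torch.hub", "torchvision.models.utils"],
# )
```

Listing `torch.hub` first means the newer location wins whenever it is available, and the old torchvision path is only tried as a fallback.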

@Ambuj-Choudha

Ambuj-Choudha commented Jan 26, 2025

Okay, so I tried to debug this and I have some details to add:

  1. I checked the code in classifier.py from TL lib, and there were no changes there, so I don't think the issue is there.

  2. After that I reinstalled PyTorch and also tried the version on which TL lib was working properly (v2.2). This did not resolve the issue either, so the error doesn't seem to originate there.

  3. Lastly, I tried to run cdan_sdat.py in the TL lib repo by changing the references, and I changed a couple of lines inside the training loop (lines 204 and 205):
    original version:
    x_s, labels_s = next(train_source_iter)
    x_t, _ = next(train_target_iter)
    modified version:
    x_s, labels_s = next(train_source_iter)[:2]
    x_t, = next(train_target_iter)[:1]

I also made the setup consistent with the version in TL lib, and surprisingly that works!!! (But I had to use ResNet instead of ViT.)

  1. removed the --no-pool flag (as in TL lib scripts)
  2. Changed references (see image below)
    Image
    So, at this point I am really not sure what actually solved it. Apart from these small modifications (changing the references from 'dalib' to 'tllib' in the import statements, removing the --no-pool flag in the training script, and the two lines above), I didn't change anything at all.

Result: Solved (but with ResNet instead of ViT), although I do not know how or why!
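
The batch-slicing change in step 3 above makes sense if each batch from the data loader now carries more than two items, so that plain two-name unpacking fails. That is an assumption, since the thread does not show what the loader returns. The sketch below illustrates the pattern with a stand-in generator; `fake_loader` and its third "extra" field are hypothetical and only mimic such a loader.

```python
def fake_loader(n_batches):
    """Stand-in for a data loader whose batches carry a third, extra item."""
    for i in range(n_batches):
        images = [f"img_{i}_{k}" for k in range(2)]   # placeholder for an image tensor
        labels = [0, 1]
        extra = [f"path_{i}_{k}" for k in range(2)]   # hypothetical extra field
        yield images, labels, extra


source_iter = fake_loader(2)
target_iter = fake_loader(2)

# Original form: unpacking into two names fails when a batch has three items.
try:
    x_s, labels_s = next(fake_loader(1))
    unpack_ok = True
except ValueError:
    unpack_ok = False

# Modified form from step 3: slice the batch first, then unpack.
x_s, labels_s = next(source_iter)[:2]
x_t, = next(target_iter)[:1]

print(unpack_ok)   # False
print(labels_s)    # [0, 1]
print(x_t)         # ['img_0_0', 'img_0_1']
```

Slicing keeps the training loop independent of how many trailing items the loader appends, at the cost of silently ignoring them.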

@rangwani-harsh
Contributor

Hi @Ambuj-Choudha,

Thanks for your work. It would be great if you could consolidate the changes above into a pull request; it would be a great help to the community.

It would also be great to know whether you are able to reproduce the ResNet results (with and without SDAT).

Meanwhile, I will also check whether the changes you mentioned above are correct.

Thanks
Harsh

@Ambuj-Choudha

Hey hi @rangwani-harsh,

Thanks a lot for taking the time to reply this late. Yes, I will do the needful (after the training finishes), although I am not sure whether it will make it work. Right now I am running it in TL lib; I will update you on the results as soon as possible, and later perhaps I can try a couple of things with SDAT.

Regards,
Ambuj

@Ambuj-Choudha

First of all, an update on the results: I was able to replicate them when I ran cdan_sdat.py inside TL lib.

Image

I think the improvements come from the improved backbones.

Additionally, I think I was able to isolate the issue: it only occurs with ViT and not with ResNets.

However, removing the --no-pool flag (as in the TL lib scripts) is necessary to run ResNet-based training.
The problem didn't appear inside TL lib because I had changed the architecture.
At this point, I do not think any explicit pull request is necessary, except for a short note on this in the README.

Regards,
Ambuj Choudha

@rangwani-harsh
Contributor

Hi @Ambuj-Choudha, thanks for sharing the results. They look good and are working as expected.

I took a look at PR #14 as well. I agree that until we figure out how to run the ViT code with tllib, we have to give users a choice of versions.

Thanks
Harsh
