Issue with parallelization in sea ice notebook? #186

Open
mnlevy1981 opened this issue Feb 7, 2025 · 0 comments
Labels
bug Something isn't working cice

Comments

@mnlevy1981
Collaborator

Describe the bug
@dabail10 has been reporting occasional `RuntimeError: NetCDF: Not a valid ID` errors when running a notebook in parallel. I found a two-year-old discussion from Australia, but the fix there was to run with a single thread per worker, and we already set `threads_per_worker=1` when creating the `LocalCluster`. The client object confirms this, reporting 16 processes and 16 threads:

<Client: 'tcp://127.0.0.1:39305' processes=16 threads=16, memory=120.00 GiB>
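For anyone who wants to check the same thing on their own cluster, here is a minimal sketch of a one-thread-per-worker setup and a per-worker check. It is not the CUPiD setup code: `make_cluster` and `one_thread_per_worker` are hypothetical helper names, the worker count of 16 is simply copied from the client summary above, and `dask.distributed` is assumed to be installed.

```python
def make_cluster(n_workers=16):
    """Start a LocalCluster with one thread per worker process.

    Hypothetical helper; requires dask.distributed, imported lazily so the
    check below can be used without it.
    """
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(n_workers=n_workers, threads_per_worker=1, processes=True)
    return Client(cluster)


def one_thread_per_worker(nthreads_by_worker):
    """Return True if every worker in a {address: thread_count} mapping has one thread."""
    return bool(nthreads_by_worker) and all(
        n == 1 for n in nthreads_by_worker.values()
    )


if __name__ == "__main__":
    client = make_cluster()
    try:
        print(client)
        # Client.nthreads() maps each worker address to its thread count.
        print(one_thread_per_worker(client.nthreads()))
    finally:
        client.close()
        client.cluster.close()
```

The summary line only shows totals, so checking each worker individually rules out an uneven split such as one worker with two threads and another with none.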

To Reproduce
On casper or derecho, run `cupid-diagnostics --ice` on a compute node with multiple cores. The `Hemis_seaice_visual_compare_contour.ipynb` notebook fails intermittently.
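Because the failure is intermittent, it may help to rerun the command until it fails. The sketch below is a rough reproduction loop, not part of CUPiD: `run_until_failure` is a hypothetical helper, and it assumes that a failed notebook shows up either as a nonzero exit code or as the error text in the command's output, which I have not verified for `cupid-diagnostics`.

```python
import subprocess
import sys

ERROR_TEXT = "NetCDF: Not a valid ID"


def run_until_failure(cmd, max_attempts=10):
    """Run cmd repeatedly and return (attempt, output) for the first failing run.

    A run counts as failing if it exits nonzero or its output contains
    ERROR_TEXT (both are assumptions about how the failure surfaces).
    Returns None if all attempts succeed.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr
        if result.returncode != 0 or ERROR_TEXT in output:
            return attempt, output
    return None


if __name__ == "__main__":
    failure = run_until_failure(["cupid-diagnostics", "--ice"])
    if failure is None:
        print("No failure observed")
    else:
        attempt, output = failure
        print(f"Failed on attempt {attempt}")
        print(output)
        sys.exit(1)
```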

Expected behavior
The notebook should run to completion in parallel without raising this error.

Additional context
There's additional discussion at pydata/xarray#7079 (I found that issue because it is mentioned in the Australian discussion), but every report there seems to limit the problem to `threads_per_worker>1`, so it may not be related to Dave's problem.

Running `cupid-diagnostics --ice --serial` avoids the issue, but it runs more slowly because nothing is parallelized.

@TeaganKing TeaganKing added bug Something isn't working cice labels Feb 11, 2025