MSC Checkpointing Changes #789
Merged
NickGeneva merged 47 commits into NVIDIA:main from chris-hawes:chawes/initial-msc-checkpointing on Apr 2, 2025
Commits
c2e5db8 Initial commit (chris-hawes)
61742fc Working changes to be cleaned up. (chris-hawes)
9d74552 Rename msc_config.yaml (chris-hawes)
12c8c88 Fixed pytorch test issue by removing MSC Cache (chris-hawes)
4e90e42 Clean up (chris-hawes)
08b1557 Clean up (chris-hawes)
546285c Clean up (chris-hawes)
cbbdf11 Merge branch 'main' into chawes/initial-msc-checkpointing (chris-hawes)
5226264 Updated project dependencies (chris-hawes)
6be3b18 Find MSC config using absolute path. (chris-hawes)
d681bec Re-added cuda test parameter. (chris-hawes)
ef07910 Moved MSC Config file (chris-hawes)
35c75f5 Rename MSC config file (chris-hawes)
1c68712 Add test to read from public S3 bucket using MSC. (chris-hawes)
3906bb8 Added MSC comment (chris-hawes)
8b1f17f Clean up (chris-hawes)
a9f0f29 Revert save_checkpoint_freq value. (chris-hawes)
82b1885 Remove temporary printing (chris-hawes)
1a5dd4f Remove unnecessary dependency (chris-hawes)
0f70aac Switched to use consistent mechanism for detecting msc URIs (chris-hawes)
e70fc58 Changes from code review. (chris-hawes)
68f88b3 Fix missing variable. (chris-hawes)
3624a88 Fix missing variable. (chris-hawes)
9b48137 Moved fsspec.filesystem logic into filesystem.py (chris-hawes)
ba294c9 Change to cache for non-file protocols when reading non-modulus models. (chris-hawes)
e0c0881 Moved code to generate checkpoint directory.directory (chris-hawes)
afedf58 Added get_checkpoint_dir import (chris-hawes)
8a57a9f Address review feedback. (chris-hawes)
d238968 Changes from code review. (chris-hawes)
4643d83 Add comment per code review.:w (chris-hawes)
38b73d7 Addressed file test issue from review. (chris-hawes)
b0d1db8 Fix to file existence check. (chris-hawes)
77025fb Merge branch 'main' into chawes/initial-msc-checkpointing (chris-hawes)
450ea6e Fix merge conflicts due to project name change. (chris-hawes)
19b6f78 Merge branch 'main' into chawes/initial-msc-checkpointing (chris-hawes)
f61124c Merge branch 'main' into chawes/initial-msc-checkpointing (chris-hawes)
e4a9a90 Updated CHANGELOG. (chris-hawes)
48b76be Added Multi-Storage Client to allow checkpointing to/from Object Storage (chris-hawes)
089b6f6 Addressed issues identified by pre-commit. (chris-hawes)
9f3f59b Merge branch 'main' into chawes/initial-msc-checkpointing (chris-hawes)
d36cc64 Merge branch 'main' into chawes/initial-msc-checkpointing (NickGeneva)
e746142 Update filesystem.py (NickGeneva)
13f7fb0 Update __init__.py (NickGeneva)
782cb46 Update Dockerfile (NickGeneva)
c3eed5e Merge branch 'main' into chawes/initial-msc-checkpointing (NickGeneva)
8c74f19 Update Dockerfile (NickGeneva)
82e503b Merge branch 'main' into chawes/initial-msc-checkpointing (NickGeneva)
@@ -0,0 +1,30 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is an example MSC configuration file for testing checkpoint logic.
profiles:
  checkpoint-test:
    storage_provider:
      type: s3
      options:
        region_name: us-east-1
        base_path: checkpoint-test-bucket
    credentials_provider:
      type: S3Credentials
      options:
        access_key: "access-key-id"
        secret_key: "secret-access-key"
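The commit list mentions a "consistent mechanism for detecting msc URIs". As a rough illustration of how a profile such as `checkpoint-test` might be addressed, the sketch below splits an `msc://<profile>/<path>` URL into its profile name and object path using only the standard library. The helper names `is_msc_url` and `split_msc_url`, and the example checkpoint path, are hypothetical and are not taken from this PR; the URL layout is an assumption about how MSC profiles are referenced.

```python
from urllib.parse import urlparse

# Assumed URL scheme for Multi-Storage Client paths.
MSC_SCHEME = "msc"


def is_msc_url(path: str) -> bool:
    """Return True if the path uses the msc:// scheme (hypothetical helper)."""
    return urlparse(path).scheme == MSC_SCHEME


def split_msc_url(url: str) -> tuple[str, str]:
    """Split 'msc://<profile>/<path>' into (profile, path).

    Hypothetical helper: the profile name is assumed to sit in the
    network-location part of the URL.
    """
    parsed = urlparse(url)
    if parsed.scheme != MSC_SCHEME or not parsed.netloc:
        raise ValueError(f"not an msc:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Illustrative path; "run-01/model.pt" is made up for this example.
profile, key = split_msc_url("msc://checkpoint-test/run-01/model.pt")
print(profile, key)
```

Under this assumption, a local path such as `./checkpoints/model.pt` has no scheme and would fall through to ordinary filesystem handling.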
@@ -0,0 +1,31 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is an example MSC configuration file for accessing the CMIP6 archive on AWS:
# https://registry.opendata.aws/cmip6/
profiles:
  cmip6-pds:
    storage_provider:
      type: s3
      options:
        region_name: us-west-2
        base_path: cmip6-pds
        signature_version: UNSIGNED
cache:
  location: /tmp/.cache
  size_mb: 5000
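Unlike the checkpoint test profile, this public-read profile carries no `credentials_provider` and sets `signature_version: UNSIGNED`. The sketch below shows one way a caller might tell the two kinds of profile apart once the YAML has been loaded into a dictionary. The dictionary is written out by hand to mirror the file above, and `is_anonymous_profile` is a hypothetical helper, not part of MSC or of this PR.

```python
# Hand-written mirror of the YAML above; a real loader would parse the file.
config = {
    "profiles": {
        "cmip6-pds": {
            "storage_provider": {
                "type": "s3",
                "options": {
                    "region_name": "us-west-2",
                    "base_path": "cmip6-pds",
                    "signature_version": "UNSIGNED",
                },
            },
        },
    },
    "cache": {"location": "/tmp/.cache", "size_mb": 5000},
}


def is_anonymous_profile(profile: dict) -> bool:
    """Return True if the profile has no credentials and requests unsigned access.

    Hypothetical helper for illustration only.
    """
    options = profile.get("storage_provider", {}).get("options", {})
    return (
        "credentials_provider" not in profile
        and options.get("signature_version") == "UNSIGNED"
    )


print(is_anonymous_profile(config["profiles"]["cmip6-pds"]))
```

A profile that declares a `credentials_provider`, like the checkpoint test profile, would not be treated as anonymous by this check.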