MSC Checkpointing Changes #789
Overall, this looks good to me. I had a few stylistic comments and only one important one: unifying the filesystem abstractions used in the code base.
Fix merge conflicts
Signed-off-by: Chris Hawes <[email protected]>
/blossom-ci |
* Working changes to be cleaned up.
* Rename msc_config.yaml.
* Fixed PyTorch test issue by removing MSC cache.
* Updated project dependencies.
* Find MSC config using an absolute path.
* Re-added CUDA test parameter.
* Add test to read from a public S3 bucket using MSC.
* Revert save_checkpoint_freq value.
* Remove temporary printing.
* Remove unnecessary dependency.
* Switched to a consistent mechanism for detecting MSC URIs.
* Moved fsspec.filesystem logic into filesystem.py.
* Use a cache for non-file protocols when reading non-Modulus models.
* Moved code that generates the checkpoint directory.
* Added get_checkpoint_dir import.
* Address review feedback.
* Changes from code review.
* Addressed file test issue from review.
* Fix to file existence check.
* Fix merge conflicts due to project name change.
* Updated CHANGELOG.
* Added Multi-Storage Client to allow checkpointing to/from object storage. Signed-off-by: Chris Hawes <[email protected]>
* Addressed issues identified by pre-commit.
* Update filesystem.py
* Update __init__.py
* Update Dockerfile

---------

Signed-off-by: Chris Hawes <[email protected]>
Co-authored-by: Nicholas Geneva <[email protected]>
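The commit list mentions switching to a consistent mechanism for detecting MSC URIs and using a cache for non-file protocols. Below is a minimal sketch of what such checks could look like. It assumes MSC URIs use the `msc://` scheme and that local files are given as plain POSIX paths or `file://` URIs. `is_msc_uri` and `needs_local_cache` are hypothetical names for illustration, not the functions added in this PR.

```python
from urllib.parse import urlparse


def is_msc_uri(path: str) -> bool:
    """Hypothetical helper: True if the path uses the msc:// scheme.

    Assumption: MSC URIs look like "msc://<profile>/<key>".
    urlparse lowercases the scheme, so "MSC://..." also matches.
    """
    return urlparse(path).scheme == "msc"


def needs_local_cache(path: str) -> bool:
    """Hypothetical helper: True for non-file protocols (s3://, msc://, ...).

    Assumption: plain POSIX paths (empty scheme) and file:// URIs are local
    and can be read directly; anything else is fetched through a cache.
    Note: Windows drive paths such as "C:\\x" would parse with scheme "c"
    and are not handled by this sketch.
    """
    return urlparse(path).scheme not in ("", "file")


if __name__ == "__main__":
    for p in ["msc://bucket/ckpt.mdlus", "s3://bucket/model.pt", "/tmp/model.pt"]:
        print(p, is_msc_uri(p), needs_local_cache(p))
```

Parsing the scheme once with `urllib.parse.urlparse`, rather than matching string prefixes in several places, keeps the detection rule in one spot, which is the kind of consistency the commit describes.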
Modulus Pull Request
Description
Checklist
Dependencies