-
Notifications
You must be signed in to change notification settings - Fork 3.6k
fix(fabric): forward set_epoch to underlying sampler in DistributedSamplerWrapper #21456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(fabric): forward set_epoch to underlying sampler in DistributedSamplerWrapper #21456
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #21456 +/- ##
=======================================
Coverage 87% 87%
=======================================
Files 270 270
Lines 24061 24067 +6
=======================================
+ Hits 20857 20863 +6
Misses 3204 3204 |
8be4812 to
f308cef
Compare
|
Please update changelog too. :) |
…mplerWrapper The DistributedSamplerWrapper now forwards set_epoch() calls to the underlying sampler if it supports the method. This fix is generic and works for any sampler subclass that implements set_epoch(), not just specific implementations. This is important for samplers that use the epoch for shuffling or other epoch-dependent behavior in distributed training. Fixes Lightning-AI#21454
f308cef to
7f28441
Compare
changelog is the main source of the merge conflict. So I am kind of hesitate to update it until someone reviewed. Not sure what is the best way to handle it. Updated. |
|
Yeah, but without it, we sometimes merge the PR without updates in changelog. |
|
|
there're multiple maintainers maintaining this and other Lightning AI's OSS projects, I doubt if one will even remember, lol. |
bhimrazy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over all looks good.
tchaton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
What does this PR do?
The
DistributedSamplerWrappernow forwardsset_epoch()calls to the original sampler if it supports the method.Problem
When a custom sampler is wrapped by
DistributedSamplerWrapper, callingset_epoch()on the wrapper does not propagate to the original sampler. This breaks samplers that rely onset_epoch()for epoch-dependent behavior like shuffling.Solution
Override
set_epoch()inDistributedSamplerWrapperto:super().set_epoch(epoch)to set the epoch on the wrapper itselfset_epochmethodThis fix is generic and works for any sampler subclass that implements
set_epoch(), not just specific implementations. It uses duck typing to check for the method's existence and callability.Changes
src/lightning/fabric/utilities/distributed.py: Addedset_epoch()override toDistributedSamplerWrappersrc/lightning/fabric/CHANGELOG.md: Added changelog entrytests/tests_fabric/utilities/test_distributed.py: Addedtest_distributed_sampler_wrapper_set_epochwith 100% branch coverage:set_epochmethod - verifies forwarding worksset_epochmethod - verifies graceful handlingset_epochattribute - verifies it's skippedFixes #21454