
@littlebullGit
Contributor

@littlebullGit littlebullGit commented Dec 28, 2025

What does this PR do?

The DistributedSamplerWrapper now forwards set_epoch() calls to the original sampler if it supports the method.

Problem

When a custom sampler is wrapped by DistributedSamplerWrapper, calling set_epoch() on the wrapper does not propagate to the original sampler. This breaks samplers that rely on set_epoch() for epoch-dependent behavior like shuffling.

Solution

Override set_epoch() in DistributedSamplerWrapper to:

  1. Call super().set_epoch(epoch) to set the epoch on the wrapper itself
  2. Forward the call to the original sampler if it has a callable set_epoch method

This fix is generic and works for any sampler subclass that implements set_epoch(), not just specific implementations. It uses duck typing to check for the method's existence and callability.
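The two steps above can be sketched in plain Python. This is a minimal sketch, not the merged code. So that it runs without PyTorch or Lightning, `_StandInDistributedSampler` is a hypothetical stand-in for `torch.utils.data.DistributedSampler`, which stores the epoch in `self.epoch`. The wrapper also keeps the original sampler in a hypothetical `_original_sampler` attribute; the real DistributedSamplerWrapper may store it differently.

```python
class _StandInDistributedSampler:
    """Hypothetical stand-in for torch.utils.data.DistributedSampler."""

    def __init__(self) -> None:
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch


class DistributedSamplerWrapperSketch(_StandInDistributedSampler):
    """Hypothetical wrapper that forwards set_epoch() to the wrapped sampler."""

    def __init__(self, sampler) -> None:
        super().__init__()
        self._original_sampler = sampler  # hypothetical attribute name

    def set_epoch(self, epoch: int) -> None:
        # Step 1: set the epoch on the wrapper itself.
        super().set_epoch(epoch)
        # Step 2 (duck typing): forward only if the wrapped sampler
        # exposes a callable set_epoch; otherwise do nothing.
        set_epoch = getattr(self._original_sampler, "set_epoch", None)
        if callable(set_epoch):
            set_epoch(epoch)


class EpochAwareSampler:
    """Example custom sampler whose order depends on the epoch."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __iter__(self):
        # Rotate the indices by the epoch so each epoch yields a different order.
        shift = self.epoch % self.n
        return iter(list(range(shift, self.n)) + list(range(shift)))

    def __len__(self) -> int:
        return self.n


inner = EpochAwareSampler(4)
wrapper = DistributedSamplerWrapperSketch(inner)
wrapper.set_epoch(3)
print(wrapper.epoch, inner.epoch, list(inner))  # 3 3 [3, 0, 1, 2]
```

Using `getattr(..., None)` followed by `callable()` covers both the missing-method case and the non-callable-attribute case with a single check. Without the forwarding step, `inner.epoch` would stay at 0 and the custom sampler would repeat the same order every epoch.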

Changes

  • src/lightning/fabric/utilities/distributed.py: Added set_epoch() override to DistributedSamplerWrapper
  • src/lightning/fabric/CHANGELOG.md: Added changelog entry
  • tests/tests_fabric/utilities/test_distributed.py: Added test_distributed_sampler_wrapper_set_epoch with 100% branch coverage:
    • Case 1: Sampler WITH set_epoch method - verifies forwarding works
    • Case 2: Sampler WITHOUT set_epoch method - verifies graceful handling
    • Case 3: Sampler with non-callable set_epoch attribute - verifies it's skipped
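The three cases above can be sketched as plain `assert`-style checks. This is a sketch under stated assumptions. It redeclares a minimal, hypothetical `WrapperSketch` that mirrors the forwarding logic, so it runs on its own without Lightning or PyTorch. The actual `test_distributed_sampler_wrapper_set_epoch` exercises the real DistributedSamplerWrapper and may be structured differently.

```python
from types import SimpleNamespace
from unittest.mock import Mock


class WrapperSketch:
    """Hypothetical minimal wrapper that mirrors the set_epoch forwarding."""

    def __init__(self, sampler) -> None:
        self.sampler = sampler
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch  # plays the role of super().set_epoch(epoch)
        fn = getattr(self.sampler, "set_epoch", None)
        if callable(fn):
            fn(epoch)


def test_case_1_sampler_with_set_epoch():
    sampler = Mock()  # Mock auto-creates a callable set_epoch attribute
    wrapper = WrapperSketch(sampler)
    wrapper.set_epoch(5)
    assert wrapper.epoch == 5
    sampler.set_epoch.assert_called_once_with(5)


def test_case_2_sampler_without_set_epoch():
    sampler = object()  # no set_epoch attribute at all
    wrapper = WrapperSketch(sampler)
    wrapper.set_epoch(7)  # must not raise AttributeError
    assert wrapper.epoch == 7


def test_case_3_non_callable_set_epoch():
    sampler = SimpleNamespace(set_epoch=42)  # attribute exists but is not callable
    wrapper = WrapperSketch(sampler)
    wrapper.set_epoch(2)  # forwarding is skipped instead of calling an int
    assert wrapper.epoch == 2
    assert sampler.set_epoch == 42


for case in (
    test_case_1_sampler_with_set_epoch,
    test_case_2_sampler_without_set_epoch,
    test_case_3_non_callable_set_epoch,
):
    case()
print("all three cases pass")
```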

Fixes #21454

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label Dec 28, 2025
@codecov

codecov bot commented Dec 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87%. Comparing base (aa0ee0d) to head (b27b18d).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #21456   +/-   ##
=======================================
  Coverage      87%      87%           
=======================================
  Files         270      270           
  Lines       24061    24067    +6     
=======================================
+ Hits        20857    20863    +6     
  Misses       3204     3204           

@littlebullGit littlebullGit force-pushed the fix/21454-distributed-sampler-set-epoch branch from 8be4812 to f308cef Compare December 28, 2025 20:45
@deependujha
Collaborator

Please update changelog too. :)

…mplerWrapper

The DistributedSamplerWrapper now forwards set_epoch() calls to the
underlying sampler if it supports the method. This fix is generic and
works for any sampler subclass that implements set_epoch(), not just
specific implementations.

This is important for samplers that use the epoch for shuffling or
other epoch-dependent behavior in distributed training.

Fixes Lightning-AI#21454
@littlebullGit littlebullGit force-pushed the fix/21454-distributed-sampler-set-epoch branch from f308cef to 7f28441 Compare December 28, 2025 20:51
@littlebullGit
Contributor Author

> Please update changelog too. :)

The changelog is the main source of merge conflicts, so I was hesitant to update it until someone had reviewed the PR. I'm not sure what the best way to handle that is. Updated.

@deependujha
Collaborator

Yeah, but without it, we sometimes merge the PR without updates in changelog.

@littlebullGit
Contributor Author

> Yeah, but without it, we sometimes merge the PR without updates in changelog.

Maybe we should put the changelog entry in the PR description, and whoever merges the PR could copy it into the changelog? Just a thought.

@deependujha
Collaborator

deependujha commented Dec 28, 2025

There are multiple maintainers working on this and other Lightning AI OSS projects; I doubt anyone would even remember, lol.

Collaborator

@bhimrazy bhimrazy left a comment


Overall looks good.

Contributor

@tchaton tchaton left a comment


LGTM !

@deependujha deependujha merged commit 79a39c0 into Lightning-AI:master Jan 14, 2026
124 checks passed

Labels

fabric lightning.fabric.Fabric


Development

Successfully merging this pull request may close these issues.

DistributedSamplerWrapper does not pass on the .set_epoch call to the underlying sampler

4 participants