fix: AdvancedProfiler: ValueError: Attempting to stop recording a #21451
Fix AdvancedProfiler ValueError when stopping non-started actions
Closes #9136
Summary
The AdvancedProfiler raised a ValueError when asked to stop profiling an action that was never started. This typically happened when several Trainer instances shared one profiler (for example, during a grid search) and only one of them ran the test phase. The fix changes the stop() method to handle this case gracefully: it logs a debug message and returns early instead of raising an exception.
Changes Made
src/lightning/pytorch/profilers/advanced.py
- stop(): logs a debug message and returns early when asked to stop an action that was never started, instead of raising ValueError.
tests/tests_pytorch/profilers/test_profiler.py
- test_advanced_profiler_multiple_trainers_test_only_one: reproduces the exact bug scenario from issue #9136 (AdvancedProfiler: ValueError: Attempting to stop recording an action (run_test_evaluation) which was never started.), with multiple trainers sharing a profiler where only one runs test.
- test_advanced_profiler_reused_trainer_test: tests reusing a trainer for multiple test calls with profiling.
- test_advanced_profiler_stop_nonexistent_action_no_error: verifies that stopping a non-existent action does not raise and that the profiler remains functional afterwards.
Testing
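The behavior the new tests check can be illustrated with a small standalone sketch. SketchAdvancedProfiler is a hypothetical, simplified class, not the Lightning AdvancedProfiler source: it assumes the profiler keeps one cProfile.Profile per action name, and the wording of the debug message is an assumption. It shows the changed control flow: a stop() with no matching start() is logged at debug level and ignored, and the profiler keeps working afterwards.

```python
import cProfile
import logging

log = logging.getLogger(__name__)


class SketchAdvancedProfiler:
    """Hypothetical, simplified stand-in for AdvancedProfiler (not Lightning's code)."""

    def __init__(self) -> None:
        # One cProfile.Profile per action name, created lazily on first start().
        self.profiled_actions: dict = {}

    def start(self, action_name: str) -> None:
        if action_name not in self.profiled_actions:
            self.profiled_actions[action_name] = cProfile.Profile()
        self.profiled_actions[action_name].enable()

    def stop(self, action_name: str) -> None:
        pr = self.profiled_actions.get(action_name)
        if pr is None:
            # Old behavior: raise ValueError(...). Sketched new behavior:
            # record the mismatch at debug level and return early.
            log.debug(
                "Attempting to stop recording an action (%s) which was never started.",
                action_name,
            )
            return
        pr.disable()


profiler = SketchAdvancedProfiler()

# Normal paired use: start, do some work, stop.
profiler.start("run_test_evaluation")
sum(range(1000))
profiler.stop("run_test_evaluation")

# A stop() with no matching start() no longer raises.
profiler.stop("never_started_action")

# The profiler is still usable afterwards.
profiler.start("run_test_evaluation")
profiler.stop("run_test_evaluation")
print(sorted(profiler.profiled_actions))  # ['run_test_evaluation']
```

Returning early instead of raising keeps a shared profiler usable across Trainers; the trade-off is that a genuinely unmatched stop() is only visible when debug logging is enabled.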
The fix can be verified by:
Running the new test cases:
Running the full profiler test suite:
Checklist
📚 Documentation preview 📚: https://pytorch-lightning--21451.org.readthedocs.build/en/21451/