RTC adjustments. Bug fix & Alex Soare optimization #2499
base: main
Conversation
Thanks @helper2424 for opening this PR! I have a few comments which will hopefully clarify my article's guidance.
The main points are:
- I think you meant `variance_clipping_factor` to be σ_d from my article, judging by its default value. I think the name needs revising. See my inline comments.
- `max_guidance_weight` should probably default to `num_steps`.
- You don't need any `use_soare_optimization` guards. σ_d = 1.0 covers the default RTC implementation; σ_d < 1.0 (and a good value might be 0.2) covers my article.
I'm also available offline to discuss :)
```python
time,
original_denoise_step_partial,
execution_horizon=None,
num_flow_matching_steps=None,
```
I would suggest not passing this here, as the total number of denoising steps shouldn't be the concern of a single denoising step. My article's guidance is to set max_guidance_weight = num_steps. I'm fairly convinced that is the correct thing to do, enough so that I would recommend forcing it to default to this unless the user explicitly provides max_guidance_weight. I also note that you have already defaulted it to 10, which is not the value of 5 that the original RTC paper suggests (which is fine IMO, but it shows there are already deviations from the original specification, so we might as well ground it in my article's guidance).
Got it. I used 10 because by default `num_inference_steps: int = 10`, so 5 won't work well for LeRobot policies with the default config. Probably PI used 5 when testing RTC.
I made the change, then reverted the logic: pi0.x passes num steps as a parameter to `predict_action_chunk`.
```python
use_soare_optimization: bool = True
variance_clipping_factor: float = 0.2
```
I think you meant `sigma_d: float = 1.0` instead of `variance_clipping_factor` (or, if you prefer a more descriptive parameter, `prior_variance`, but be careful because "variance" is `sigma_d ** 2`). That parameter is used in all cases: when it is 1.0 you are not using the improvement suggested in my article, otherwise you are. You can therefore drop `use_soare_optimization` altogether, and don't need to guard any code with `if use_soare_optimization`.
Note, as per my article, it's `max_guidance_weight` that you would set equal to `num_flow_matching_steps`. The RTC paper doesn't give guidance for that, and just suggests setting it to 5.0.
```python
tau_tensor = torch.as_tensor(tau)
squared_one_minus_tau = (1 - tau_tensor) ** 2
inv_r2 = (squared_one_minus_tau + tau_tensor**2) / (squared_one_minus_tau)
if self.config.use_soare_optimization:
```
Based on my comments above, this whole block won't need the if/else guard for `use_soare_optimization`. You just need:

```python
inv_r2 = (squared_one_minus_tau + tau_tensor ** 2 * sigma_d ** 2) / (squared_one_minus_tau * sigma_d ** 2)
```

or, if you are going to call it `prior_variance` instead, since that's already σ², it would be:

```python
inv_r2 = (squared_one_minus_tau + tau_tensor ** 2 * prior_variance) / (squared_one_minus_tau * prior_variance)
```
Then setting sigma_d = 1.0 reverts to the original RTC implementation.
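A quick numerical check of this claim (a torch-free sketch using plain floats): the single expression with `sigma_d` factors reduces exactly to the original `inv_r2` when `sigma_d = 1.0`.

```python
def inv_r2_unified(tau: float, sigma_d: float = 1.0) -> float:
    # Single expression covering both cases.
    one_minus_tau_sq = (1 - tau) ** 2
    return (one_minus_tau_sq + tau**2 * sigma_d**2) / (one_minus_tau_sq * sigma_d**2)

def inv_r2_original(tau: float) -> float:
    # The original RTC formula (no sigma_d).
    one_minus_tau_sq = (1 - tau) ** 2
    return (one_minus_tau_sq + tau**2) / one_minus_tau_sq

# With sigma_d = 1.0 the unified formula matches the original exactly.
for tau in (0.1, 0.5, 0.9):
    assert abs(inv_r2_unified(tau, 1.0) - inv_r2_original(tau)) < 1e-12
```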
Btw this is just eqn 8 in my article
Fixed
**`inference_delay`**: How many timesteps of inference latency your system has. This is passed to `predict_action_chunk()` rather than the config, since it may vary at runtime.
**`sigma_d`**: The variance of the prior distribution. This is a hyperparameter that can be tuned to balance the smoothness of the transitions and the reactivity of the policy.
nit: sigma is not "variance" but rather "standard deviation".
**`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. This is a hyperparameter that can be tuned to balance the smoothness of the transitions and the reactivity of the policy. For 10-step flow matching (SmolVLA, Pi0, Pi0.5), 10.0 is an optimal value.
**`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. This is a hyperparameter that can be tuned to balance the smoothness of the transitions and the reactivity of the policy.
This is a clipping parameter, not the actual guidance weight. You might modify the sentence to include something like: "a clipping parameter on the computed guidance weight; ensures stability."
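The distinction can be sketched as follows (illustrative names only; the actual guidance-weight formula lives in the RTC implementation and is not shown here):

```python
def clip_guidance_weight(raw_weight: float, max_guidance_weight: float) -> float:
    # max_guidance_weight clips the computed guidance weight; it is not
    # the weight itself. Clipping keeps the update stable when the raw
    # weight spikes (e.g. as the denoising time approaches 1).
    return min(raw_weight, max_guidance_weight)

assert clip_guidance_weight(25.0, 10.0) == 10.0  # large raw weight gets clipped
assert clip_guidance_weight(3.0, 10.0) == 3.0    # small raw weight passes through
```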
examples/rtc/eval_dataset.py (outdated)
```python
    },
)

sample_correlation_shift: int | None = field(
```
Note that a good value here is something less than the chunk size. For example, you might want to simulate a chunk size of 50 where you begin inference for the next chunk at the 25th step.
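The overlap being described can be sketched numerically (hypothetical numbers from the comment: chunk size 50, next-chunk inference started at step 25):

```python
chunk_size = 50
sample_correlation_shift = 25  # a good value is less than the chunk size

# Steps of the previous chunk still executing while the next chunk is inferred.
overlap = chunk_size - sample_correlation_shift

assert 0 < sample_correlation_shift < chunk_size
assert overlap == 25
```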
What this does
Implements the suggestion from https://alexander-soare.github.io/robotics/2025/08/05/smooth-as-butter-robot-policies.html.
Two important changes:
- `max_guidance` param: use the number of flow matching steps as the basic clipping parameter.
- `sigma_d`: the default value is 1.0, so the default behavior is equal to the original paper, but library users can adjust this value.

How it was tested
- `pytest tests/policies/pi0_pi05`
- `pytest tests/policies/smolvla`
- `pytest tests/policies/rtc`

Some reports after test script runs
Check - https://huggingface.co/spaces/helper2424/rtc_tests
SmolVLA; n_step=2; sigma_d=0.1


SmolVLA; n_step=5; sigma_d=1.0


SmolVLA; n_step=50; sigma_d=0.2


Pi0.5; n_steps=5, sigma=0.8


Pi0.5; n_steps=10, sigma=0.2

