
Conversation

Contributor

@helper2424 helper2424 commented Nov 21, 2025

What this does

Implements the suggestions from https://alexander-soare.github.io/robotics/2025/08/05/smooth-as-butter-robot-policies.html.

Two important changes:

  • Added logic that allows skipping the max_guidance_weight param and using the number of flow matching steps as the default clipping value.
  • Extended the guidance calculation with sigma_d. The default value is 1.0, so the default behavior matches the original paper, but library users can adjust this value.
  • Fixed a bug in the in-painting algorithm.

How it was tested

  • Run pytest tests/policies/pi0_pi05
  • Run pytest tests/policies/smolvla
  • Run pytest tests/policies/rtc
  • Also ran the following script to check different params for RTC:
#!/bin/bash

# Script to run RTC evaluation experiments with different parameters
# This script tests various combinations of:
# - num_inference_steps (flow matching steps)
# - sigma_d (variance clipping parameter)
# - Different policies (SmolVLA and PI0.5)

set -e  # Exit on error

# Color codes for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Configuration arrays
NUM_STEPS=(2 5 10 20 50 100)
SIGMA_D_VALUES=(0.1 0.2 0.5 0.8 1.0 1.2 1.5)

# Model configurations
SMOLVLA_POLICY="helper2424/smolvla_check_rtc_last3"
SMOLVLA_DATASET="helper2424/check_rtc"
PI05_POLICY="lerobot/pi05_libero_finetuned"
PI05_DATASET="HuggingFaceVLA/libero"

# Common parameters
DEVICE="mps"
SEED=10
EXECUTION_HORIZON=8
ATTENTION_SCHEDULE="EXP"
INFERENCE_DELAY=4

# Create results directory
RESULTS_DIR="rtc_experiments_results"
mkdir -p "$RESULTS_DIR"

# Log file
LOG_FILE="$RESULTS_DIR/experiment_log.txt"
echo "RTC Evaluation Experiments - $(date)" > "$LOG_FILE"
echo "======================================" >> "$LOG_FILE"

# Function to run a single experiment
run_experiment() {
    local model_name=$1
    local policy_path=$2
    local dataset_repo=$3
    local num_steps=$4
    local sigma_d=$5

    local output_dir="${RESULTS_DIR}/${model_name}_steps_${num_steps}_sigma_${sigma_d}"

    echo -e "${BLUE}Running: ${model_name} | steps=${num_steps} | sigma_d=${sigma_d}${NC}"
    echo "$(date): Starting ${model_name} steps=${num_steps} sigma_d=${sigma_d}" >> "$LOG_FILE"

    # Run the evaluation
    uv run python examples/rtc/eval_dataset.py \
        --policy.path="$policy_path" \
        --dataset.repo_id="$dataset_repo" \
        --rtc.execution_horizon="$EXECUTION_HORIZON" \
        --rtc.sigma_d="$sigma_d" \
        --device="$DEVICE" \
        --rtc.prefix_attention_schedule="$ATTENTION_SCHEDULE" \
        --seed="$SEED" \
        --num_inference_steps="$num_steps" \
        --inference_delay="$INFERENCE_DELAY" \
        --output_dir="$output_dir" 2>&1 | tee -a "$LOG_FILE"

    # $? after a pipeline reflects tee's exit status; check the python command's instead
    if [ "${PIPESTATUS[0]}" -eq 0 ]; then
        echo -e "${GREEN}✓ Completed: ${model_name} | steps=${num_steps} | sigma_d=${sigma_d}${NC}"
        echo "$(date): SUCCESS ${model_name} steps=${num_steps} sigma_d=${sigma_d}" >> "$LOG_FILE"
    else
        echo "ERROR: Failed ${model_name} steps=${num_steps} sigma_d=${sigma_d}" >> "$LOG_FILE"
        echo "Continuing with next experiment..."
    fi

    echo "" >> "$LOG_FILE"
}

# Main execution loop
echo "Starting RTC evaluation experiments..."
echo "Results will be saved to: $RESULTS_DIR"
echo ""

# # Run experiments for SmolVLA
# echo "=========================================="
# echo "Running SmolVLA experiments"
# echo "=========================================="
# for num_steps in "${NUM_STEPS[@]}"; do
#     for sigma_d in "${SIGMA_D_VALUES[@]}"; do
#         run_experiment \
#             "smolvla" \
#             "$SMOLVLA_POLICY" \
#             "$SMOLVLA_DATASET" \
#             "$num_steps" \
#             "$sigma_d"
#     done
# done

# Run experiments for PI0.5
echo "=========================================="
echo "Running PI0.5 experiments"
echo "=========================================="
for num_steps in "${NUM_STEPS[@]}"; do
    for sigma_d in "${SIGMA_D_VALUES[@]}"; do
        run_experiment \
            "pi05" \
            "$PI05_POLICY" \
            "$PI05_DATASET" \
            "$num_steps" \
            "$sigma_d"
    done
done

echo ""
echo "=========================================="
echo "All experiments completed!"
echo "Results saved to: $RESULTS_DIR"
echo "Log file: $LOG_FILE"
echo "=========================================="

# Generate summary
SUMMARY_FILE="$RESULTS_DIR/summary.txt"
echo "Experiment Summary - $(date)" > "$SUMMARY_FILE"
echo "======================================" >> "$SUMMARY_FILE"
echo "" >> "$SUMMARY_FILE"
echo "Total experiments: $(( ${#NUM_STEPS[@]} * ${#SIGMA_D_VALUES[@]} * 2 ))" >> "$SUMMARY_FILE"
echo "Models tested: SmolVLA, PI0.5" >> "$SUMMARY_FILE"
echo "Num steps tested: ${NUM_STEPS[*]}" >> "$SUMMARY_FILE"
echo "Sigma_d values tested: ${SIGMA_D_VALUES[*]}" >> "$SUMMARY_FILE"
echo "" >> "$SUMMARY_FILE"
echo "Results directory structure:" >> "$SUMMARY_FILE"
find "$RESULTS_DIR" -type d -name "*_steps_*" | sort >> "$SUMMARY_FILE"

echo ""
echo "Summary saved to: $SUMMARY_FILE"

Some reports from running the test script

Check - https://huggingface.co/spaces/helper2424/rtc_tests

SmolVLA; n_steps=2; sigma_d=0.1
(images: denoising_xt_comparison, final_actions_comparison)

SmolVLA; n_steps=5; sigma_d=1.0
(images: denoising_xt_comparison, final_actions_comparison)

SmolVLA; n_steps=50; sigma_d=0.2
(images: denoising_xt_comparison, final_actions_comparison)

Pi0.5; n_steps=5; sigma_d=0.8
(images: denoising_xt_comparison, final_actions_comparison)

Pi0.5; n_steps=10; sigma_d=0.2
(images: denoising_xt_comparison, final_actions_comparison)


@alexander-cobot alexander-cobot left a comment


Thanks @helper2424 for opening this PR! I have a few comments which will hopefully clarify my article's guidance.

The main points are:

  1. I think you meant variance_clipping_factor to be σ_d from my article judging by its default value. I think the name needs revising. See my inline comments.
  2. max_guidance_weight should probably default to be num_steps.
  3. You don't need any use_soare_optimization guards. σ_d = 1.0 covers the default RTC implementation. σ_d < 1.0 (and a good value might be 0.2) covers my article.

I'm also available offline to discuss :)

time,
original_denoise_step_partial,
execution_horizon=None,
num_flow_matching_steps=None,


I would suggest not passing this here, as the total number of denoising steps shouldn't be the concern of a single denoising step. My article's guidance is to set max_guidance_weight = num_steps. I'm fairly convinced that is the correct thing to do, enough so that I would recommend just forcing it to default to this unless the user explicitly provides max_guidance_weight. I also note that you have already defaulted it to 10, which is not the value of 5 that the original RTC paper suggests (which is fine IMO, but it shows that there are already deviations from the original specification, so we might as well ground it with my article's guidance).

Contributor Author


Got it. I used 10 because, by default, num_inference_steps: int = 10, so 5 won't work well for LeRobot policies with the default config. Probably PI used 5 when testing RTC.

Contributor Author


I made that change, then reverted the logic: pi0.x passes num steps as a parameter to predict_action_chunk.

Comment on lines 50 to 51
use_soare_optimization: bool = True
variance_clipping_factor: float = 0.2


I think you meant sigma_d: float = 1.0 instead of variance_clipping_factor (or, if you prefer a more descriptive parameter, prior_variance, but be careful because "variance" is sigma_d ** 2). That parameter is used in all cases: when it is 1.0 you are not using the improvement suggested in my article; otherwise you are. Therefore you can drop use_soare_optimization altogether and don't need to guard any code with if use_soare_optimization.


Note, as per my article, it's max_guidance_weight that you would set equal to num_flow_matching_steps. In the RTC paper they don't give guidance for that, and just suggest setting it to 5.0.

tau_tensor = torch.as_tensor(tau)
squared_one_minus_tau = (1 - tau_tensor) ** 2
inv_r2 = (squared_one_minus_tau + tau_tensor**2) / (squared_one_minus_tau)
if self.config.use_soare_optimization:


Based on my comments above. This whole block won't need this if else guard for use_soare_optimization.
You just need
inv_r2 = (squared_one_minus_tau + tau_tensor ** 2 * sigma_d ** 2) / (squared_one_minus_tau * sigma_d ** 2)
or if you are going to call it prior_variance instead, since that's already σ² it would be:
inv_r2 = (squared_one_minus_tau + tau_tensor ** 2 * prior_variance) / (squared_one_minus_tau * prior_variance)
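A minimal, self-contained sketch of that expression (plain Python floats rather than torch, for illustration only):

```python
def inv_r2(tau: float, sigma_d: float = 1.0) -> float:
    # Eqn. 8 from the article: sigma_d scales the prior's standard
    # deviation; sigma_d = 1.0 recovers the original RTC expression
    # ((1 - tau)**2 + tau**2) / (1 - tau)**2.
    one_minus_tau_sq = (1.0 - tau) ** 2
    return (one_minus_tau_sq + tau**2 * sigma_d**2) / (one_minus_tau_sq * sigma_d**2)

# sigma_d = 1.0 gives the original implementation's value of 2.0 at tau = 0.5;
# sigma_d = 0.2 sharpens the guidance considerably (26.0 at tau = 0.5).
```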


Then setting sigma_d = 1.0 reverts to the original RTC implementation.


Btw this is just eqn 8 in my article

Contributor Author


Fixed


**`inference_delay`**: How many timesteps of inference latency your system has. This is passed to `predict_action_chunk()` rather than the config, since it may vary at runtime.

**`sigma_d`**: The variance of the prior distribution. This is a hyperparameter that can be tuned to balance the smoothness of the transitions and the reactivity of the policy.


nit: sigma is not "variance" but rather "standard deviation".


**`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. This is a hyperparameter that can be tuned to balance the smoothness of the transitions and the reactivity of the policy. For 10 steps flow matching (SmolVLA, Pi0, Pi0.5), a value of 10.0 is an optimal value.
**`max_guidance_weight`**: How strongly to enforce consistency with the previous chunk. This is a hyperparameter that can be tuned to balance the smoothness of the transitions and the reactivity of the policy.


This is a clipping parameter, not the actual guidance weight. You might modify the sentence to include something like "a clipping parameter on the computed guidance weight; ensures stability."

},
)

sample_correlation_shift: int | None = field(


Note that a good value here is something less than the chunk size. For example, you might want to simulate a chunk size of 50 where you begin inference for the next chunk at the 25th step.

@helper2424 helper2424 changed the title RTC optimization from Alex Soare RTC adjustments. Bug fix & Alex Soare optimization Nov 24, 2025