Improve BrowserGym examples for latest OpenEnv version #5568

Open
sergiopaniego wants to merge 9 commits into main from update-browsergym-examples

@sergiopaniego sergiopaniego commented Apr 16, 2026

What does this PR do?

I've rerun the scripts to check them against the latest changes in both TRL and OpenEnv.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec @kashif


Note

Medium Risk
Primarily affects example scripts, but it patches internal WebSocket client state (client._ws.protocol) and changes reward shaping/training defaults, which could impact runtime behavior and results.

Overview
Updates the BrowserGym GRPO example scripts to be more robust with newer OpenEnv/websockets behavior and model tool-calling outputs.

The VLM script now patches the underlying WebSocket max_size/max_message_size to avoid 1MB observation truncation, normalizes bid values (int/[13]-style) before building tool actions, returns a friendly message instead of raising when an episode is already done, and adds an extra reward_efficiency shaping term while simplifying the system prompt and tuning defaults (notably gradient_accumulation_steps=1).
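For reviewers who want a concrete picture of the bid normalization mentioned above, here is a minimal sketch. It is not the code from this PR: `normalize_bid` and `click_action` are hypothetical names, and the `click('<bid>')` action string is assumed from BrowserGym's usual high-level action format.

```python
import re


def normalize_bid(bid):
    """Return an element id as a bare string, e.g. 13 or "[13]" -> "13".

    Hypothetical helper: the real script may handle more or fewer shapes.
    """
    if isinstance(bid, bool):
        # bool is a subclass of int, but True/False is never a valid bid.
        raise TypeError("bid must be an int, str, or single-item list")
    if isinstance(bid, int):
        return str(bid)
    if isinstance(bid, (list, tuple)) and len(bid) == 1:
        # A model may emit the bid as a one-element JSON list: [13]
        return normalize_bid(bid[0])
    text = str(bid).strip()
    # Strip one pair of surrounding brackets, as in the "[13]" style.
    match = re.fullmatch(r"\[\s*(.+?)\s*\]", text)
    if match:
        text = match.group(1)
    # Drop stray quotes a model may wrap around the id.
    return text.strip("'\"")


def click_action(bid):
    """Build a click action string from a raw, possibly messy bid."""
    return f"click('{normalize_bid(bid)}')"
```

With this shape, `click_action(13)`, `click_action("[13]")`, and `click_action([13])` all produce the same action string, so the environment sees one canonical form regardless of how the model formatted the tool call.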

The LLM script is simplified (docs + CLI surface + inline GRPOConfig), aligns reward function signature/usage, applies the same WebSocket size patch + done-handling change, and adjusts logging/Trackio settings (e.g., enabling log_completions and using a stable trackio_space_id).
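The WebSocket size patch shared by both scripts could look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's implementation: `raise_ws_message_limit` is a hypothetical helper, the `_ws` and `_ws.protocol` attribute paths are taken from the risk note above, and passing `None` is assumed to mean "no limit" (as it does for `max_size` in the `websockets` library). The demo uses stand-in objects rather than a real OpenEnv client.

```python
from types import SimpleNamespace


def raise_ws_message_limit(client, limit=None):
    """Best-effort patch of the message size limit on a client's WebSocket.

    Only attributes that already exist are overwritten, so the helper is a
    no-op on clients with a different internal layout. Returns the list of
    attribute paths that were patched.
    """
    patched = []
    ws = getattr(client, "_ws", None)  # private attribute: may change upstream
    if ws is None:
        return patched
    targets = [("_ws", ws)]
    protocol = getattr(ws, "protocol", None)
    if protocol is not None:
        targets.append(("_ws.protocol", protocol))
    for path, obj in targets:
        for name in ("max_size", "max_message_size"):
            if hasattr(obj, name):
                setattr(obj, name, limit)
                patched.append(f"{path}.{name}")
    return patched


# Stand-in for a connected client with a 1MB default limit (illustrative only).
fake_client = SimpleNamespace(
    _ws=SimpleNamespace(
        max_size=2**20,
        protocol=SimpleNamespace(max_size=2**20),
    )
)
patched_paths = raise_ws_message_limit(fake_client)
```

Because the helper reaches into private state, guarding every access with `getattr`/`hasattr` keeps the example script from crashing if a future OpenEnv or websockets release renames these attributes. The patch would then silently stop applying, so logging the returned paths is worth considering.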

Reviewed by Cursor Bugbot for commit 3ec8707. Bugbot is set up for automated code reviews on this repo. Configure here.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread examples/scripts/openenv/browsergym.py Outdated
Comment thread examples/scripts/openenv/browsergym.py Outdated
Comment thread examples/scripts/openenv/browsergym.py Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 3d39fba. Configure here.

Comment thread examples/scripts/openenv/browsergym.py