Improve BrowserGym examples for latest OpenEnv version by sergiopaniego · Pull Request #5568 · huggingface/trl

sergiopaniego · 2026-04-16T09:51:50Z

What does this PR do?

I've rerun the BrowserGym example scripts against the latest changes in both TRL and OpenEnv and updated them accordingly.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

No AI usage: the PR was written entirely by a human.
AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec @kashif

Note

Medium Risk
Primarily affects example scripts, but it patches internal WebSocket client state (client._ws.protocol) and changes reward shaping/training defaults, which could impact runtime behavior and results.
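For reviewers weighing that risk, here is a minimal sketch of what a patch of this kind might look like. It is not the code from this PR: the helper name `raise_ws_size_limit` and the 100 MB default are hypothetical, and it only assumes what the note above states, namely that the client keeps its connection on `client._ws` and that the size limit lives on `client._ws.protocol` as `max_size` or `max_message_size`.

```python
from types import SimpleNamespace

# Arbitrary illustrative limit (100 MB); not a value taken from the PR.
DEFAULT_LIMIT = 100 * 1024 * 1024


def raise_ws_size_limit(client, limit=DEFAULT_LIMIT):
    """Best-effort bump of the incoming-message size cap on an open connection.

    Assumes (per the PR note) that the limit is stored on client._ws.protocol.
    Returns the names of the attributes that were actually patched, so the
    caller can log or warn when nothing matched.
    """
    protocol = getattr(getattr(client, "_ws", None), "protocol", None)
    patched = []
    for name in ("max_size", "max_message_size"):
        # Only touch attributes that already exist; never create new ones.
        if protocol is not None and hasattr(protocol, name):
            setattr(protocol, name, limit)
            patched.append(name)
    return patched


# Stand-in for a connected client, used only to exercise the helper.
fake_client = SimpleNamespace(_ws=SimpleNamespace(protocol=SimpleNamespace(max_size=2**20)))
patched_names = raise_ws_size_limit(fake_client)
```

Because this reaches into private attributes, returning the patched names makes it easy to notice when a future OpenEnv or websockets release moves the limit elsewhere and the patch silently stops applying.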

Overview
Updates the BrowserGym GRPO example scripts to be more robust with newer OpenEnv/websockets behavior and model tool-calling outputs.

The VLM script now patches the underlying WebSocket max_size/max_message_size to avoid 1MB observation truncation, normalizes bid values (int/[13]-style) before building tool actions, returns a friendly message instead of raising when an episode is already done, and adds an extra reward_efficiency shaping term while simplifying the system prompt and tuning defaults (notably gradient_accumulation_steps=1).
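The bid normalization mentioned above could be sketched roughly as follows. This is an illustration rather than the script's actual code: `normalize_bid` and `click_action` are hypothetical names, and the `click('<bid>')` action string is assumed here as the target format.

```python
import re


def normalize_bid(raw):
    """Coerce a model-provided element id (13, "[13]", [13], "13") to "13"."""
    # Unwrap single-element lists or tuples such as [13] or ["13"].
    if isinstance(raw, (list, tuple)) and len(raw) == 1:
        raw = raw[0]
    text = str(raw).strip()
    # Strip one pair of surrounding square brackets, as in the string "[13]".
    match = re.fullmatch(r"\[\s*(.*?)\s*\]", text)
    if match:
        text = match.group(1)
    # Drop stray quotes left over from inputs like "['a51']".
    return text.strip("'\"")


def click_action(raw_bid):
    """Build a click action string from a possibly messy bid (assumed format)."""
    return f"click('{normalize_bid(raw_bid)}')"
```

Normalizing before building the action keeps the tool functions tolerant of the different ways a model may serialize the same element id.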

The LLM script is simplified (docs + CLI surface + inline GRPOConfig), aligns reward function signature/usage, applies the same WebSocket size patch + done-handling change, and adjusts logging/Trackio settings (e.g., enabling log_completions and using a stable trackio_space_id).
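The done-handling change shared by both scripts can be pictured with a small sketch. The class and method names below are hypothetical and the step function is a stand-in; the only behavior taken from the summary above is that a tool call after the episode has ended returns a message instead of raising.

```python
class EpisodeTools:
    """Hypothetical tool wrapper around an environment step function."""

    DONE_MESSAGE = "The episode has already finished; no further actions are needed."

    def __init__(self, step_fn):
        # step_fn(action) is assumed to return (observation_text, done_flag).
        self._step_fn = step_fn
        self.done = False

    def act(self, action):
        # Return a readable message rather than raising, so a late tool call
        # from the model does not abort the whole rollout.
        if self.done:
            return self.DONE_MESSAGE
        observation, done = self._step_fn(action)
        self.done = done
        return observation


# Stand-in environment: the episode ends after the first action.
tools = EpisodeTools(lambda action: (f"executed {action}", True))
first = tools.act("click('13')")
second = tools.act("click('14')")
```

Here `first` holds the normal observation and `second` holds the friendly message, since the fake episode ended on the first step.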

Reviewed by Cursor Bugbot for commit 3ec8707. Bugbot is set up for automated code reviews on this repo.

HuggingFaceDocBuilderDev · 2026-04-16T09:54:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 3d39fba.

Improve BrowserGym examples for latest OpenEnv version

e51f2dc

sergiopaniego added 2 commits April 16, 2026 11:55

Removed unneeded reward

9fa32be

Merge branch 'main' into update-browsergym-examples

9362cee

cursor Bot reviewed Apr 16, 2026

Comment thread examples/scripts/openenv/browsergym.py Outdated

Comment thread examples/scripts/openenv/browsergym.py Outdated

sergiopaniego added 2 commits April 16, 2026 12:40

Update based on Cursor

30c82de

Merge branch 'update-browsergym-examples' of github.com:huggingface/trl into update-browsergym-examples

030f1b6

cursor Bot reviewed Apr 16, 2026

Comment thread examples/scripts/openenv/browsergym.py Outdated

update based on cursor review

3d39fba

cursor Bot reviewed Apr 16, 2026

Comment thread examples/scripts/openenv/browsergym.py

sergiopaniego added 3 commits April 16, 2026 17:43

Merge branch 'main' into update-browsergym-examples

e48ceed

Merge branch 'main' into update-browsergym-examples

4e8044f

Merge branch 'main' into update-browsergym-examples

3ec8707

sergiopaniego wants to merge 9 commits into main from update-browsergym-examples

2 participants
