-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mouse position into env observation #282
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice feature! See my comments to make it robust to iFrames
@@ -271,7 +268,7 @@ def override_property(task, env, property): | |||
window.addEventListener("focusin", () => {window.browsergym_page_activated();}, {capture: true}); | |||
window.addEventListener("load", () => {window.browsergym_page_activated();}, {capture: true}); | |||
window.addEventListener("pageshow", () => {window.browsergym_page_activated();}, {capture: true}); | |||
window.addEventListener("mousemove", () => {window.browsergym_page_activated();}, {capture: true}); | |||
window.addEventListener("mousemove", (event) => {window.browsergym_page_activated(); window.pageX = event.clientX; window.pageY = event.clientY;}, {capture: true}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean and simple, I like this
Returns: | ||
An array of the x and y coordinates of the mouse location. | ||
""" | ||
position = page.evaluate("""() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work for simple pages, but I'm worried about iframes. Here is something that could work:
- in the JS callback (mousemove), record the position in JS in the
window
object, and also record which page / frame received this event, in Python with a method similar to_activate_page_from_js()
. - to extract the mouse position in the browser viewport, take the latest mouse position (last iframe that received a mousemove event), and work your way up the frame hierarchy to reconstruct the current mouse position. See how we do that to get the coordinates of all elements in all iframes here:
tests/core/test_actions_highlevel.py
Outdated
@@ -1141,6 +1142,7 @@ def get_checkbox_elem(obs): | |||
|
|||
obs, reward, term, trunc, info = env.step(action) | |||
checkbox = get_checkbox_elem(obs) | |||
assert obs['mouse_position'] == [x, y] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good test, can you do the same for other pages which have iFrames, and check you get the correct coordinates when clicking on elements inside the iframe? (clicking with coordinates, and with bid)
cd33d61
to
a701498
Compare
BTW, a cool way to try this feature is to run an openended agent on a whiteboard and ask it to draw simple forms, like we did for the demo video here |
Seems like there is pageX, pageY but also clientX, clientY https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/clientX Only way to know how / which one of these to use is to write some tests :) |
From the blog seems like clientX/clientY is relative to viewport, and pageX/pageY is relative to the whole webpage. I think clientX/clientY is closer to what we want 🤔 |
Hi, thanks for the project! I'm trying to implement and experiment with coordinate-based actions from
browsergym
and it would be useful if the environment exposes this info via the observation. Not sure what the team thinks about this?One quirk is seems like there're no direct ways to get the mouse position from Playwright so I use a kinda hacky way to get that info.