Web browsing is a highly specific task, requiring very different tools and state than other agentic tasks.
There are two main avenues for building web browsing agents: text-based and vision-based.
In internal tests so far, the text-based approach works better than raw vision from a base VLM that gets no labelling of the screenshot to help it decide where to click.
However, browser-use has had a lot of success with this kind of labelling scaffolding for vision models, so it could be worth trying.
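As a rough illustration of the labelling scaffolding mentioned above, here is a minimal Python sketch. It assumes the page's interactive elements and their pixel bounding boxes have already been extracted. The `Element` structure and the helper names are hypothetical and are not browser-use's API. Each element gets a numeric label, the model is shown a text legend, and the label it picks is mapped back to a click coordinate. A real setup would also draw the labels onto the screenshot; that step is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """An interactive element found on the page (hypothetical structure)."""
    tag: str
    text: str
    x: int       # left edge, in screenshot pixels
    y: int       # top edge, in screenshot pixels
    width: int
    height: int


def label_elements(elements):
    """Assign labels 1..n to elements; return the label map and a text legend for the model."""
    labels = {i: el for i, el in enumerate(elements, start=1)}
    legend = "\n".join(f"[{i}] <{el.tag}> {el.text}" for i, el in labels.items())
    return labels, legend


def click_target(labels, label):
    """Translate the label chosen by the model into the centre pixel of that element."""
    el = labels[label]
    return (el.x + el.width // 2, el.y + el.height // 2)


elements = [
    Element("a", "Sign in", 10, 10, 80, 20),
    Element("button", "Search", 200, 50, 60, 30),
]
labels, legend = label_elements(elements)
print(legend)
print(click_target(labels, 2))  # centre of the "Search" button
```

The point of the labels is that the model only has to name an element ("click [2]") instead of predicting raw pixel coordinates, which is what a base VLM without scaffolding struggles with.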