
Make a performant WebBrowserAgent class #400

Open
aymeric-roucher opened this issue Jan 28, 2025 · 0 comments

@aymeric-roucher
Collaborator

Web browsing is a highly specific task, requiring tools and state that are very different from those of other agentic tasks.
There are two main avenues for developing web browsing agents:

  • text-based
  • vision-based
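To make the text-based avenue concrete, here is a minimal sketch, assuming the agent observes each page as raw HTML. It lists the interactive elements as numbered text lines so the model can answer with an index such as `click 1`. `InteractiveElementExtractor` and `page_to_text` are hypothetical names for illustration, not existing APIs of this project or of any browser library, and the parser deliberately ignores nesting and scripts.

```python
from html.parser import HTMLParser


class InteractiveElementExtractor(HTMLParser):
    """Hypothetical helper: collects links, buttons and inputs in document order."""

    INTERACTIVE_TAGS = {"a", "button", "input"}

    def __init__(self):
        super().__init__()
        self.elements = []  # list of dicts: {"tag", "attrs", "text_parts"}
        self._open = None   # element whose visible text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE_TAGS:
            element = {"tag": tag, "attrs": dict(attrs), "text_parts": []}
            self.elements.append(element)
            # <input> is a void element, so it has no text content to collect
            if tag != "input":
                self._open = element

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None

    def handle_data(self, data):
        # Only keep non-empty text that sits inside an open interactive element
        if self._open is not None and data.strip():
            self._open["text_parts"].append(data.strip())


def page_to_text(page_html: str) -> str:
    """Hypothetical helper: render interactive elements as numbered lines for an LLM."""
    parser = InteractiveElementExtractor()
    parser.feed(page_html)
    parser.close()
    lines = []
    for index, element in enumerate(parser.elements):
        tag, attrs = element["tag"], element["attrs"]
        # Fall back to attributes when the element has no visible text (e.g. inputs)
        label = (
            " ".join(element["text_parts"])
            or attrs.get("placeholder")
            or attrs.get("value")
            or attrs.get("name", "")
        )
        detail = f' href="{attrs["href"]}"' if tag == "a" and "href" in attrs else ""
        lines.append(f"[{index}] <{tag}{detail}> {label}")
    return "\n".join(lines)
```

For example, `page_to_text('<p>Welcome</p><a href="/docs">Read the <b>docs</b></a><input name="q" placeholder="Search"><button>Go</button>')` yields three lines, `[0] <a href="/docs"> Read the docs`, `[1] <input> Search` and `[2] <button> Go`, while the non-interactive paragraph is dropped. Keeping only interactive elements keeps the observation short, at the cost of hiding page content the agent may need to read.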

At the moment, internal tests show that the text-based approach works better than raw vision from a base VLM that has had no labelling added to help it click on screenshots.
However, browser-use has had a lot of success with this kind of labelling scaffolding for vision models, so it could be worth trying.
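The labelling scaffolding mentioned above boils down to three steps: number the interactive elements, show those numbers to the VLM, and map the number it picks back to screen coordinates. The following is a minimal sketch under stated assumptions. Element bounding boxes are assumed to be available already (for example from the browser's DOM), drawing the labels onto the screenshot is left out, and `Box`, `label_elements`, `legend`, `parse_action` and `click_point` are hypothetical names. This is not browser-use's actual implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of an interactive element, in screenshot pixels (assumed input)."""
    x: int
    y: int
    width: int
    height: int
    description: str


def label_elements(boxes):
    """Assign a numeric label to each box. A real scaffold would also draw these on the image."""
    return {label: box for label, box in enumerate(boxes)}


def legend(labelled):
    """Text legend sent to the VLM alongside the labelled screenshot."""
    return "\n".join(f"{label}: {box.description}" for label, box in labelled.items())


def parse_action(reply):
    """Extract the label from a reply such as 'click 1' (assumed action format)."""
    match = re.search(r"click\s+(\d+)", reply, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no click action found in: {reply!r}")
    return int(match.group(1))


def click_point(labelled, label):
    """Translate the chosen label into the pixel to click: the centre of its box."""
    box = labelled[label]
    return (box.x + box.width // 2, box.y + box.height // 2)
```

Usage: with `labelled = label_elements([Box(10, 20, 100, 30, "link 'Docs'"), Box(300, 40, 80, 24, "button 'Search'")])`, the reply `"I will click 1 to search"` resolves through `click_point(labelled, parse_action(...))` to `(340, 52)`. The point of the scaffold is that the VLM only has to pick a label, never to estimate raw pixel coordinates.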
