Make a performant WebBrowserAgent class #400

aymeric-roucher · 2025-01-28T16:06:35Z

Web browsing is a highly specific task that requires very different tools and state than other agentic tasks.
There are two main avenues for building web-browsing agents:

text-based
vision-based

At the moment, based on internal tests, the text-based approach works better than raw vision from a base VLM that gets no labelling on the screenshots to help it click.
That said, browser-use has had a lot of success with this kind of labelling scaffolding for vision models, so it could be worth trying out.
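
To make the labelling idea concrete, here is a minimal sketch in plain Python. It is not browser-use's or smolagents' actual code: `Element`, `label_elements`, `click_point`, and `describe` are hypothetical names made up for illustration, and the sketch assumes the bounding boxes of clickable elements have already been extracted from the page. The idea is that the model picks a numeric label instead of guessing pixel coordinates, and the scaffolding maps that label back to a click position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A clickable element and its bounding box in screenshot pixels (hypothetical type)."""
    description: str
    left: int
    top: int
    width: int
    height: int


def label_elements(elements):
    """Assign numeric labels in reading order: top to bottom, then left to right."""
    ordered = sorted(elements, key=lambda e: (e.top, e.left))
    return {label: element for label, element in enumerate(ordered, start=1)}


def click_point(labels, label):
    """Map a label chosen by the model to the pixel at the center of that element's box."""
    element = labels[label]
    return (element.left + element.width // 2, element.top + element.height // 2)


def describe(labels):
    """Build a text legend that could be sent alongside the annotated screenshot."""
    return "\n".join(f"[{label}] {element.description}" for label, element in labels.items())


elements = [
    Element("Search button", left=300, top=40, width=80, height=20),
    Element("Logo link", left=10, top=10, width=100, height=30),
]
labels = label_elements(elements)
legend = describe(labels)
```

With this in place, a model answer such as "click 2" resolves through `click_point(labels, 2)` to the center of the search button. Drawing the labels onto the screenshot itself is left out here.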

github-project-automation bot added this to Smolagents Roadmap Jan 28, 2025
