Commit

Update doc webbrowser agent
aymeric-roucher committed Jan 31, 2025
1 parent 445ee0d commit cac8044
Showing 2 changed files with 11 additions and 18 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -148,11 +148,11 @@ model = AzureOpenAIServerModel(

## Command Line Interface

-You can accomplish multi-step agentic tasks using two commands: `smolagent` and `webagent`. `smolagent` is a more generalist command to run a multi-step `CodeAgent` that can be equipped with various tools, meanwhile `webagent` is an agent equipped with web browsing tools using [helium](https://github.com/helium).
+You can run agents from CLI using two commands: `smolagent` and `webagent`. `smolagent` is a generalist command to run a multi-step `CodeAgent` that can be equipped with various tools, meanwhile `webagent` is a specific web-browsing agent using [helium](https://github.com/helium).

**Web Browser Agent in CLI**

-`webagent` allows users to automate web browsing tasks. It uses the Helium library to interact with web pages and uses defined tools to browse the web. Read more about it [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py).
+`webagent` allows users to automate web browsing tasks. It uses the [helium](https://github.com/helium) library to interact with web pages and uses defined tools to browse the web. Read more about this agent [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py).

Run the following command to get started:
```bash
25 changes: 9 additions & 16 deletions docs/source/en/examples/web_browser.md
@@ -5,19 +5,19 @@
In this notebook, we'll create an **agent-powered web browser automation system**! This system can navigate websites, interact with elements, and extract information automatically.

The agent will be able to:
-✅ Navigate to web pages
-✅ Click on elements
-✅ Search within pages
-✅ Handle popups and modals
-✅ Take screenshots
-✅ Extract information

-Let's set up this system step by step.
+- [x] Navigate to web pages
+- [x] Click on elements
+- [x] Search within pages
+- [x] Handle popups and modals
+- [x] Extract information

+Let's set up this system step by step!

First, run these lines to install the required dependencies:

```bash
-pip install smolagents selenium helium pillow python-dotenv -q
+pip install smolagents selenium helium pillow -q
```

Let's import our required libraries and set up environment variables:
@@ -208,11 +208,4 @@ The system is particularly effective for tasks like:
- Data extraction from websites
- Web research automation
- UI testing and verification
-- Content monitoring

-Best Practices:
-1. Always provide clear, specific instructions
-2. Use the screenshot callback for debugging
-3. Handle errors gracefully
-4. Clean up old screenshots to manage memory
-5. Set reasonable step limits for your tasks
+- Content monitoring
