1 parent
ccdf721
commit 4e5c3c1
Showing 2 changed files with 195 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,186 @@ | ||
{ | ||
"cells": [ | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Adding Browsing Capabilities to AG2\n", | |
"\n", | |
"Previously, in our [Cross-Framework LLM Tool Integration](https://github.com/ag2ai/ag2/blob/main/notebook/tools_interoperability.ipynb) guide, we combined tools from frameworks like **LangChain**, **CrewAI**, and **PydanticAI** to enhance AG2.\n", | |
"\n", | |
"Now, we have taken AG2 to the next level by integrating the [`browser-use`](https://github.com/browser-use/browser-use) framework.\n", | |
"\n", | |
"With `browser-use`, your agents can navigate websites, gather dynamic content, and interact with web pages. This opens up new possibilities for tasks like data collection, web automation, and more.\n", | |
"\n" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Installation\n", | |
"\n", | |
"To get started with the `browser-use` integration in AG2, follow these steps:\n", | |
"\n", | |
"1. Install AG2 with the `browser-use` extra:\n", | |
" ```bash\n", | |
" pip install ag2[browser-use]\n", | |
" ```\n", | |
"2. Set up Playwright:\n", | ||
" \n", | ||
" ```bash\n", | ||
" playwright install\n", | ||
" ```\n", | ||
"\n", | ||
"You're all set! Now you can start using browsing features in AG2.\n", | ||
"\n", | ||
"## Imports" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"\n", | ||
"from autogen import AssistantAgent, UserProxyAgent\n", | ||
"from autogen.tools.experimental.browser_use import BrowserUseTool" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Agent Configuration\n", | |
"\n", | |
"Configure the agents for the interaction.\n", | |
"\n", | |
"- `config_list` defines the LLM configurations, including the model and API key.\n", | |
"- `UserProxyAgent` simulates user inputs without requiring actual human interaction (`human_input_mode` is set to `NEVER`).\n", | |
"- `AssistantAgent` represents the AI agent, configured with the LLM settings." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"config_list = [{\"model\": \"gpt-4o-mini\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}]\n", | ||
"llm_config = {\n", | ||
" \"config_list\": config_list,\n", | ||
"}\n", | ||
"\n", | ||
"user_proxy = UserProxyAgent(name=\"user_proxy\", human_input_mode=\"NEVER\")\n", | ||
"assistant = AssistantAgent(name=\"assistant\", llm_config=llm_config)" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Integrating Web Browsing with BrowserUseTool\n", | |
"\n", | |
"\n", | |
"The `BrowserUseTool` enables agents to interact with web browsers, allowing them to access, navigate, and perform actions on websites as part of their tasks. It acts as a bridge between the language model and the browser, empowering the agent to browse the web, search for information, and interact with dynamic web content.\n", | |
"\n", | |
"To watch the agents work in real time, set the `headless` option in `browser_config` to `False`. The browser then runs in a visible window, so you can observe the agents' interactions with websites. With `headless=True`, the browser runs in the background without a GUI, which suits automated tasks where visibility is not needed." | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"browser_use_tool = BrowserUseTool(llm_config=llm_config, browser_config={\"headless\": False})\n", | ||
"\n", | ||
"browser_use_tool.register_for_execution(user_proxy)\n", | ||
"browser_use_tool.register_for_llm(assistant)" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Initiate Chat\n", | |
"\n", | |
"To run the code in Jupyter, use `nest_asyncio` to allow nested event loops.\n", | |
"\n", | ||
"```bash\n", | ||
"pip install nest_asyncio\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import nest_asyncio\n", | ||
"\n", | ||
"nest_asyncio.apply()" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The `user_proxy.initiate_chat()` method asks the assistant to perform a web browsing task: searching for \"ag2\" on Reddit, clicking the first post, and extracting the first comment. The assistant proposes the tool call, `user_proxy` executes it through `BrowserUseTool`, and the extracted content is returned in the chat." | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"result = user_proxy.initiate_chat(\n", | ||
" recipient=assistant,\n", | ||
" message=\"Go to Reddit, search for 'ag2' in the search bar, click on the first post and return the first comment.\",\n", | ||
" max_turns=2,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"front_matter": { | ||
"description": "Tools browser-use Integration", | ||
"tags": [ | ||
"tools", | ||
"browser-use", | ||
"webscraping", | ||
"function calling" | ||
] | ||
}, | ||
"kernelspec": { | ||
"display_name": ".venv-browser-use", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.8" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |