Commit 8d6009d

author
Samuele Giampieri
committed
vers 1.7.0
1 parent 75bc9ed commit 8d6009d

43 files changed: +1074 -371 lines changed

CHANGELOG.md

+12
Original file line number | Diff line number | Diff line change
@@ -92,3 +92,15 @@ All notable changes to this project will be documented in this file.
9292
- Supports all Ollama model configuration options including context size, repetition penalties, and sampling parameters.
9393
- Enables fine-tuned control over model behavior while maintaining the simplicity of the local integration.
9494
- Configuration options can be set through the API for advanced model tuning.
95+
96+
## [1.7.0] - 2025-03-23
97+
### Added
98+
- **OpenAI Computer Use Integration:**
99+
- Integrated a new tool for browser automation, OpenAI Computer Use, which mirrors the capabilities of the OpenAI Operator. This tool is designed to execute advanced browser automation tasks with enhanced efficiency. You can find the new tool in default_tools.py under the name browser_navigation_cua.
100+
101+
- **WebSocket Integration:**
102+
- Added a WebSocket module to provide real-time updates to the frontend. This integration continuously streams all reasoning steps and backend operations, improving transparency and user interaction.
103+
104+
- **OpenAI Web Search Tool Integration:**
105+
- Integrated a new web search tool that utilizes the Chat Completions API. With this integration, the model retrieves information from the web before responding to queries, leveraging fine-tuned models and tools similar to those used in Search in ChatGPT. You can find the new tool in default_tools.py under the name helper_model_web_search.
106+

Dockerfile

+1
@@ -37,6 +37,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
3737
libxtst6 \
3838
fonts-noto \
3939
x11-apps \
40+
portaudio19-dev \
4041
&& rm -rf /var/lib/apt/lists/*
4142

4243
WORKDIR /app

README.md

+102-8
@@ -2,7 +2,7 @@
22
![AutoCode Agent Global Workflow](./static/images/autocode.png)
33

44
# AutoCodeAgent - An innovative AI agent powered by IntelliChain, Deep Search, and multi-RAG techniques
5-
![version](https://img.shields.io/badge/version-1.6.0-blue)
5+
![version](https://img.shields.io/badge/version-1.7.0-blue)
66

77
## One agent, Infinite possibilities
88
AutoCodeAgent redefines AI-powered problem solving by seamlessly integrating three groundbreaking modes:
@@ -12,7 +12,7 @@ Break down complex tasks with surgical precision through dynamic task decomposit
1212

1313
### Deep Search
1414
Harness the power of autonomous, real-time web research to extract the most current and comprehensive information. Deep Search navigates diverse online sources, transforming raw data into actionable intelligence with minimal human intervention.
15-
15+
1616
### Multi-RAG
1717
Enhance information retrieval through an innovative multi-RAG framework that includes many different RAG techniques. This multi-faceted approach delivers contextually rich, accurate, and coherent results, even when working with varied document types and complex knowledge structures. The incredible innovation is that these RAG techniques have been implemented as tools, so they can be used like any other tool in the project.
1818
You can also benefit from these techniques for educational purposes, as each one is conceptually well-explained in the .ipynb files located in the folders within /tools/rag.
@@ -62,9 +62,13 @@ Description of the tools that are included by default in the project.
6262
LangChain tools are integrated into the project; this section shows how to add them.
6363

6464
[SurfAi Integration](#surfai-integration)
65-
NEW! Integration of SurfAi as an Automated Web Navigation Tool (local function type)
65+
Integration of SurfAi as an Automated Web Navigation Tool (local function type)
6666
We have integrated SurfAi into our suite as a powerful automated web navigation tool. This enhancement enables seamless interaction with web pages, efficient extraction of data and images, and supports the resolution of more complex tasks through intelligent automation.
6767

68+
[Computer use OpenAi integration](#computer-use-openai-integration)
69+
We integrated a Computer-Using Agent (CUA) tool in Intellichain to automate computer interactions such as clicking, typing, and scrolling. The tool leverages OpenAI’s visual understanding and decision-making to navigate browsers or virtual machines, extract and analyze data and images. It provides real-time updates and screenshots via WebSocket, streamlining web navigation and data extraction tasks. This integration enhances workflow efficiency significantly.
70+
It operates similarly to SurfAi but offers enhanced capabilities with a higher success rate for completing tasks.
71+
6872

6973
## Deep Search sections
7074

@@ -176,12 +180,12 @@ If you want to rebuild and restart the application, and optimize docker space:
176180
docker-compose down
177181
docker-compose build --no-cache
178182
docker-compose up -d
179-
docker builder prune -a -f
183+
docker builder prune -a -f
180184
docker logs -f flask_app
181185
```
182186
It is a good idea to check Docker disk usage after building and starting the application:
183187
```bash
184-
docker system df
188+
docker system df
185189
```
186190

187191
8. Access the AI Agent chat interface:
@@ -202,7 +206,7 @@ http://localhost:7474/browser/
202206
The backend logic is managed by a Flask application. The main Flask app is located at:
203207
```bash
204208
/app.py
205-
```
209+
```
206210
This file orchestrates the interaction between the AI agent, tool execution, and the integration of various services (like Neo4j, Redis, and Docker containers).
207211

208212
### Frontend chat interface
@@ -310,6 +314,7 @@ A function validator inspects each subtask’s code (via AST analysis) for synta
310314

311315
## All Ways to Add Custom Tools
312316
In addition to the default tools, users can create custom tools by describing the function and specifying the libraries to be used.
317+
You can manage custom tools in the file `/tools/custom_tools.py`.
313318
There are several ways to create custom tools:
314319

315320
1) **ONLY LIBRARY NAME**:
@@ -524,6 +529,7 @@ Discover the capabilities of AutoCodeAgent with those videos:<br>
524529
[Hybrid Vector Graph RAG Video Demo](https://youtu.be/a9Ul6CxYsFM).<br>
525530
[Integration with SurfAi Video Demo 1](https://youtu.be/b5OPk7-FPrk).<br>
526531
[Integration with SurfAi Video Demo 2](https://youtu.be/zpTthh2wKds).<br>
532+
[Integration Open AI Computer use automation](https://youtu.be/A5pjtwJrZx0).<br>
527533
[LangChain Toolbox Video Demo](https://youtu.be/sUKiN_qp750).<br>
528534

529535

@@ -550,10 +556,14 @@ The default tools are pre-implemented and fully functional, supporting the agent
550556
These default tools are listed below and can be found in the file:
551557
/code_agent/default_tools.py
552558

553-
- browser_navigation
559+
- browser_navigation_surf_ai
554560
- integration of SurfAi for web navigation, data and image extraction, with multimodal text + vision capabilities
561+
- browser_navigation_cua
562+
- Computer-Using Agent (CUA) that automates computer interactions like clicking, typing, and scrolling. Leverages OpenAI's visual capabilities to navigate interfaces, extract data, and provide real-time feedback.
555563
- helper_model
556564
- An LLM useful for processing the output of a subtask
565+
- helper_model_web_search
566+
- This new tool provided by OpenAI responds to your requests with information retrieved from the web in real-time.
557567
- ingest_simple_rag
558568
- A tool for ingesting text into a ChromaDB Vector database with simple RAG
559569
- retrieve_simple_rag
@@ -570,7 +580,7 @@ These default tools are listed below and can be found in the file:
570580
- A tool for retrieving the most similar documents to a query with the HyDe RAG technique
571581
- retrieve_adaptive_rag
572582
- A tool for retrieving the most similar documents to a query with the Adaptive RAG technique
573-
- search_web
583+
- web_search
574584
- A tool for searching information on the web
575585
- send_email
576586
- A tool for sending an email
@@ -609,6 +619,89 @@ http://localhost:6901/vnc.html
609619
```
610620
You can find the screenshots generated during navigation at the following path: /tools/surf_ai/screenshots
611621

622+
Important: To avoid confusion for the planner agent in Intellichain, activate only one browser automation tool at a time.
623+
In the default_tools.py file, set these parameters:
624+
```python
625+
"browser_navigation_surf_ai": True,
626+
"browser_navigation_cua": False,
627+
```
628+
629+
## Computer use OpenAi integration
630+
631+
OpenAI's Computer-Using Agent (CUA) automates computer interactions like clicking, typing, and scrolling. It leverages visual understanding and intelligent decision-making to efficiently handle tasks typically performed manually.
632+
The tool doesn't just navigate and interact with web pages, but also engages with the user through chat, requesting additional instructions whenever it encounters problems or uncertainties during navigation.
633+
634+
### Integration Overview
635+
636+
CUA integration involves a feedback loop:
637+
638+
1. Set up the Environment (Browser or Virtual Machine).
639+
2. Send Initial Instructions to the CUA model.
640+
3. Process Model Responses for suggested actions.
641+
4. Execute Actions within your environment.
642+
5. Capture Screenshots after actions.
643+
6. Repeat until task completion.
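
The feedback loop above can be sketched as follows. This is a minimal, self-contained illustration and not the project's `CUAEngine`: `FakeEnvironment`, `scripted_model`, and `run_cua_loop` are hypothetical stand-ins, and a real integration would call OpenAI's computer-use model and drive a real browser instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeEnvironment:
    """Hypothetical stand-in for a browser or VM; it records actions instead of performing them."""
    executed: list = field(default_factory=list)

    def execute(self, action: dict) -> None:
        self.executed.append(action)

    def screenshot(self) -> str:
        # A real environment would return image data; a label is enough for the sketch.
        return f"screenshot-after-{len(self.executed)}-actions"


def run_cua_loop(
    instructions: str,
    model: Callable[[str, str], Optional[dict]],
    env: FakeEnvironment,
    max_steps: int = 10,
) -> str:
    """Ask the model for an action, execute it, capture a screenshot, and repeat."""
    observation = env.screenshot()
    for _ in range(max_steps):
        action = model(instructions, observation)
        if action is None:  # the model signals that the task is complete
            return observation
        env.execute(action)
        observation = env.screenshot()
    raise RuntimeError("step limit reached before the task completed")


def scripted_model(instructions: str, observation: str) -> Optional[dict]:
    """Hypothetical model: clicks once, types once, then reports completion."""
    script = {
        "screenshot-after-0-actions": {"type": "click", "x": 10, "y": 20},
        "screenshot-after-1-actions": {"type": "type", "text": "hello"},
    }
    return script.get(observation)
```

The step limit is a design choice for the sketch: it keeps a misbehaving model from looping forever, which matters when each iteration costs an API call.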
644+
645+
![CUA Workflow](./static/images/cua_diagram.png)
646+
647+
### Video Demo
648+
[Open Ai Computer use automation](https://youtu.be/A5pjtwJrZx0).
649+
650+
For CUA integration, we've implemented a default tool called `browser_navigation_cua` which is available in the `default_tools.py` file. This tool enables automated browser navigation and interaction with web content through OpenAI's Computer-Using Agent capabilities.
651+
652+
```python
653+
{
654+
"tool_name": "browser_navigation_cua",
655+
"lib_names": ["tools.cua.engine", "asyncio"],
656+
"instructions": ("This is an agent that automates browser navigation. Use it to interact with the browser and extract data during navigation.\n"
657+
"From the original prompt, reformulate it with input containing only the instructions for navigation, vision capability and text data extraction.\n"
658+
"It also has visual capabilities, so it can be used to analyze the graphics of the web page and images.\n"
659+
"For example: \n"
660+
"Initial user prompt: use the browser navigator to go to Wikipedia, search for Elon Musk, extract all the information from the page, and analyze with your vision capability his image, and send a summary of the extracted information via email to [email protected]\n"
661+
"Input prompt for browser navigation: go to Wikipedia, search for Elon Musk, extract all the information from the page, and analyze with your vision capability his image.\n"
662+
"**Never forget important instructions on navigation and data extraction.**"),
663+
"use_exactly_code_example": True,
664+
"code_example": """
665+
def browser_navigation_cua(previous_output):
666+
import asyncio
667+
from tools.cua.engine import CUAEngine
668+
669+
try:
670+
updated_dict = previous_output.copy()
671+
672+
prompt: str = updated_dict.get("prompt", "")
673+
cua_engine = CUAEngine(prompt, session_id, socketio)
674+
final_answer_message: str = asyncio.run(cua_engine.run())
675+
updated_dict["result"] = final_answer_message
676+
return updated_dict
677+
except Exception as e:
678+
logger.error(f"Error browser navigation: {e}")
679+
return previous_output
680+
"""
681+
},
682+
```
683+
684+
Important: To avoid confusion for the planner agent in Intellichain, activate only one browser automation tool at a time.
685+
In the default_tools.py file, set these parameters:
686+
```python
687+
"browser_navigation_surf_ai": False,
688+
"browser_navigation_cua": True,
689+
```
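
A small guard can catch the case where both browser tools are switched on at once. The sketch below is only an assumption about how such a check might look: `BROWSER_TOOLS` and `check_single_browser_tool` are hypothetical names, and the flag dictionary mirrors just the two entries shown above, not the real structure of `default_tools.py`.

```python
from typing import Optional

# Names of the two browser automation tools mentioned in this section.
BROWSER_TOOLS = ("browser_navigation_surf_ai", "browser_navigation_cua")


def check_single_browser_tool(flags: dict) -> Optional[str]:
    """Return the active browser tool, or None if neither is enabled.

    Raises ValueError when more than one browser tool is enabled.
    """
    active = [name for name in BROWSER_TOOLS if flags.get(name, False)]
    if len(active) > 1:
        raise ValueError(f"enable only one browser tool at a time, got: {active}")
    return active[0] if active else None
```

Calling it once at startup with the current flags would surface a misconfiguration before the planner ever sees two competing tools.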
690+
691+
Thanks to the integrated WebSocket, during navigation phases you'll see real-time updates in the chat with all actions performed by the agent on web pages.
692+
The agent reasons about each page it visits, and if it runs into a problem it asks in the chat for clarification or for the additional information it needs to complete the navigation task.
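
How such real-time updates might be pushed to the chat can be sketched as below. It assumes an emitter object with an `emit(event, data, to=...)` method, which is the shape of Flask-SocketIO's `SocketIO.emit`. The event name `agent_update`, the payload fields, and `RecordingEmitter` are hypothetical; the project's actual event schema is not shown in this commit.

```python
class RecordingEmitter:
    """Hypothetical test double that records events instead of sending them."""

    def __init__(self):
        self.sent = []

    def emit(self, event, data, to=None):
        self.sent.append((event, data, to))


def stream_step(emitter, session_id: str, step: int, message: str) -> dict:
    """Send one navigation update to the client identified by session_id."""
    payload = {"step": step, "message": message}
    # `to` targets a single session so that other users do not see this update.
    emitter.emit("agent_update", payload, to=session_id)
    return payload
```

Passing the emitter in as an argument keeps the function easy to test without a running server; in the application the same call would receive the real `socketio` object.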
693+
694+
If the tool is invoked, you can view the navigation by accessing:
695+
```bash
696+
http://localhost:6901/vnc.html
697+
```
698+
You can find the screenshots generated during navigation at the following path: /tools/cua/screenshots
699+
To use this tool, make sure you have added your OPENAI_API_KEY to the .env file.
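
One way to fail early when the key is missing is sketched below. It assumes the `.env` values have already been loaded into the process environment (for example through a Docker Compose `env_file` entry or a dotenv loader); `require_openai_key` is a hypothetical helper and not part of the project.

```python
import os
from typing import Mapping, Optional


def require_openai_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise if it is missing or blank."""
    env = os.environ if environ is None else environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```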
700+
701+
For technical documentation on OpenAI's API integration, please refer to:
702+
https://platform.openai.com/docs/guides/tools-computer-use
703+
704+
612705

613706
## LangChain Tools
614707

@@ -650,6 +743,7 @@ In this example, the system would:
650743
This demonstrates the flexibility and strength of LangChain's integration capabilities in orchestrating multiple tools to achieve a complex, multi-step task.
651744

652745

746+
653747
# Deep Search
654748

655749
## Introduction to Deep Search
