All notable changes to this project will be documented in this file.
- Supports all Ollama model configuration options including context size, repetition penalties, and sampling parameters.
- Enables fine-tuned control over model behavior while maintaining the simplicity of the local integration.
- Configuration options can be set through the API for advanced model tuning.
## [1.7.0] - 2025-03-23
### Added
- **OpenAI Computer Use Integration:**
  - Integrated a new tool for browser automation, OpenAI Computer Use, which mirrors the capabilities of the OpenAI Operator. This tool is designed to execute advanced browser automation tasks with enhanced efficiency. You can find the new tool in `default_tools.py` under the name `browser_navigation_cua`.
- **WebSocket Integration:**
  - Added a WebSocket module to provide real-time updates to the frontend. This integration continuously streams all reasoning steps and backend operations, improving transparency and user interaction.
- **OpenAI Web Search Tool Integration:**
  - Integrated a new web search tool that utilizes the Chat Completions API. With this integration, the model retrieves information from the web before responding to queries, leveraging fine-tuned models and tools similar to those used in Search in ChatGPT. You can find the new tool in `default_tools.py` under the name `helper_model_web_search`.
AutoCodeAgent redefines AI-powered problem solving by seamlessly integrating three groundbreaking modes:
### IntelliChain
Break down complex tasks with surgical precision through dynamic task decomposition.
### Deep Search
Harness the power of autonomous, real-time web research to extract the most current and comprehensive information. Deep Search navigates diverse online sources, transforming raw data into actionable intelligence with minimal human intervention.
### Multi-RAG
Enhance information retrieval through an innovative multi-RAG framework that includes many different RAG techniques. This multi-faceted approach delivers contextually rich, accurate, and coherent results, even when working with varied document types and complex knowledge structures. The incredible innovation is that these RAG techniques have been implemented as tools, so they can be used like any other tool in the project.
You can also benefit from these techniques for educational purposes, as each one is conceptually well explained in the `.ipynb` files located in the folders within `/tools/rag`.
Description of the tools that are included by default in the project.
LangChain tools are integrated into the project; in this section you will learn how to add them easily.
[SurfAi Integration](#surfai-integration)
Integration of SurfAi as an Automated Web Navigation Tool (local function type)
We have integrated SurfAi into our suite as a powerful automated web navigation tool. This enhancement enables seamless interaction with web pages, efficient extraction of data and images, and supports the resolution of more complex tasks through intelligent automation.
[Computer use OpenAI integration](#computer-use-openai-integration)
We integrated a Computer-Using Agent (CUA) tool in Intellichain to automate computer interactions such as clicking, typing, and scrolling. The tool leverages OpenAI’s visual understanding and decision-making to navigate browsers or virtual machines, extract and analyze data and images. It provides real-time updates and screenshots via WebSocket, streamlining web navigation and data extraction tasks. This integration enhances workflow efficiency significantly.
It operates similarly to SurfAi but offers enhanced capabilities with a higher success rate for completing tasks.
## Deep Search sections
If you want to rebuild and restart the application, and optimize docker space:

```bash
docker-compose down
docker-compose build --no-cache
docker-compose up -d
docker builder prune -a -f
docker logs -f flask_app
```
It is a good idea to always check Docker space usage after building and starting the application:
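For example, `docker system df` summarizes how much space images, containers, and volumes are consuming (the project may suggest a different command here):

```bash
docker system df
```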
The backend logic is managed by a Flask application. The main Flask app is located at:
```bash
/app.py
```
This file orchestrates the interaction between the AI agent, tool execution, and the integration of various services (like Neo4j, Redis, and Docker containers).
### Frontend chat interface
A function validator inspects each subtask's code (via AST analysis) for syntax errors.
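As a rough sketch of how such a check can work (this is illustrative, not the project's actual validator), Python's built-in `ast` module can catch syntax errors before any generated code runs:

```python
import ast

def validate_subtask_code(source: str) -> str | None:
    """Return an error message if the code fails basic static checks, else None."""
    try:
        tree = ast.parse(source)  # raises SyntaxError on invalid code
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    # Illustrative structural check: the subtask is expected to define a function
    if not any(isinstance(node, ast.FunctionDef) for node in ast.walk(tree)):
        return "no function definition found in subtask code"
    return None
```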
## All Ways to Add Custom Tools
In addition to the default tools, users can create custom tools by describing the function and specifying the libraries to be used.
You can manage custom tools in the file `/tools/custom_tools.py`.
There are several ways to create custom tools:
1) **ONLY LIBRARY NAME**:
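As a purely hypothetical illustration (the real schema lives in `/tools/custom_tools.py`; the field names here simply mirror the default tool definitions shown later in this README), a library-name-only tool might be as small as:

```python
# Hypothetical sketch: only the library is named, and the agent generates
# the code that uses it. Check /tools/custom_tools.py for the actual
# schema and any required fields.
{
    "tool_name": "numpy_tool",
    "lib_names": ["numpy"],
}
```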
Discover the capabilities of AutoCodeAgent with these videos:<br>
[Hybrid Vector Graph RAG Video Demo](https://youtu.be/a9Ul6CxYsFM).<br>
[Integration with SurfAi Video Demo 1](https://youtu.be/b5OPk7-FPrk).<br>
[Integration with SurfAi Video Demo 2](https://youtu.be/zpTthh2wKds).<br>
[Integration OpenAI Computer Use automation](https://youtu.be/A5pjtwJrZx0).<br>
[LangChain Toolbox Video Demo](https://youtu.be/sUKiN_qp750).<br>

The default tools are pre-implemented and fully functional, supporting the agent.
These default tools are listed below and can be found in the file:
`/code_agent/default_tools.py`
- browser_navigation_surf_ai
  - Integration of SurfAi for web navigation, data and image extraction, with multimodal text + vision capabilities
- browser_navigation_cua
  - Computer-Using Agent (CUA) that automates computer interactions like clicking, typing, and scrolling. Leverages OpenAI's visual capabilities to navigate interfaces, extract data, and provide real-time feedback.
- helper_model
  - An LLM useful for processing the output of a subtask
- helper_model_web_search
  - Responds to your requests with information retrieved from the web in real time via OpenAI's web search (a minimal sketch of the underlying call follows this list).
- ingest_simple_rag
  - A tool for ingesting text into a ChromaDB vector database with simple RAG
- retrieve_simple_rag
- retrieve_hyde_rag
  - A tool for retrieving the documents most similar to a query with the HyDe RAG technique
- retrieve_adaptive_rag
  - A tool for retrieving the documents most similar to a query with the Adaptive RAG technique
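As promised above, here is a minimal sketch of the kind of Chat Completions call behind `helper_model_web_search`. The model name and options are assumptions based on OpenAI's web search documentation; the tool's real configuration lives in `default_tools.py`:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed model and options; the actual tool may configure this differently.
completion = client.chat.completions.create(
    model="gpt-4o-search-preview",
    web_search_options={},  # enable web search for this request
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
)
print(completion.choices[0].message.content)
```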
You can find the screenshots generated during navigation at the following path: `/tools/surf_ai/screenshots`
Important: To avoid confusion for the planner agent in Intellichain, activate only one browser automation tool at a time.
In the `default_tools.py` file, set these parameters:
```python
"browser_navigation_surf_ai": True,
"browser_navigation_cua": False,
```
## Computer use OpenAI integration
OpenAI's Computer-Using Agent (CUA) automates computer interactions like clicking, typing, and scrolling. It leverages visual understanding and intelligent decision-making to efficiently handle tasks typically performed manually.
The tool doesn't just navigate and interact with web pages, but also engages with the user through chat, requesting additional instructions whenever it encounters problems or uncertainties during navigation.
### Integration Overview
CUA integration involves a feedback loop (a minimal sketch of this loop follows the list):
1. Set up the environment (browser or virtual machine).
2. Send initial instructions to the CUA model.
3. Process model responses for suggested actions.
4. Execute actions within your environment.
5. Capture screenshots after each action.
6. Repeat until task completion.
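A minimal sketch of this loop, assuming OpenAI's `computer-use-preview` model via the Responses API and a Playwright browser as the environment. The field names follow OpenAI's published computer-use example and should be treated as assumptions; the project's real implementation lives in the `tools.cua.engine` module:

```python
import asyncio
import base64
from openai import OpenAI
from playwright.async_api import async_playwright

client = OpenAI()
CUA_TOOL = [{"type": "computer_use_preview", "display_width": 1024,
             "display_height": 768, "environment": "browser"}]

async def cua_loop(task: str) -> None:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page(viewport={"width": 1024, "height": 768})

        # Steps 1-2: set up the environment and send the initial instructions
        response = client.responses.create(
            model="computer-use-preview", tools=CUA_TOOL,
            input=[{"role": "user", "content": task}], truncation="auto")

        while True:
            # Step 3: look for a suggested action in the model's response
            calls = [item for item in response.output if item.type == "computer_call"]
            if not calls:
                break  # step 6: no further actions means the task is done
            action = calls[0].action

            # Step 4: execute the action in the browser
            if action.type == "click":
                await page.mouse.click(action.x, action.y)
            elif action.type == "type":
                await page.keyboard.type(action.text)
            elif action.type == "scroll":
                await page.mouse.wheel(action.scroll_x, action.scroll_y)

            # Step 5: capture a screenshot and feed it back to the model
            shot = base64.b64encode(await page.screenshot()).decode()
            response = client.responses.create(
                model="computer-use-preview", tools=CUA_TOOL,
                previous_response_id=response.id, truncation="auto",
                input=[{"type": "computer_call_output",
                        "call_id": calls[0].call_id,
                        "output": {"type": "input_image",
                                   "image_url": f"data:image/png;base64,{shot}"}}])
        await browser.close()

asyncio.run(cua_loop("go to Wikipedia and search for Elon Musk"))
```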

### Video Demo
[OpenAI Computer Use automation](https://youtu.be/A5pjtwJrZx0).
For CUA integration, we've implemented a default tool called `browser_navigation_cua` which is available in the `default_tools.py` file. This tool enables automated browser navigation and interaction with web content through OpenAI's Computer-Using Agent capabilities.
```python
{
"tool_name": "browser_navigation_cua",
"lib_names": ["tools.cua.engine", "asyncio"],
"instructions": ("This is an agent that automates browser navigation. Use it to interact with the browser and extract data during navigation.\n"
"From the original prompt, reformulate it with input containing only the instructions for navigation, vision capablity and text data extraction.\n"
"It also has visual capabilities, so it can be used to analyze the graphics of the web page and images.\n"
"For example: \n"
"Initial user prompt: use the browser navigator to go to Wikipedia, search for Elon Musk, extract all the information from the page, and analyze with your vision capability his image, and send a summary of the extracted information via email to [email protected]\n"
"Input prompt for browser navigation: go to Wikipedia, search for Elon Musk, extract all the information from the page, and analyze with your vision capability his image.\n"
"**Never forget important instructions on navigation and data extraction.**"),
Important: To avoid confusion for the planner agent in Intellichain, activate only one browser automation tool at a time.
In the `default_tools.py` file, set these parameters:
```python
"browser_navigation_surf_ai": False,
"browser_navigation_cua": True,
```
Thanks to the integrated WebSocket, during navigation phases you'll see real-time updates in the chat with all actions performed by the agent on web pages.
The agent thinks intelligently and critically about various pages, and if it encounters problems, it communicates in the chat requesting clarification or additional information to complete a navigation task.
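As a small sketch of how you could tap this stream outside the chat UI (the endpoint path and message format here are assumptions; check the project's WebSocket module for the real ones):

```python
import asyncio
import websockets  # pip install websockets

async def watch_agent_stream() -> None:
    # Hypothetical endpoint; the real path is defined by the project's WebSocket module.
    async with websockets.connect("ws://localhost:5000/ws") as ws:
        async for message in ws:
            print("agent update:", message)  # reasoning steps and browser actions

asyncio.run(watch_agent_stream())
```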
If the tool is invoked, you can view the navigation by accessing:
```bash
http://localhost:6901/vnc.html
```
You can find the screenshots generated during navigation at the following path: `/tools/cua/screenshots`
To use this tool, make sure you have added your `OPENAI_API_KEY` to the `.env` file.
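For example, in the project's `.env` (the value below is a placeholder):

```bash
# .env
OPENAI_API_KEY=sk-...
```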
For technical documentation on OpenAI's API integration, please refer to:
In this example, the system would:
This demonstrates the flexibility and strength of LangChain's integration capabilities in orchestrating multiple tools to achieve a complex, multi-step task.