Conversation

@yulangz yulangz commented Oct 27, 2025

Description

Draft: support agents via API_URL_BASE.
Supports RL for any agent framework by setting API_URL_BASE.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist
Contributor

Summary of Changes

Hello @yulangz, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust and flexible architecture for integrating external agent frameworks with the areal inference engine, with a primary focus on enabling Reinforcement Learning. By providing an OpenAI-compatible API proxy, it abstracts the underlying inference engine, allowing agents to interact through a familiar interface. The changes include dynamic configuration injection during agent runs, comprehensive reward management, and utility features like web page caching, all demonstrated through a suite of new agent examples covering search and multi-turn problem-solving.

Highlights

  • OpenAI API Proxy for Agent Frameworks: Introduced a new ProxyServer that emulates the OpenAI API, allowing external agent frameworks to seamlessly interact with the areal inference engine. This enables a flexible integration point for various agent designs.
  • Dynamic RunConfig Patching: Implemented AReaLOpenAIClientContext, an asynchronous context manager that dynamically patches the OpenAIRunner.run method. This allows for the merging of RunConfig settings, providing fine-grained control over generation parameters (like stop sequences or max tokens) during agent execution within an RL loop.
  • Reinforcement Learning Integration: Enhanced the system to support Reinforcement Learning for agents by adding mechanisms to set and retrieve reward signals (set_final_reward, get_final_reward) and manage trajectories within the ProxyServer and ArealOpenAI client. This includes applying reward discounting across turns.
  • Comprehensive Agent Examples: Added several new examples demonstrating the application of these features, including an ASearcher agent for complex search and web access tasks, and agents for single-turn and multi-turn math problem-solving. These examples showcase how to leverage the new proxy and patching for diverse agent behaviors.
  • Web Page Caching: Integrated a WebPageCache with thread-safe operations and file-based persistence to improve the efficiency of web access tools by caching frequently retrieved web page content.
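The per-turn reward discounting mentioned above can be sketched roughly as follows. This is an illustrative sketch only: the helper name, `gamma`, and the turn ordering are assumptions, not the actual ProxyServer/ArealOpenAI API.

```python
# Illustrative sketch (assumed names): the final reward is credited fully to
# the last turn, and earlier turns receive copies discounted by gamma per
# step of distance from the end.
def discount_final_reward(
    final_reward: float, num_turns: int, gamma: float = 0.9
) -> list[float]:
    """Return one reward per turn, discounted backwards from the last turn."""
    return [final_reward * gamma ** (num_turns - 1 - t) for t in range(num_turns)]
```

For example, with `final_reward=1.0`, three turns, and `gamma=0.5`, this yields `[0.25, 0.5, 1.0]`.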

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Supported commands:

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant feature to support agent-based reinforcement learning by proxying an OpenAI-compatible API. This is a powerful abstraction that enables integration with various agent frameworks. The implementation includes a FastAPI-based proxy server, monkey-patching of the agents library, and several new example training scripts for math and search agents.

The overall approach is sound, but as this is a work-in-progress, there are several areas that need attention. I've identified some potential bugs, resource management issues, and opportunities for code improvement. My detailed comments below address these points, focusing on correctness, maintainability, and robustness.

Comment on lines 41 to 44
if self.config.wandb.wandb_base_url:
os.environ["WANDB_API_KEY"] = self.config.wandb.wandb_api_key
if self.config.wandb.wandb_api_key:
os.environ["WANDB_BASE_URL"] = self.config.wandb.wandb_base_url

critical

The logic for setting the WANDB_API_KEY and WANDB_BASE_URL environment variables appears to be swapped. The API key is set based on the presence of the base URL, and vice versa. This will cause wandb.login() to fail or connect to the wrong endpoint if only one of the two is configured.

Suggested change:
- if self.config.wandb.wandb_base_url:
-     os.environ["WANDB_API_KEY"] = self.config.wandb.wandb_api_key
- if self.config.wandb.wandb_api_key:
-     os.environ["WANDB_BASE_URL"] = self.config.wandb.wandb_base_url
+ if self.config.wandb.wandb_api_key:
+     os.environ["WANDB_API_KEY"] = self.config.wandb.wandb_api_key
+ if self.config.wandb.wandb_base_url:
+     os.environ["WANDB_BASE_URL"] = self.config.wandb.wandb_base_url

completion_str = resp.final_output

# agent extracts tool callings from the llm response
tool_calls = agent.consume_llm_response(resp, completion_str)

critical

The method agent.consume_llm_response is called with two arguments, resp and completion_str. However, its definition in examples/openai-agents/asearcher/agent/search_agent.py only accepts one argument, completion_text. This will cause a TypeError at runtime.

Suggested change:
- tool_calls = agent.consume_llm_response(resp, completion_str)
+ tool_calls = agent.consume_llm_response(completion_str)

# call tool and compute reward
if tool_calls is not None and len(tool_calls) > 0:
tool_call = tool_calls[0]
res = (await self.toolbox.step((qid, [tool_call])))[0]

critical

The run_agent method calls self.toolbox.step, but self.toolbox is not initialized anywhere in the ASearcherAgent class. This will result in an AttributeError at runtime. The SearchToolBox should be initialized in ASearcherAgent.__init__.

Comment on lines +303 to +307
if self.search_only
else SEARCH_ACCESS_PROMPT_TEMPLATE
)
prompt = prompt_template.format(question=data["question"])
valid_inst: bool = np.random.uniform(0, 1) <= self.valid_inst_ratio

critical

The arun_episode method in ASearcherWorkflow uses self.search_only and self.valid_inst_ratio, but these attributes are not defined in the class. This will cause an AttributeError. These should be added as parameters to the __init__ method and likely configured in ASearcherRLConfig.

Comment on lines +456 to +463
def __del__(self):
"""Ensure socket is closed on deletion."""
if self.server:
self.server.should_exit = True
self.server = None
if self.sock:
self.sock.close()
self.sock = None

high

Using __del__ to close sockets and shut down the uvicorn server is unreliable. The __del__ method is not guaranteed to be called when an object's reference count drops to zero, which can lead to resource leaks (e.g., open sockets, running server threads).

It would be more robust to provide an explicit shutdown method that can be called deterministically by the owner of the ProxyServer instance to ensure resources are properly released.

Suggested change:
- def __del__(self):
-     """Ensure socket is closed on deletion."""
-     if self.server:
-         self.server.should_exit = True
-         self.server = None
-     if self.sock:
-         self.sock.close()
-         self.sock = None
+ def shutdown(self):
+     """Shut down the server and release the socket."""
+     if self.server:
+         self.server.should_exit = True
+         self.server = None
+     if self.sock:
+         self.sock.close()
+         self.sock = None
+
+ def __del__(self):
+     self.shutdown()

Comment on lines +87 to +91
except Exception as e:
# 如果出现问题,确保关闭socket
if "sock" in locals() and sock:
sock.close()
raise e

high

In the except block, if socket.socket() fails, the sock variable will not be defined, leading to a NameError when if "sock" in locals() and sock: is executed. It's safer to initialize sock to None before the try block to prevent this crash.

Additionally, the comment on line 88 is in Chinese, which is inconsistent with the English comments in the rest of the file. It should be translated for consistency.

Suggested change:
- except Exception as e:
-     # 如果出现问题,确保关闭socket
-     if "sock" in locals() and sock:
-         sock.close()
-     raise e
+ except Exception as e:
+     # Ensure the socket is closed if an issue occurs.
+     if "sock" in locals() and sock:
+         sock.close()
+     raise e

Comment on lines +147 to +196
def process_webpage(self, content):
keys = [
("title", "title"),
("p", "p"),
("li", "li", lambda c: "\n" not in c),
("td", "td"),
("tr", "tr"),
]
content_list = []
init_length = len(content)
while any([f"<{k[0]}" in content and f"</{k[1]}>" in content for k in keys]):
klr = []
for k in keys:
start = 0
# print(k)
while True:
ls = [content[start:].find(f"<{k[0]}{c}") for c in [">", " "]]
ls = [l for l in ls if l != -1]
l = -1 if len(ls) == 0 else min(ls)
# print(ls)
if l == -1:
break
l += start
r = content[l:].find(f"</{k[1]}>")
if r == -1:
break
if (len(k) <= 2) or (len(k) >= 3 and k[2](content[l : l + r])):
# print(k, l, l+r)
klr.append((k, l, l + r))
break
start = l + r

if len(klr) == 0:
break
klr = sorted(klr, key=lambda x: x[1])
k, l, r = klr[0]
content_list.append(content[l : r + len(f"</{k[1]}>")])
# print(content_list[-1])
# input("stop...")
if k[0] == "p":
content_list[-1] += "\n\n"
elif k[0] == "li":
content_list[-1] += "\n"
content = content[r:]
content = "".join(content_list)
final_length = len(content)
logger.info(
f"process the webpage: {init_length} -> {final_length}. {content[:100]}"
)
return content

high

The process_webpage method uses string searching (find) and manual slicing to parse HTML content. This approach is very brittle and can easily break with small variations in HTML structure. Using a dedicated HTML parsing library like BeautifulSoup would be far more robust and maintainable.

Here is an example of how you could implement this with BeautifulSoup:

from bs4 import BeautifulSoup

def process_webpage(self, content):
    soup = BeautifulSoup(content, 'html.parser')
    
    # Extract text from relevant tags
    texts = []
    for tag in soup.find_all(['title', 'p', 'li', 'td', 'tr']):
        texts.append(tag.get_text(separator=' ', strip=True))
    
    processed_content = "\n\n".join(texts)
    
    logger.info(
        f"process the webpage: {len(content)} -> {len(processed_content)}. {processed_content[:100]}"
    )
    return processed_content

Comment on lines +174 to +178
proxy_thread = threading.Thread(
target=self.proxy_server.run, args=(sock,), daemon=True
)
logger.info(f"[wht debug] Starting proxy server on port {port}")
proxy_thread.start()

high

The ProxyServer is started in a daemon thread, but there is no corresponding mechanism to explicitly shut it down when the training finishes or an error occurs. Relying on __del__ for cleanup is not reliable and can lead to leaked resources like sockets and threads. The MultiturnRLVRAgentWorkflow should manage the lifecycle of the ProxyServer and ensure its shutdown method (which should be implemented) is called.
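One way to make the lifecycle deterministic is a small context manager around the server. This is a sketch under assumptions: `start()` and `shutdown()` are the methods the review proposes, not existing areal API.

```python
import contextlib


# Sketch of deterministic ProxyServer lifecycle management; `start` and
# `shutdown` are assumed methods per the review, not the current areal API.
@contextlib.contextmanager
def proxy_server_lifecycle(server):
    """Guarantee server.shutdown() runs even if the workflow raises."""
    server.start()  # assumed to spawn the daemon serving thread internally
    try:
        yield server
    finally:
        server.shutdown()  # explicit, deterministic cleanup
```

The workflow would then wrap its rollout loop in `with proxy_server_lifecycle(self.proxy_server): ...` instead of relying on `__del__`.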

Comment on lines +228 to +229
# Many of this code are copied from areal/experimental/openai/client.py
# I only add lock for thread safety

medium

The comment on line 228 highlights that a significant amount of code for reward and completion management (set_reward, apply_reward_discount, export_completions, etc.) is duplicated from areal/experimental/openai/client.py. This duplication makes the code harder to maintain, as changes will need to be made in two places.

Consider refactoring this logic into a shared CompletionCacheManager class that both ArealOpenAI and ProxyServer can use. ProxyServer could then manage a dictionary of these managers, keyed by task_id, to handle concurrent tasks.
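As a rough sketch of that refactor: the class and method names below follow the review's suggestion, and the reward-only state is a deliberate simplification of the real completion records.

```python
import threading


# Sketch of the proposed shared manager (assumed shape): both ArealOpenAI
# and ProxyServer could hold one instance per task_id instead of
# duplicating reward/completion bookkeeping.
class CompletionCacheManager:
    """Thread-safe per-completion reward store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rewards: dict[str, float] = {}  # completion_id -> reward

    def set_reward(self, completion_id: str, reward: float) -> None:
        with self._lock:
            self._rewards[completion_id] = reward

    def apply_reward_discount(self, gamma: float) -> None:
        # Discount earlier completions relative to the most recent one,
        # relying on dict insertion order for turn order.
        with self._lock:
            ids = list(self._rewards)
            for i, cid in enumerate(ids):
                self._rewards[cid] *= gamma ** (len(ids) - 1 - i)

    def export_completions(self) -> dict[str, float]:
        with self._lock:
            return dict(self._rewards)
```

ProxyServer could then keep `dict[task_id, CompletionCacheManager]` and dispatch by task, so concurrent tasks never share a lock or reward table.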
