Skip to content

Deduplicate MCP tool names #1571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jul 17, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
16 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 66 additions & 0 deletions codex-rs/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions codex-rs/core/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ rand = "0.9"
reqwest = { version = "0.12", features = ["json", "stream"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
sha1 = "0.10.6"
strum_macros = "0.27.1"
thiserror = "2.0.12"
time = { version = "0.3", features = ["formatting", "local-offset", "macros"] }
Expand Down
3 changes: 1 addition & 2 deletions codex-rs/core/src/codex.rs
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,6 @@ use crate::exec::process_exec_tool_call;
use crate::exec_env::create_env;
use crate::flags::OPENAI_STREAM_MAX_RETRIES;
use crate::mcp_connection_manager::McpConnectionManager;
use crate::mcp_connection_manager::try_parse_fully_qualified_tool_name;
use crate::mcp_tool_call::handle_mcp_tool_call;
use crate::models::ContentItem;
use crate::models::FunctionCallOutputPayload;
Expand Down Expand Up @@ -1292,7 +1291,7 @@ async fn handle_function_call(
handle_container_exec_with_params(params, sess, sub_id, call_id).await
}
_ => {
match try_parse_fully_qualified_tool_name(&name) {
match sess.mcp_connection_manager.parse_tool_name(&name) {
Some((server, tool_name)) => {
// TODO(mbolin): Determine appropriate timeout for tool call.
let timeout = None;
Expand Down
171 changes: 151 additions & 20 deletions codex-rs/core/src/mcp_connection_manager.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
//! `"<server><MCP_TOOL_NAME_DELIMITER><tool>"` as the key.

use std::collections::HashMap;
use std::collections::HashSet;
use std::time::Duration;

use anyhow::Context;
Expand All @@ -16,8 +17,12 @@ use codex_mcp_client::McpClient;
use mcp_types::ClientCapabilities;
use mcp_types::Implementation;
use mcp_types::Tool;

use sha1::Digest;
use sha1::Sha1;
use tokio::task::JoinSet;
use tracing::info;
use tracing::warn;

use crate::config_types::McpServerConfig;

Expand All @@ -26,7 +31,8 @@ use crate::config_types::McpServerConfig;
///
/// OpenAI requires tool names to conform to `^[a-zA-Z0-9_-]+$`, so we must
/// choose a delimiter from this character set.
const MCP_TOOL_NAME_DELIMITER: &str = "__OAI_CODEX_MCP__";
const MCP_TOOL_NAME_DELIMITER: &str = "__";
const MAX_TOOL_NAME_LENGTH: usize = 64;

/// Timeout for the `tools/list` request.
const LIST_TOOLS_TIMEOUT: Duration = Duration::from_secs(10);
Expand All @@ -35,16 +41,42 @@ const LIST_TOOLS_TIMEOUT: Duration = Duration::from_secs(10);
/// spawned successfully.
pub type ClientStartErrors = HashMap<String, anyhow::Error>;

fn fully_qualified_tool_name(server: &str, tool: &str) -> String {
format!("{server}{MCP_TOOL_NAME_DELIMITER}{tool}")
}
fn qualify_tools(tools: Vec<ToolInfo>) -> HashMap<String, ToolInfo> {
let mut used_names = HashSet::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be slightly problematic to add a random suffix at the end of the string because if you are trying to do analysis across rollouts, the tool name will not be consistent. What do you think about the following strategy (now that we are keeping track of used names):

  1. If tool.tool_name exceeds MAX_TOOL_NAME_LENGTH, go to Step 4.
  2. If tool.tool_name is not in used_names, just use tool.tool_name as the qualified name.
  3. If {server_name}__{tool_name} is not in used_names and does not exceed MAX_TOOL_NAME_LENGTH, use that as the qualified name.
    4a. (Option 1) Take the hash (BLAKE3 or SHA256) of {server_name}__{tool_name} and use the 64-digit hex string as the qualified name.
    4b. (Option 2) Take the first 24 characters of tool_name and concatenate it with the SHA1 of {server_name}__{tool_name}, which is a maximum of 64 digits.

Alternatively, we could just always hash, but I expect that makes things harder to debug.

I feel like Option 2 provides a good balance between consistency and readability. What do you think?

Copy link
Collaborator Author

@pakrym-oai pakrym-oai Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with your comment. What I don't fully like in the approach is that tool names are order dependent.

I'd rather always used {server_name}__{tool_name} and if it's too long appended SHA1 of {server_name}__{tool_name}.

How does this sound?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point about the tool ordering: I like your solution!

let mut qualified_tools = HashMap::new();
for tool in tools {
let mut qualified_name = format!(
"{}{}{}",
tool.server_name, MCP_TOOL_NAME_DELIMITER, tool.tool_name
);
if qualified_name.len() > MAX_TOOL_NAME_LENGTH {
let mut hasher = Sha1::new();
hasher.update(qualified_name.as_bytes());
let sha1 = hasher.finalize();
let sha1_str = format!("{sha1:x}");

// Truncate to make room for the hash suffix
let prefix_len = MAX_TOOL_NAME_LENGTH - sha1_str.len();

qualified_name = format!("{}{}", &qualified_name[..prefix_len], sha1_str);
}

pub(crate) fn try_parse_fully_qualified_tool_name(fq_name: &str) -> Option<(String, String)> {
let (server, tool) = fq_name.split_once(MCP_TOOL_NAME_DELIMITER)?;
if server.is_empty() || tool.is_empty() {
return None;
if used_names.contains(&qualified_name) {
warn!("skipping duplicated tool {}", qualified_name);
continue;
}

used_names.insert(qualified_name.clone());
qualified_tools.insert(qualified_name, tool);
}
Some((server.to_string(), tool.to_string()))

qualified_tools
}

struct ToolInfo {
server_name: String,
tool_name: String,
tool: Tool,
}

/// A thin wrapper around a set of running [`McpClient`] instances.
Expand All @@ -57,7 +89,7 @@ pub(crate) struct McpConnectionManager {
clients: HashMap<String, std::sync::Arc<McpClient>>,

/// Fully qualified tool name -> tool instance.
tools: HashMap<String, Tool>,
tools: HashMap<String, ToolInfo>,
}

impl McpConnectionManager {
Expand Down Expand Up @@ -141,15 +173,20 @@ impl McpConnectionManager {
}
}

let tools = list_all_tools(&clients).await?;
let all_tools = list_all_tools(&clients).await?;

let tools = qualify_tools(all_tools);

Ok((Self { clients, tools }, errors))
}

/// Returns a single map that contains **all** tools. Each key is the
/// fully-qualified name for the tool.
pub fn list_all_tools(&self) -> HashMap<String, Tool> {
self.tools.clone()
self.tools
.iter()
.map(|(name, tool)| (name.clone(), tool.tool.clone()))
.collect()
}

/// Invoke the tool indicated by the (server, tool) pair.
Expand All @@ -171,13 +208,19 @@ impl McpConnectionManager {
.await
.with_context(|| format!("tool call failed for `{server}/{tool}`"))
}

pub fn parse_tool_name(&self, tool_name: &str) -> Option<(String, String)> {
self.tools
.get(tool_name)
.map(|tool| (tool.server_name.clone(), tool.tool_name.clone()))
}
}

/// Query every server for its available tools and return a single map that
/// contains **all** tools. Each key is the fully-qualified name for the tool.
pub async fn list_all_tools(
async fn list_all_tools(
clients: &HashMap<String, std::sync::Arc<McpClient>>,
) -> Result<HashMap<String, Tool>> {
) -> Result<Vec<ToolInfo>> {
let mut join_set = JoinSet::new();

// Spawn one task per server so we can query them concurrently. This
Expand All @@ -194,18 +237,19 @@ pub async fn list_all_tools(
});
}

let mut aggregated: HashMap<String, Tool> = HashMap::with_capacity(join_set.len());
let mut aggregated: Vec<ToolInfo> = Vec::with_capacity(join_set.len());

while let Some(join_res) = join_set.join_next().await {
let (server_name, list_result) = join_res?;
let list_result = list_result?;

for tool in list_result.tools {
// TODO(mbolin): escape tool names that contain invalid characters.
let fq_name = fully_qualified_tool_name(&server_name, &tool.name);
if aggregated.insert(fq_name.clone(), tool).is_some() {
panic!("tool name collision for '{fq_name}': suspicious");
}
let tool_info = ToolInfo {
server_name: server_name.clone(),
tool_name: tool.name.clone(),
tool,
};
aggregated.push(tool_info);
}
}

Expand All @@ -224,3 +268,90 @@ fn is_valid_mcp_server_name(server_name: &str) -> bool {
.chars()
.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

#[cfg(test)]
#[allow(clippy::unwrap_used)]
mod tests {
use super::*;
use mcp_types::ToolInputSchema;

fn create_test_tool(server_name: &str, tool_name: &str) -> ToolInfo {
ToolInfo {
server_name: server_name.to_string(),
tool_name: tool_name.to_string(),
tool: Tool {
annotations: None,
description: Some(format!("Test tool: {tool_name}")),
input_schema: ToolInputSchema {
properties: None,
required: None,
r#type: "object".to_string(),
},
name: tool_name.to_string(),
},
}
}

#[test]
fn test_qualify_tools_short_non_duplicated_names() {
let tools = vec![
create_test_tool("server1", "tool1"),
create_test_tool("server1", "tool2"),
];

let qualified_tools = qualify_tools(tools);

assert_eq!(qualified_tools.len(), 2);
assert!(qualified_tools.contains_key("server1__tool1"));
assert!(qualified_tools.contains_key("server1__tool2"));
}

#[test]
fn test_qualify_tools_duplicated_names_skipped() {
let tools = vec![
create_test_tool("server1", "duplicate_tool"),
create_test_tool("server1", "duplicate_tool"),
];

let qualified_tools = qualify_tools(tools);

// Only the first tool should remain, the second is skipped
assert_eq!(qualified_tools.len(), 1);
assert!(qualified_tools.contains_key("server1__duplicate_tool"));
}

#[test]
fn test_qualify_tools_long_names_same_server() {
let server_name = "my_server";

let tools = vec![
create_test_tool(
server_name,
"extremely_lengthy_function_name_that_absolutely_surpasses_all_reasonable_limits",
),
create_test_tool(
server_name,
"yet_another_extremely_lengthy_function_name_that_absolutely_surpasses_all_reasonable_limits",
),
];

let qualified_tools = qualify_tools(tools);

assert_eq!(qualified_tools.len(), 2);

let mut keys: Vec<_> = qualified_tools.keys().cloned().collect();
keys.sort();

assert_eq!(keys[0].len(), 64);
assert_eq!(
keys[0],
"my_server__extremely_lena02e507efc5a9de88637e436690364fd4219e4ef"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this an imagining trying to read it in logs or whatever, I wonder whether we would rather use 15 characters of prefix, _ as a delimiter, and then the hash, so this would be:

my_server__extremely_le_a02e507efc5a9de88637e436690364fd4219e4ef

Though I guess that's not materially more readable? Just a thought: fine either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with existing option and iterate if we feel the pain.

);

assert_eq!(keys[1].len(), 64);
assert_eq!(
keys[1],
"my_server__yet_another_e1c3987bd9c50b826cbe1687966f79f0c602d19ca"
);
}
}