Skip to content

Deduplicate MCP tool names #1571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jul 17, 2025
Merged

Deduplicate MCP tool names #1571

merged 16 commits into from
Jul 17, 2025

Conversation

pakrym-oai
Copy link
Collaborator

@pakrym-oai pakrym-oai commented Jul 14, 2025

Store fully qualified names along with tool entries so we don't have to re-parse them.

Fixes: #1289

Copy link

github-actions bot commented Jul 14, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

- Reformatted code to improve readability and maintain consistent style
- Adjusted line breaks, indentation, and brace placement
- No logic changes, only formatting and style improvements
@pakrym-oai
Copy link
Collaborator Author

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Jul 14, 2025
@pakrym-oai pakrym-oai marked this pull request as ready for review July 15, 2025 04:12
Copy link

Summary

This PR de-duplicates MCP tool names by shortening the delimiter to “__”, trimming long names, and appending a random suffix when collisions occur. The connection manager, name-parsing logic, Cargo deps, and a new test suite are updated accordingly.

Review

Nice improvement—unique, spec-compliant tool names with thorough tests.

  • qualify_tools correctly enforces 64-char limit and retries until the name is unique.
  • Added rand dependency and unit tests; coverage looks solid.
  • API surface stays compatible via parse_tool_name, but old try_parse_fully_qualified_tool_name is removed—confirm no external crates relied on it.
  • Minor: consider seeding rand::thread_rng() explicitly for clarity (rand::rng() is fine but less common).
  • Docs/comments are clear; cargo-toml clean-up good.

Overall, well-structured change—ship it after the small naming nit if desired.


View workflow run

@bolinfest
Copy link
Collaborator

I haven't had a chance to review this properly yet, but FYI, you probably have to rebase on 0bc7ee9

format!("{server}{MCP_TOOL_NAME_DELIMITER}{tool}")
}
fn qualify_tools(tools: Vec<ToolInfo>) -> HashMap<String, ToolInfo> {
let mut used_names = HashSet::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be slightly problematic to add a random suffix at the end of the string because if you are trying to do analysis across rollouts, the tool name will not be consistent. What do you think about the following strategy (now that we are keeping track of used names):

  1. If tool.tool_name exceeds MAX_TOOL_NAME_LENGTH, go to Step 4.
  2. If tool.tool_name is not in used_names, just use tool.tool_name as the qualified name.
  3. If {server_name}__{tool_name} is not in used_names and does not exceed MAX_TOOL_NAME_LENGTH, use that as the qualified name.
    4a. (Option 1) Take the hash (BLAKE3 or SHA256) of {server_name}__{tool_name} and use the 64-digit hex string as the qualified name.
    4b. (Option 2) Take the first 24 characters of tool_name and concatenate it with the SHA1 of {server_name}__{tool_name}, which is a maximum of 64 digits.

Alternatively, we could just always hash, but I expect that makes things harder to debug.

I feel like Option 2 provides a good balance between consistency and readability. What do you think?

Copy link
Collaborator Author

@pakrym-oai pakrym-oai Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with your comment. What I don't fully like in the approach is that tool names are order dependent.

I'd rather always used {server_name}__{tool_name} and if it's too long appended SHA1 of {server_name}__{tool_name}.

How does this sound?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point about the tool ordering: I like your solution!

# Conflicts:
#	codex-rs/core/src/mcp_connection_manager.rs
- Switch tool name conflict resolution from random suffixes to SHA1 hashes for determinism.
- Names longer than MAX_TOOL_NAME_LENGTH are truncated and suffixed with SHA1 of original.
- Duplicated tool names are now logged and skipped.
- Updated tests for new logic and dropped randomization.
@pakrym-oai pakrym-oai requested a review from bolinfest July 17, 2025 18:14
Copy link
Collaborator

@bolinfest bolinfest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM module the codex.js change!

@@ -28,13 +28,14 @@ const __dirname = path.dirname(__filename);
// For the @native release of the Node module, the `use-native` file is added,
// indicating we should default to the native binary. For other releases,
// setting CODEX_RUST=1 will opt-in to the native binary, if included.
const wantsNative = fs.existsSync(path.join(__dirname, "use-native")) ||
const wantsNative =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unrelated change: would you mind removing this from the PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

format!("{server}{MCP_TOOL_NAME_DELIMITER}{tool}")
}
fn qualify_tools(tools: Vec<ToolInfo>) -> HashMap<String, ToolInfo> {
let mut used_names = HashSet::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point about the tool ordering: I like your solution!

assert_eq!(keys[0].len(), 64);
assert_eq!(
keys[0],
"my_server__extremely_lena02e507efc5a9de88637e436690364fd4219e4ef"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this an imagining trying to read it in logs or whatever, I wonder whether we would rather use 15 characters of prefix, _ as a delimiter, and then the hash, so this would be:

my_server__extremely_le_a02e507efc5a9de88637e436690364fd4219e4ef

Though I guess that's not materially more readable? Just a thought: fine either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with existing option and iterate if we feel the pain.

@pakrym-oai pakrym-oai merged commit b95a010 into main Jul 17, 2025
11 checks passed
@pakrym-oai pakrym-oai deleted the pakrym/deduplicate-mcp-tool-names branch July 17, 2025 18:35
@github-actions github-actions bot locked and limited conversation to collaborators Jul 17, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

generated MCP tool names are too long
2 participants