Feat/mcp readability generator #469
Merged
akangsha7 merged 6 commits on Jul 1, 2026
Fetch an MCP endpoint's tools and render them as man-page markup for the readability scorer.

The system-under-test is pluggable via tools_source.type:
- http: live tools/list over Streamable HTTP (official mcp SDK), no auth
- stdio: launch a local MCP server and speak MCP over its stdio pipes
- file: load a raw tools/list JSON dump (offline / deterministic runs)

URL normalization adds scheme + /mcp suffix. Registers "mcp_tools" in get_generator.

Includes unit tests for all three sources, URL sanitization, error paths, and the generator entry point.
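For reviewers who want to see what "adds scheme + /mcp suffix" could look like, here is a minimal sketch. The function name `normalize_mcp_url` and the choice of `https` as the default scheme are assumptions made for illustration; the PR does not state which scheme it adds or how it treats trailing slashes.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_mcp_url(raw: str, default_scheme: str = "https") -> str:
    """Return `raw` with a scheme and a trailing /mcp path segment.

    Hypothetical helper: the default scheme is an assumption, not taken
    from the PR.
    """
    url = raw.strip()
    if "://" not in url:
        # Bare host (e.g. "example.com" or "localhost:8080"): add a scheme.
        url = f"{default_scheme}://{url}"
    parts = urlsplit(url)
    # Drop trailing slashes so "/mcp/" and "/mcp" are treated the same.
    path = parts.path.rstrip("/")
    if not path.endswith("/mcp"):
        path = f"{path}/mcp"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
```

An explicit scheme is kept as given, so a local `http://` endpoint is not silently upgraded.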
Collaborator: /gcbrun
IsmailMehdi (Collaborator) previously approved these changes on Jun 29, 2026 and left a comment:
It would be good to have a look at a run config for this.
Collaborator: /gcbrun
added 2 commits
June 29, 2026 16:42
… exceptions
Bundle illustrative input files under datasets/mcp_readability/ so the readability
eval is documented and the mcp_tools generator has something real to run against:
- run_config.yaml: sample run config (orchestrator + judge land later)
- endpoints.yaml: http / stdio / file tools_source examples
- style_guide.md: rules with priority + {#tag} anchors
- exceptions.yaml: per-endpoint rule waivers
- tools_generator.yaml: mcp_tools generator config
- sample_tools.json: raw tools/list dump for the file source
The file source runs end-to-end against sample_tools.json; YAML/JSON validated.
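As a rough illustration of the file source, the sketch below reads a tools/list dump and returns the tool list. The helper names `extract_tools` and `load_tools_from_file` are hypothetical, and accepting three shapes (a full JSON-RPC response, a bare result object, or a bare list) is an assumption; the PR does not say which shapes sample_tools.json may take. The `{"tools": [...]}` result shape itself follows the MCP tools/list response.

```python
import json
from pathlib import Path


def extract_tools(payload) -> list[dict]:
    """Pull the tool list out of a decoded tools/list dump.

    Hypothetical helper. Accepts a JSON-RPC response ({"result": {...}}),
    a bare result ({"tools": [...]}), or a bare list of tools.
    """
    if isinstance(payload, dict) and "result" in payload:
        payload = payload["result"]
    tools = payload.get("tools") if isinstance(payload, dict) else payload
    if not isinstance(tools, list):
        raise ValueError("dump does not contain a 'tools' list")
    return tools


def load_tools_from_file(path: str) -> list[dict]:
    """Read a tools/list JSON dump from disk (offline, deterministic)."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return extract_tools(payload)
```

Because nothing here touches the network, the same dump always yields the same input for the scorer.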
Per review: an endpoint's source is now defined entirely by tools_source — the http URL lives in tools_source.url. Drop the top-level endpoint_url fallback from the http fetch (and its docstring/test); identity is (product_name, endpoint_type).
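A small sketch of what this change implies, assuming endpoints are plain dicts: identity comes from (product_name, endpoint_type), and the http URL is read only from tools_source.url. The function names and the error message are hypothetical; only the field names come from this PR.

```python
def endpoint_identity(endpoint: dict) -> tuple[str, str]:
    """Identity is the (product_name, endpoint_type) pair."""
    return (endpoint["product_name"], endpoint["endpoint_type"])


def http_url(endpoint: dict) -> str:
    """Return tools_source.url, with no top-level endpoint_url fallback."""
    source = endpoint.get("tools_source") or {}
    url = source.get("url")
    if not url:
        # A top-level endpoint_url is deliberately ignored here.
        raise ValueError(
            f"endpoint {endpoint_identity(endpoint)} has no tools_source.url"
        )
    return url
```

With a single place to look for the URL, a misconfigured endpoint fails loudly instead of quietly using a stale top-level value.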
4599056 to
f84de29
Author (Collaborator): Added sample run config
This was referenced Jul 1, 2026
akangsha7 pushed a commit to akangsha7/evalbench that referenced this pull request on Jul 1, 2026
…s scorer

Adds the MCP readability evaluation on top of the mcp_tools generator (GoogleCloudPlatform#469):
- McpReadabilityOrchestrator (orchestrator: mcp_readability): per endpoint, fetch tools via the generator, compute deterministic size metrics, gather applicable waivers, and judge the man page against the style guide with an LLM; emits one result row per endpoint via the shared reporters.
- McpToolMetricsScorer: deterministic tool count / estimated tokens / token-budget usage.
- McpStyleComplianceScorer: LLM judge that scores the man page vs the style guide (P0/P1/P2 findings, compliance score, waivers) with JSON output.
- enums + exceptions helpers (endpoint_type / product_name matching, aligned to the GoogleCloudPlatform#469 endpoints/exceptions schema and readability_judge run config).
- evalbench.py: dataset_config is now optional and orchestrators may emit results without NL2SQL scores (None-guarded reporter writes).

Consumes GoogleCloudPlatform#469's generator/formatter/datasets as-is; drives off GoogleCloudPlatform#469's run_config.yaml.

Tests cover enums, exceptions, metrics, formatter, file-source generator, scorer parsing, and an offline end-to-end orchestrator run.
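To make "deterministic tool count / estimated tokens / token-budget usage" concrete, here is one possible shape for such metrics. Everything beyond those three metric names is an assumption: the function name `tool_metrics`, the 4-characters-per-token heuristic, the default budget of 8000, and measuring the serialized tool list rather than the rendered man page. The real McpToolMetricsScorer may differ on all of these.

```python
import json


def tool_metrics(
    tools: list[dict],
    token_budget: int = 8000,  # assumed budget, not from the PR
    chars_per_token: int = 4,  # rough heuristic, not a real tokenizer
) -> dict:
    """Compute size metrics that depend only on the tool list."""
    if token_budget <= 0 or chars_per_token <= 0:
        raise ValueError("token_budget and chars_per_token must be positive")
    # sort_keys makes the serialization independent of dict key order.
    serialized = json.dumps(tools, sort_keys=True)
    # Ceiling division without importing math.
    estimated_tokens = -(-len(serialized) // chars_per_token)
    return {
        "tool_count": len(tools),
        "estimated_tokens": estimated_tokens,
        "token_budget_usage": estimated_tokens / token_budget,
    }
```

No model call is involved, so repeated runs over the same tools give identical rows, which keeps this part of the report stable next to the LLM judge's output.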
helloeve approved these changes on Jul 1, 2026
    # ------------------------------------------------------------------
    @staticmethod
    def _resolve_source(endpoint: dict) -> dict:
        """The endpoint's ``tools_source`` (or ``{}`` to default to ``http``)."""
Collaborator:
nit: defaulting to http seems to be logic in fetch_tools rather than in this helper function
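One way to read this nit, sketched with hypothetical module-level functions instead of the PR's static method: the helper only returns the mapping (or `{}`), and the caller decides what a missing type means. The `fetchers` registry and `DEFAULT_SOURCE_TYPE` constant are assumptions for illustration, not code from this PR.

```python
DEFAULT_SOURCE_TYPE = "http"  # assumed constant name


def resolve_source(endpoint: dict) -> dict:
    """Return the endpoint's tools_source mapping, or {} when absent."""
    return endpoint.get("tools_source") or {}


def fetch_tools(endpoint: dict, fetchers: dict) -> list:
    """Dispatch on tools_source.type; the http default lives here."""
    source = resolve_source(endpoint)
    source_type = source.get("type", DEFAULT_SOURCE_TYPE)
    try:
        fetch = fetchers[source_type]
    except KeyError:
        raise ValueError(
            f"unsupported tools_source.type: {source_type!r}"
        ) from None
    return fetch(source)
```

Under this split, the helper's docstring would only need to say that it returns `{}` when tools_source is missing, and the default would be documented on fetch_tools.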
Collaborator: /gcbrun
Adds McpToolsGenerator, the fetch half of the MCP readability check. It obtains an endpoint's tools and renders them as man-page markup (via the formatter from #463) for the readability scorer.

The system-under-test is pluggable via tools_source.type:
- http: live tools/list over Streamable HTTP (official mcp SDK), no auth
- stdio: launch a local MCP server and speak MCP over its stdio pipes
- file: load a raw tools/list JSON dump (offline / deterministic runs)

URL normalization adds scheme + /mcp suffix. Registers "mcp_tools" in get_generator.

Tests
Unit tests for all three sources, URL sanitization, error paths, and the generator entry point. 22 passed (with the formatter suite).

Note
Stacked on #463 (the man-page formatter). Until #463 merges, this PR's diff includes that commit; it collapses to just the generator once #463 lands.