Feat/mcp readability generator by akangsha7 · Pull Request #469 · GoogleCloudPlatform/evalbench

akangsha7 · 2026-06-29T18:47:30Z

Adds McpToolsGenerator, the fetch half of the MCP readability check. It
obtains an endpoint's tools and renders them as man-page markup (via the
formatter from #463) for the readability scorer.
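
For orientation, here is a minimal sketch of what rendering a tool list as man-page-style text could look like. The real formatter lives in #463 and is not shown in this thread, so the function name `render_man_page`, the section headings, and the indentation are all assumptions made for illustration. Only the tool fields (`name`, `description`, `inputSchema`) come from the MCP tool definition shape.

```python
from typing import Any


def render_man_page(tools: list[dict[str, Any]]) -> str:
    """Render MCP tool definitions as man-page-style text (hypothetical layout)."""
    sections = []
    for tool in tools:
        schema = tool.get("inputSchema", {})
        props = schema.get("properties", {})
        required = set(schema.get("required", []))
        lines = [
            "NAME",
            f"    {tool['name']}",
            "",
            "DESCRIPTION",
            f"    {tool.get('description', '(no description)')}",
        ]
        if props:
            lines += ["", "PARAMETERS"]
            for name, spec in props.items():
                flag = "required" if name in required else "optional"
                lines.append(f"    {name} ({spec.get('type', 'any')}, {flag})")
                if spec.get("description"):
                    lines.append(f"        {spec['description']}")
        sections.append("\n".join(lines))
    # One blank line between tools keeps each entry visually separate.
    return "\n\n".join(sections)
```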

The system-under-test is pluggable via tools_source.type:

http — live tools/list over Streamable HTTP (official mcp SDK), no auth
stdio — launch a local MCP server and speak MCP over its stdio pipes
file — load a raw tools/list JSON dump (offline / deterministic runs)
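
The dispatch on tools_source.type can be sketched roughly as follows. This is not the PR's code: the function names, the `path` key, and the two accepted dump shapes (a full JSON-RPC response or just its result object) are assumptions. The http and stdio branches are left out because they need a live server.

```python
import json
from pathlib import Path
from typing import Any


def load_tools_from_file(path: str) -> list[dict[str, Any]]:
    """Load tools from a saved tools/list dump.

    Assumption: the dump is either a full JSON-RPC response
    ({"result": {"tools": [...]}}) or just the result object ({"tools": [...]}).
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object")
    result = data.get("result", data)
    tools = result.get("tools") if isinstance(result, dict) else None
    if not isinstance(tools, list):
        raise ValueError(f"{path}: no 'tools' list found")
    return tools


def fetch_tools(tools_source: dict[str, Any]) -> list[dict[str, Any]]:
    """Dispatch on tools_source.type; only the file branch is implemented here."""
    source_type = tools_source.get("type", "http")
    if source_type == "file":
        # "path" is a hypothetical key name for the dump location.
        return load_tools_from_file(tools_source["path"])
    if source_type in ("http", "stdio"):
        raise NotImplementedError(f"{source_type} source omitted from this sketch")
    raise ValueError(f"unknown tools_source.type: {source_type!r}")
```

The file source is the one that makes runs deterministic, which is why it is the natural target for unit tests.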

URL normalization adds scheme + /mcp suffix. Registers "mcp_tools" in
get_generator.
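
As a rough illustration of the URL normalization described above, the sketch below adds a scheme when one is missing and appends /mcp when the path does not already end with it. The function name, the `https` default scheme, and the idempotent handling of a trailing slash are assumptions, not the PR's actual behavior.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_mcp_url(raw: str, default_scheme: str = "https") -> str:
    """Add a scheme and a trailing /mcp path segment if they are missing."""
    url = raw.strip()
    if "://" not in url:
        # Assumption: bare hosts default to https.
        url = f"{default_scheme}://{url}"
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith("/mcp"):
        path += "/mcp"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Under these assumptions, `normalize_mcp_url("example.com")` and `normalize_mcp_url("https://example.com/mcp/")` both yield `https://example.com/mcp`.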

Tests

Unit tests for all three sources, URL sanitization, error paths, and the
generator entry point. 22 passed (with the formatter suite).

Note

Stacked on #463 (the man-page formatter). Until #463 merges, this PR's diff
includes that commit; it collapses to just the generator once #463 lands.

Fetch an MCP endpoint's tools and render them as man-page markup for the readability scorer.

The system-under-test is pluggable via tools_source.type:

- http: live tools/list over Streamable HTTP (official mcp SDK), no auth
- stdio: launch a local MCP server and speak MCP over its stdio pipes
- file: load a raw tools/list JSON dump (offline / deterministic runs)

URL normalization adds scheme + /mcp suffix. Registers "mcp_tools" in get_generator.

Includes unit tests for all three sources, URL sanitization, error paths, and the generator entry point.

helloeve · 2026-06-29T20:33:09Z

/gcbrun

IsmailMehdi

It would be good to have a look at a run config for this.

IsmailMehdi · 2026-06-29T21:19:27Z

/gcbrun

… exceptions

Bundle illustrative input files under datasets/mcp_readability/ so the readability eval is documented and the mcp_tools generator has something real to run against:

- run_config.yaml       sample run config (orchestrator + judge land later)
- endpoints.yaml        http / stdio / file tools_source examples
- style_guide.md        rules with priority + {#tag} anchors
- exceptions.yaml       per-endpoint rule waivers
- tools_generator.yaml  mcp_tools generator config
- sample_tools.json     raw tools/list dump for the file source

The file source runs end-to-end against sample_tools.json; YAML/JSON validated.

Per review: an endpoint's source is now defined entirely by tools_source — the http URL lives in tools_source.url. Drop the top-level endpoint_url fallback from the http fetch (and its docstring/test); identity is (product_name, endpoint_type).
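
A small sketch of what this change implies for callers, assuming endpoints are plain dicts. The helper names and the error messages are hypothetical. Only the `product_name`, `endpoint_type`, `tools_source`, and `tools_source.url` keys come from the commit message.

```python
from typing import Any

EndpointKey = tuple[str, str]


def endpoint_key(endpoint: dict[str, Any]) -> EndpointKey:
    """Identity of an endpoint: (product_name, endpoint_type)."""
    return (endpoint["product_name"], endpoint["endpoint_type"])


def http_url(endpoint: dict[str, Any]) -> str:
    """Read the http URL from tools_source.url only.

    There is deliberately no fallback to a top-level endpoint_url.
    """
    source = endpoint.get("tools_source") or {}
    if source.get("type", "http") != "http":
        raise ValueError("endpoint is not an http source")
    url = source.get("url")
    if not url:
        raise ValueError(
            f"{endpoint_key(endpoint)}: tools_source.url is required for http"
        )
    return url
```

An entry that still carries only a top-level `endpoint_url` would fail loudly under this sketch rather than being silently accepted.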

akangsha7 · 2026-06-30T04:45:38Z

Added sample run config

…s scorer

Adds the MCP readability evaluation on top of the mcp_tools generator (GoogleCloudPlatform#469):

- McpReadabilityOrchestrator (orchestrator: mcp_readability): per endpoint, fetch tools via the generator, compute deterministic size metrics, gather applicable waivers, and judge the man page against the style guide with an LLM; emits one result row per endpoint via the shared reporters.
- McpToolMetricsScorer: deterministic tool count / estimated tokens / token-budget usage.
- McpStyleComplianceScorer: LLM judge that scores the man page vs the style guide (P0/P1/P2 findings, compliance score, waivers) with JSON output.
- enums + exceptions helpers (endpoint_type / product_name matching, aligned to the GoogleCloudPlatform#469 endpoints/exceptions schema and readability_judge run config).
- evalbench.py: dataset_config is now optional and orchestrators may emit results without NL2SQL scores (None-guarded reporter writes).

Consumes GoogleCloudPlatform#469's generator/formatter/datasets as-is; drives off GoogleCloudPlatform#469's run_config.yaml.

Tests cover enums, exceptions, metrics, formatter, file-source generator, scorer parsing, and an offline end-to-end orchestrator run.
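
To make the deterministic size metrics concrete, here is a minimal sketch of tool count, estimated tokens, and token-budget usage. It is not McpToolMetricsScorer itself: the names, the characters-per-token heuristic, and the default budget of 8000 are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolMetrics:
    tool_count: int
    estimated_tokens: int
    budget_usage: float  # estimated_tokens / token_budget


def compute_tool_metrics(
    tools: list[dict],
    man_page: str,
    token_budget: int = 8000,      # hypothetical default budget
    chars_per_token: float = 4.0,  # rough heuristic, not a real tokenizer
) -> ToolMetrics:
    """Compute size metrics for one endpoint without calling any model."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    estimated = math.ceil(len(man_page) / chars_per_token)
    return ToolMetrics(
        tool_count=len(tools),
        estimated_tokens=estimated,
        budget_usage=estimated / token_budget,
    )
```

Because nothing here depends on a model call, the same man page always yields the same numbers, which keeps this half of the evaluation reproducible.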

helloeve · 2026-07-01T14:23:44Z

+    # ------------------------------------------------------------------
+    @staticmethod
+    def _resolve_source(endpoint: dict) -> dict:
+        """The endpoint's ``tools_source`` (or ``{}`` to default to ``http``)."""


nit: the default to http seems to be logic in fetch_tools rather than in this helper function, so the docstring here is misleading.
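
One way to read the nit, sketched under assumptions: the helper only returns the mapping, and the caller applies the http default, so each docstring describes what its own function does. The function bodies below are hypothetical and are not taken from the diff.

```python
def resolve_source(endpoint: dict) -> dict:
    """Return the endpoint's ``tools_source`` mapping, or ``{}`` if unset."""
    return endpoint.get("tools_source") or {}


def source_type(endpoint: dict) -> str:
    """Pick the source type; the default to ``http`` is applied here, by the caller."""
    return resolve_source(endpoint).get("type", "http")
```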

helloeve · 2026-07-01T14:27:47Z

/gcbrun

akangsha7 requested a review from IsmailMehdi as a code owner June 29, 2026 18:47

Merge branch 'main' into feat/mcp-readability-generator

b3f35b2

Merge branch 'main' into feat/mcp-readability-generator

aebf5b1

IsmailMehdi previously approved these changes Jun 29, 2026

akangsha7 dismissed IsmailMehdi’s stale review via 4599056 June 29, 2026 22:59

akangsha7 requested a review from helloeve June 29, 2026 23:12

akangsha7 self-assigned this Jun 29, 2026

Akangsha Goel added 2 commits June 29, 2026 16:42

akangsha7 force-pushed the feat/mcp-readability-generator branch from 4599056 to f84de29 Compare June 30, 2026 04:21

akangsha7 requested a review from IsmailMehdi June 30, 2026 04:45

Merge branch 'main' into feat/mcp-readability-generator

61d7c68

This was referenced Jul 1, 2026:

- feat(mcp-readability): compliance orchestrator, LLM judge, and metrics scorer #472 (Draft)
- Feat/mcp readability metrics scorer #470 (Closed)

helloeve approved these changes Jul 1, 2026

akangsha7 merged commit ab55d7e into GoogleCloudPlatform:main Jul 1, 2026
9 checks passed

akangsha7 deleted the feat/mcp-readability-generator branch July 1, 2026 15:19

akangsha7 merged 6 commits into GoogleCloudPlatform:main from akangsha7:feat/mcp-readability-generator
