
Feat/mcp readability generator #469

Merged
akangsha7 merged 6 commits into GoogleCloudPlatform:main from akangsha7:feat/mcp-readability-generator on Jul 1, 2026

@akangsha7

Collaborator

Adds McpToolsGenerator, the fetch half of the MCP readability check. It
obtains an endpoint's tools and renders them as man-page markup (via the
formatter from #463) for the readability scorer.
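To make "man-page markup" concrete, here is a minimal sketch of turning MCP tool definitions into man-page-style text. It assumes each tool is a dict with the MCP fields `name`, `description`, and a JSON Schema `inputSchema`; the `render_man_page` helper and its NAME / DESCRIPTION / OPTIONS layout are hypothetical and do not reproduce the #463 formatter's actual output.

```python
from __future__ import annotations

from typing import Any


def render_man_page(tools: list[dict[str, Any]]) -> str:
    """Render MCP tool definitions as plain-text, man-page-style sections.

    Hypothetical helper: the real formatter from #463 may use different
    sections and markup.
    """
    pages = []
    for tool in tools:
        schema = tool.get("inputSchema") or {}
        required = set(schema.get("required", []))
        description = (tool.get("description") or "").strip() or "(no description)"
        lines = [
            "NAME",
            f"    {tool['name']}",
            "",
            "DESCRIPTION",
            f"    {description}",
            "",
            "OPTIONS",
        ]
        properties = schema.get("properties") or {}
        if not properties:
            lines.append("    (none)")
        for param, spec in properties.items():
            # Flag required parameters, taken from the JSON Schema "required" list.
            suffix = " (required)" if param in required else ""
            lines.append(f"    {param}: {spec.get('type', 'any')}{suffix}")
            if spec.get("description"):
                lines.append(f"        {spec['description']}")
        pages.append("\n".join(lines))
    # One page per tool, separated by a blank line.
    return "\n\n".join(pages)
```

Rendering to plain text rather than JSON is the point: the scorer then judges what a reader (or model) actually sees.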

The system-under-test is pluggable via tools_source.type:

  • http — live tools/list over Streamable HTTP (official mcp SDK), no auth
  • stdio — launch a local MCP server and speak MCP over its stdio pipes
  • file — load a raw tools/list JSON dump (offline / deterministic runs)
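For the file source, a minimal loader sketch follows. It assumes the dump is either the bare MCP `tools/list` result (`{"tools": [...]}`) or the full JSON-RPC response that wraps it under `result`; `load_tools_from_file` is a hypothetical name, not the generator's actual method.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_tools_from_file(path: str | Path) -> list[dict[str, Any]]:
    """Load tool definitions from a saved tools/list dump (hypothetical helper)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Accept a full JSON-RPC response by unwrapping its "result" member.
    if isinstance(data, dict) and "result" in data:
        data = data["result"]
    if not isinstance(data, dict) or not isinstance(data.get("tools"), list):
        raise ValueError(f"{path}: expected a tools/list result with a 'tools' array")
    for tool in data["tools"]:
        # Every MCP tool definition must at least carry a name.
        if not isinstance(tool, dict) or "name" not in tool:
            raise ValueError(f"{path}: tool entry without a 'name': {tool!r}")
    return data["tools"]
```

Because nothing is fetched over the network, the same dump always yields the same tools, which is what makes file-source runs deterministic.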

URL normalization adds scheme + /mcp suffix. Registers "mcp_tools" in
get_generator.
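The URL normalization can be sketched with the standard library alone. This assumes `https` as the default scheme and that a path already ending in `/mcp` (with or without a trailing slash) is left alone; `normalize_mcp_url` is a hypothetical name, and the PR's sanitizer may choose a different default.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_mcp_url(url: str, default_scheme: str = "https") -> str:
    """Add a missing scheme and a trailing /mcp path segment (hypothetical helper)."""
    url = url.strip()
    # Check for "://" before parsing: urlsplit("example.com") has no netloc,
    # and urlsplit("localhost:8080") may read "localhost" as the scheme.
    if "://" not in url:
        url = f"{default_scheme}://{url}"
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith("/mcp"):
        path = f"{path}/mcp"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Leaving an existing `/mcp` in place keeps the function idempotent, so normalizing an already-normalized URL changes nothing.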

Tests

Unit tests for all three sources, URL sanitization, error paths, and the
generator entry point. 22 passed (with the formatter suite).

Note

Stacked on #463 (the man-page formatter). Until #463 merges, this PR's diff
includes that commit; it collapses to just the generator once #463 lands.

akangsha7 requested a review from IsmailMehdi as a code owner on June 29, 2026 18:47
@helloeve

Collaborator

/gcbrun

IsmailMehdi previously approved these changes on Jun 29, 2026

IsmailMehdi left a comment

Collaborator

It would be good to have a look at a run config for this.

@IsmailMehdi

Collaborator

/gcbrun

Akangsha Goel added 2 commits June 29, 2026 16:42
… exceptions

Bundle illustrative input files under datasets/mcp_readability/ so the readability
eval is documented and the mcp_tools generator has something real to run against:
  - run_config.yaml          sample run config (orchestrator + judge land later)
  - endpoints.yaml           http / stdio / file tools_source examples
  - style_guide.md           rules with priority + {#tag} anchors
  - exceptions.yaml          per-endpoint rule waivers
  - tools_generator.yaml     mcp_tools generator config
  - sample_tools.json        raw tools/list dump for the file source
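style_guide.md pairs each rule with a priority and a `{#tag}` anchor, giving findings and waivers a stable id to point at. A minimal parsing sketch, assuming a heading layout like `## [P0] Rule title {#tag}`; that layout and the `parse_style_rules` helper are hypothetical, not the bundled file's confirmed format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical heading format: "## [P1] Describe every parameter {#param-docs}"
RULE_HEADING = re.compile(
    r"^#{1,6}\s*\[(?P<priority>P[0-2])\]\s*(?P<title>.+?)\s*\{#(?P<tag>[\w-]+)\}\s*$"
)


@dataclass(frozen=True)
class StyleRule:
    tag: str
    priority: str
    title: str


def parse_style_rules(markdown: str) -> dict[str, StyleRule]:
    """Index style-guide rules by their {#tag} anchor (hypothetical helper)."""
    rules: dict[str, StyleRule] = {}
    for line in markdown.splitlines():
        match = RULE_HEADING.match(line.strip())
        if not match:
            continue  # Prose and non-rule headings are ignored.
        tag = match["tag"]
        if tag in rules:
            raise ValueError(f"duplicate rule anchor: {{#{tag}}}")
        rules[tag] = StyleRule(tag, match["priority"], match["title"])
    return rules
```

Rejecting duplicate anchors matters because a waiver that names an ambiguous tag could silently suppress the wrong rule.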

The file source runs end-to-end against sample_tools.json; the YAML and JSON files were validated.

Per review: an endpoint's source is now defined entirely by tools_source, and the http URL lives in tools_source.url. Drop the top-level endpoint_url fallback from the http fetch (and its docstring and test); endpoint identity is (product_name, endpoint_type).
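With identity fixed as the (product_name, endpoint_type) pair and the source confined to tools_source, per-endpoint waivers from exceptions.yaml can be matched on the same key. A sketch of that lookup, assuming endpoints and waivers are already loaded from YAML into plain dicts; the `waived_rules` field, the `endpoint_key` / `waivers_for` helpers, and the example values are hypothetical.

```python
from __future__ import annotations

from typing import Any

EndpointKey = tuple[str, str]


def endpoint_key(entry: dict[str, Any]) -> EndpointKey:
    """Identity of an endpoint or waiver entry: (product_name, endpoint_type)."""
    try:
        return (entry["product_name"], entry["endpoint_type"])
    except KeyError as exc:
        raise ValueError(f"entry is missing identity field {exc}") from None


def waivers_for(endpoint: dict[str, Any], exceptions: list[dict[str, Any]]) -> set[str]:
    """Collect waived rule tags that apply to this endpoint (hypothetical schema)."""
    key = endpoint_key(endpoint)
    waived: set[str] = set()
    for entry in exceptions:
        if endpoint_key(entry) == key:
            waived.update(entry.get("waived_rules", []))
    return waived
```

Note that tools_source plays no part in the key, so switching an endpoint from http to file keeps its waivers.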
akangsha7 force-pushed the feat/mcp-readability-generator branch from 4599056 to f84de29 on June 30, 2026 04:21
@akangsha7

Collaborator Author

Added a sample run config (datasets/mcp_readability/run_config.yaml).

akangsha7 requested a review from IsmailMehdi on June 30, 2026 04:45
akangsha7 pushed a commit to akangsha7/evalbench that referenced this pull request Jul 1, 2026
…s scorer

Adds the MCP readability evaluation on top of the mcp_tools generator (GoogleCloudPlatform#469):

- McpReadabilityOrchestrator (orchestrator: mcp_readability): per endpoint,
  fetch tools via the generator, compute deterministic size metrics, gather
  applicable waivers, and judge the man page against the style guide with an
  LLM; emits one result row per endpoint via the shared reporters.
- McpToolMetricsScorer: deterministic tool count / estimated tokens /
  token-budget usage.
- McpStyleComplianceScorer: LLM judge that scores the man page vs the style
  guide (P0/P1/P2 findings, compliance score, waivers) with JSON output.
- enums + exceptions helpers (endpoint_type / product_name matching, aligned
  to the GoogleCloudPlatform#469 endpoints/exceptions schema and readability_judge run config).
- evalbench.py: dataset_config is now optional and orchestrators may emit
  results without NL2SQL scores (None-guarded reporter writes).
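McpToolMetricsScorer's deterministic metrics (tool count, estimated tokens, token-budget usage) can be sketched as below. The commit doesn't say how tokens are estimated, so this assumes a rough 4-characters-per-token heuristic over compact JSON and a caller-supplied budget; `tool_metrics` and `CHARS_PER_TOKEN` are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

CHARS_PER_TOKEN = 4  # Assumed rough heuristic, not a real tokenizer.


def tool_metrics(tools: list[dict[str, Any]], token_budget: int) -> dict[str, Any]:
    """Deterministic size metrics for one endpoint's tools (hypothetical helper)."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    # Compact, key-sorted JSON so the same tools always give the same size.
    serialized = json.dumps(tools, separators=(",", ":"), sort_keys=True)
    estimated_tokens = -(-len(serialized) // CHARS_PER_TOKEN)  # ceiling division
    return {
        "tool_count": len(tools),
        "estimated_tokens": estimated_tokens,
        "budget_usage": estimated_tokens / token_budget,
    }
```

Measuring the rendered man page instead of the raw JSON would be an equally plausible choice; either way, no LLM call is involved, so the numbers are reproducible run to run.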

Consumes GoogleCloudPlatform#469's generator/formatter/datasets as-is; drives off GoogleCloudPlatform#469's
run_config.yaml. Tests cover enums, exceptions, metrics, formatter, file-source
generator, scorer parsing, and an offline end-to-end orchestrator run.
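On scorer parsing: LLM judges often wrap JSON output in a markdown code fence, so the parser has to tolerate that before validating. A minimal sketch, assuming the verdict carries a numeric `compliance_score` and a `findings` list whose items have a `priority` of P0, P1, or P2; those field names and `parse_judge_output` are hypothetical, not McpStyleComplianceScorer's actual schema.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Optional markdown fence around the payload: three backticks, optional "json" tag.
FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)
PRIORITIES = {"P0", "P1", "P2"}


def parse_judge_output(raw: str) -> dict[str, Any]:
    """Parse and validate an LLM judge's JSON verdict (hypothetical schema)."""
    text = raw.strip()
    fenced = FENCE.match(text)
    if fenced:
        text = fenced.group(1)
    verdict = json.loads(text)  # json.JSONDecodeError is a ValueError subclass.
    if not isinstance(verdict.get("compliance_score"), (int, float)):
        raise ValueError(f"missing or non-numeric compliance_score: {verdict!r}")
    for finding in verdict.get("findings", []):
        if finding.get("priority") not in PRIORITIES:
            raise ValueError(f"unknown finding priority: {finding!r}")
    return verdict
```

Raising on any malformed verdict (rather than defaulting a score) keeps a bad judge response from being reported as a real result.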

    # ------------------------------------------------------------------
    @staticmethod
    def _resolve_source(endpoint: dict) -> dict:
        """The endpoint's ``tools_source`` (or ``{}`` to default to ``http``)."""

Collaborator

nit: the default to http seems to belong in fetch_tools rather than in this helper function.
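One way to address the nit: have `_resolve_source` return `tools_source` untouched and apply the `http` default where dispatch happens. A sketch under that assumption; `ToolsFetcher` and the injected fetcher callables are hypothetical stand-ins for the real http (Streamable HTTP), stdio, and file sources, injected here so the dispatch logic can be exercised offline.

```python
from __future__ import annotations

from typing import Any, Callable

Fetcher = Callable[[dict[str, Any]], list[dict[str, Any]]]
DEFAULT_SOURCE_TYPE = "http"


class ToolsFetcher:
    """Hypothetical dispatcher; not the merged McpToolsGenerator."""

    def __init__(self, fetchers: dict[str, Fetcher]):
        self._fetchers = fetchers

    @staticmethod
    def _resolve_source(endpoint: dict[str, Any]) -> dict[str, Any]:
        """The endpoint's ``tools_source``, or ``{}`` if none is configured."""
        return endpoint.get("tools_source") or {}

    def fetch_tools(self, endpoint: dict[str, Any]) -> list[dict[str, Any]]:
        source = self._resolve_source(endpoint)
        # The http default lives here, at the point of dispatch.
        source_type = source.get("type", DEFAULT_SOURCE_TYPE)
        try:
            fetcher = self._fetchers[source_type]
        except KeyError:
            raise ValueError(f"unsupported tools_source.type: {source_type!r}") from None
        return fetcher(source)
```

The helper then does exactly what its docstring says, and the defaulting policy sits next to the only code that depends on it.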

@helloeve

helloeve commented Jul 1, 2026

Collaborator

/gcbrun

akangsha7 merged commit ab55d7e into GoogleCloudPlatform:main on Jul 1, 2026
9 checks passed
akangsha7 deleted the feat/mcp-readability-generator branch on July 1, 2026 15:19
3 participants