
blink-so bot commented Sep 15, 2025

Summary

  • Add end-to-end NL→SQL evals that exercise the full agent path (NL → SQL → DB)
  • Three tasks against the seeded dataset:
    1. Count field changes for Proj A within the fixed window
    2. Current Status for ITEM_A_1 in Proj A
    3. Deletion event presence (old_value=true, new_value=null)
  • Tests call an OpenAI-compatible gateway configured through the AI_GATEWAY_* env vars; the SQL is extracted from the code fence in the model's reply and executed with runQuery
  • The per-test timeout is set to 30s; tests auto-skip when the AI gateway key is missing
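A minimal sketch of the fence-extraction step, assuming the model replies in Markdown and tags its query with a `sql` fence (an untagged fence is accepted as a fallback). `extractSql` is a hypothetical name, not the helper in tests/e2e_nl_sql.spec.ts. It walks every fence in the reply, so an earlier non-SQL block (for example a `json` fence) is skipped rather than mis-parsed.

```typescript
// Hypothetical helper: not taken from the PR, shown only to illustrate the extraction step.
// Matches each fenced block: opening fence, optional language tag, newline, lazy body, closing fence.
const FENCE = /```([A-Za-z0-9_-]*)[ \t]*\r?\n([\s\S]*?)```/g;

function extractSql(reply: string): string {
  // Collect every fenced block with its (lower-cased) language tag.
  const blocks = [...reply.matchAll(FENCE)].map((m) => ({
    lang: (m[1] ?? "").toLowerCase(),
    body: (m[2] ?? "").trim(),
  }));

  // Prefer a block tagged "sql"; fall back to the first untagged block.
  const sqlBlock =
    blocks.find((b) => b.lang === "sql") ?? blocks.find((b) => b.lang === "");

  // Fail loudly so a malformed reply never reaches the database as an empty query.
  if (!sqlBlock || sqlBlock.body.length === 0) {
    throw new Error("No non-empty SQL code fence found in model reply");
  }
  return sqlBlock.body;
}
```

The returned string would then be handed to runQuery; throwing on a missing fence makes a malformed reply fail the test instead of silently passing an empty query along.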

CI

  • Pass AI_GATEWAY_* envs through the evals workflow
  • E2E tests run alongside the contract/scenario tests and skip when the key is absent

Config expected

  • Variable or secret: AI_GATEWAY_API_KEY (prefer secret)
  • Variable: AI_GATEWAY_MODEL (e.g., anthropic/claude-sonnet-4)
  • Optional variable: AI_GATEWAY_BASE_URL (default https://api.openai.com/v1)
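A sketch of how these variables might be read and turned into a gateway request, under stated assumptions. It follows the later commit that gates the tests on all three AI_GATEWAY_* variables (so there is no base-URL default here), and it assumes the gateway accepts the OpenAI chat completions shape: POST {base}/chat/completions with a Bearer key, with the reply text in choices[0].message.content. `loadGatewayConfig`, `buildChatRequest` and `readReplyText` are hypothetical names.

```typescript
// Hypothetical config shape; only the env var names come from the PR.
interface GatewayConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
}

// Plain object that can be passed straight to fetch(url, init).
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Returns null if any variable is unset or blank, so the E2E suite can skip instead of failing.
function loadGatewayConfig(env: Record<string, string | undefined>): GatewayConfig | null {
  const apiKey = env["AI_GATEWAY_API_KEY"]?.trim();
  const baseUrl = env["AI_GATEWAY_BASE_URL"]?.trim();
  const model = env["AI_GATEWAY_MODEL"]?.trim();
  if (!apiKey || !baseUrl || !model) return null;
  // Strip trailing slashes so the joined URL has exactly one "/" before the path.
  return { apiKey, baseUrl: baseUrl.replace(/\/+$/, ""), model };
}

// Assumes an OpenAI-compatible chat completions endpoint on the gateway.
function buildChatRequest(cfg: GatewayConfig, system: string, question: string): ChatRequest {
  return {
    url: `${cfg.baseUrl}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${cfg.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: cfg.model,
      messages: [
        { role: "system", content: system },
        { role: "user", content: question },
      ],
    }),
  };
}

// Pulls the assistant text out of a chat completions response body.
function readReplyText(json: unknown): string {
  const content = (json as { choices?: { message?: { content?: unknown } }[] } | null)
    ?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("Unexpected chat completion response shape");
  }
  return content;
}
```

In a test file, `loadGatewayConfig(process.env)` returning null would be the signal to skip; otherwise `fetch(req.url, req)` sends the request and `readReplyText(await res.json())` yields the reply text from which the SQL fence is extracted.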

Notes

  • This measures real NL→SQL behavior; safety rails (SELECT-only, single statement, clamping) are still enforced by runQuery
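For illustration only, here is a rough sketch of what SELECT-only, single-statement and row-clamping checks can look like. It is not runQuery's implementation: `guardSql` and the 500-row cap are assumptions, and the string checks are deliberately naive (a `;` inside a string literal is rejected, and queries starting with WITH are refused).

```typescript
// Hypothetical stand-in for the rails the PR says runQuery enforces; not the real code.
const MAX_ROWS = 500; // assumed cap; the PR does not state the actual clamp value

function guardSql(input: string, maxRows = MAX_ROWS): string {
  // Allow one optional trailing semicolon, then treat any remaining ";" as a second statement.
  // Naive: a semicolon inside a string literal or comment is also rejected.
  const sql = input.trim().replace(/;\s*$/, "").trim();
  if (sql.includes(";")) throw new Error("Only a single statement is allowed");

  // SELECT-only, checked on the leading keyword (so CTEs starting with WITH are refused too).
  if (!/^select\b/i.test(sql)) throw new Error("Only SELECT statements are allowed");

  // Clamp a trailing "LIMIT n [OFFSET m]" down to maxRows, keeping any OFFSET.
  const limit = /\blimit\s+(\d+)(\s+offset\s+\d+)?\s*$/i.exec(sql);
  if (limit) {
    const n = Math.min(Number(limit[1]), maxRows);
    return sql.slice(0, limit.index) + `LIMIT ${n}` + (limit[2] ?? "");
  }

  // No trailing LIMIT: append one so the result set is always bounded.
  return `${sql} LIMIT ${maxRows}`;
}
```

Because the rails sit in the query runner rather than in the prompt, a model that ignores its instructions still cannot issue writes or unbounded reads during the eval.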

Co-authored by Matt Vollmer

blink-so bot and others added 9 commits September 15, 2025 20:35
- Introduce tests/e2e_nl_sql.spec.ts with 3 tasks (count changes, current status, deletion events)
- Uses OpenAI-compatible gateway via AI_GATEWAY_* envs
- Increases per-test timeout to 30s

ci: add AI_GATEWAY_* envs to evals workflow
- Pass through API key (secret or var), model, and base URL
- E2E runs with other tests; will skip when gateway key is absent

Co-authored-by: mattvollmer <[email protected]>
…t OpenAI base so E2E skips when unconfigured

- Gate NL→SQL tests on AI_GATEWAY_API_KEY, AI_GATEWAY_BASE_URL, and AI_GATEWAY_MODEL
- Workflow now passes base URL without default

Co-authored-by: mattvollmer <[email protected]>
…ET placeholders after max index; relax contract regex

Co-authored-by: mattvollmer <[email protected]>
…ms to runQuery; enforce error if either block missing

Co-authored-by: mattvollmer <[email protected]>
… add debug, increase timeouts

- Add scripts test:core and test:e2e
- Workflow runs core tests as blocking and E2E with continue-on-error
- Strengthen E2E instructions, add model reply preview and SQL logs
- Increase test and query timeouts

Co-authored-by: mattvollmer <[email protected]>
mattvollmer marked this pull request as ready for review September 15, 2025 22:12
mattvollmer merged commit f260cc2 into main Sep 15, 2025
1 check passed
mattvollmer deleted the blink/nl2sql-e2e branch September 15, 2025 22:13
