Commit 965dce0
feat(v0.20): Phase A pagination + Phase 0 unified scorer
PHASE A (exhaustive recall):
- cursor / has_more / next_cursor / offset on filter_nodes,
aggregate_nodes, top_nodes, join_related (4 structured tools).
Stateless re-call protocol: agent re-issues same tool with
cursor=<prior next_cursor>; pages disjoint, no dedup needed.
- Adaptive turn budget: enumeration markers (모두/전체/목록/리스트/
list all/every/all the/모든-leading) bump max_turns 5→15 so the
agent can walk pagination cursors. Caller max_turns >5 wins.
- Honest truncation signaling: project_tool_result now records
_truncated_from per list (was: bare _trimmed_for_context bool).
- Cursor follow-through guidance added to both AGENT_SYSTEM
prompts with worked multi-step example.
- 14 + 20 + 1 = 35 new tests (pagination contract, classifier,
truncation signal). Full suite: 940 pass.
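
A minimal sketch of the stateless cursor walk and the adaptive turn budget described above. Only the field names (cursor, has_more, next_cursor), the marker strings, and the 5→15 bump come from this commit message; the toy filter_nodes, the string-offset cursor encoding, the substring matching, and the helper names are assumptions made for illustration, not the actual implementation.

```python
from typing import Callable, List, Optional

# Marker strings are taken from the commit message; how they are matched is an assumption.
ENUM_MARKERS = ("모두", "전체", "목록", "리스트", "list all", "every", "all the")
DEFAULT_TURNS, ENUM_TURNS = 5, 15


def is_enumeration_query(query: str) -> bool:
    """Hypothetical stand-in for agent_loop._is_enumeration_query."""
    q = query.strip().lower()
    return q.startswith("모든") or any(marker in q for marker in ENUM_MARKERS)


def effective_max_turns(query: str, caller_max_turns: int = DEFAULT_TURNS) -> int:
    """A caller-supplied budget above the default wins; otherwise enumeration bumps 5 -> 15."""
    if caller_max_turns > DEFAULT_TURNS:
        return caller_max_turns
    return ENUM_TURNS if is_enumeration_query(query) else caller_max_turns


def filter_nodes(items: List[str], limit: int = 3, cursor: Optional[str] = None) -> dict:
    """Toy paginated tool; the cursor is an opaque string offset (assumed encoding)."""
    start = int(cursor) if cursor else 0
    page = items[start:start + limit]
    end = start + len(page)
    has_more = end < len(items)
    return {"items": page, "has_more": has_more,
            "next_cursor": str(end) if has_more else None}


def walk_pages(tool: Callable[..., dict], max_turns: int) -> List[str]:
    """Re-issue the same tool with cursor=<prior next_cursor> until has_more is False."""
    results: List[str] = []
    cursor: Optional[str] = None
    for _ in range(max_turns):
        resp = tool(cursor=cursor)
        results.extend(resp["items"])  # pages are disjoint, so no dedup step
        if not resp["has_more"]:
            break
        cursor = resp["next_cursor"]
    return results


nodes = [f"n{i}" for i in range(10)]
budget = effective_max_turns("list all nodes")
all_nodes = walk_pages(lambda cursor: filter_nodes(nodes, cursor=cursor), budget)
```

With ten nodes and a page size of three, the walk needs four calls, so it fits both the default budget of 5 and the bumped budget of 15; a budget of 2 would stop after six items.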
PHASE 0 (unified validation):
- eval/unified.py: dimension classifier (lang / recall_type /
hop_count / structured_pct / enumeration / cross_domain /
cross_language) + weighted UnifiedScore composite + per-axis
breakdown + per-bench legacy view. CLI scorer + JSON report
+ diff-vs-baseline mode.
- Critical invariant: classifier.enumeration is bit-aligned with
agent_loop._is_enumeration_query (test-locked) so scoring matches
the upstream budget decision.
- 22 new tests covering weight normalisation, axis no-coverage
flagging, multi-language detection, alignment with agent loop.
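
A rough sketch of how a weighted composite with weight normalisation and no-coverage flagging could work. The axis names come from the classifier dimensions listed above; the function names, the use of None to mark an axis with no coverage, and the choice to renormalise over covered axes only are assumptions, not the actual eval/unified.py behaviour.

```python
from typing import Dict, List, Optional, Tuple


def normalise(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {axis: w / total for axis, w in weights.items()}


def unified_score(
    axis_scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> Tuple[float, List[str]]:
    """Return (composite, uncovered_axes).

    An axis whose score is missing or None has no coverage: it is left out
    of the composite and reported back so the gap stays visible.
    """
    covered = {a: w for a, w in weights.items() if axis_scores.get(a) is not None}
    uncovered = sorted(a for a in weights if a not in covered)
    if not covered:
        return 0.0, uncovered
    norm = normalise(covered)  # renormalise over covered axes only (assumption)
    composite = sum(norm[a] * axis_scores[a] for a in norm)
    return composite, uncovered


score, gaps = unified_score(
    {"enumeration": 0.5, "hop_count": 1.0, "cross_language": None},
    {"enumeration": 2.0, "hop_count": 2.0, "cross_language": 1.0},
)
```

Here the two covered axes carry equal weight after renormalisation, and cross_language is flagged as a coverage gap instead of silently counting as zero.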
HONEST MEASUREMENT:
Phase A regresses on per-bench scoring (KRRA Hard 34→31, assort
Hard 30→29): the adaptive budget fires correctly on h012, but the
agent paginates the wrong set (phrase hubs, not docs); a
deterministic prompt shift also reroutes h012/h025/h031.
Single-bench scoring can't tell us whether the enumeration-recall
gain outweighs the broad-topical loss. The unified scorer is the
path out: Phase 0.3 / 0.4 will fill cross-domain + cross-language
coverage gaps so every future Phase competes on a single number
that captures the real trade-offs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 108a44a commit 965dce0
9 files changed
Lines changed: 1661 additions & 23 deletions