Skip to content

Feature/0.8.0#6

Merged
hakimjonas merged 13 commits into
mainfrom
feature/0.8.0
May 2, 2026
Merged

Feature/0.8.0#6
hakimjonas merged 13 commits into
mainfrom
feature/0.8.0

Conversation

@hakimjonas
Copy link
Copy Markdown
Owner

@hakimjonas hakimjonas commented May 2, 2026

Summary

Lambë 0.8.0. Element-level shape checking for CSV/TSV output, union headers across heterogeneous-keyed rows, static warnings for provably-empty filters in --explain, and line-aware parser diagnostics.

  • MustBeFlatList rejects list-of-maps and list-of-lists whose cells aren't scalar. Previously these passed the shape check and the writer fell back to Dart's toString(), emitting cells like "[{key: rumil, value: ^0.6.0}, ...]". Defensive _scalarCell guard catches the SAny cases the shape language can't prove.
  • Writer unions headers across rows in first-seen order. [{a:1}, {b:2}] used to lose the b column silently; now renders a,b\n1,\n,2\n.
  • canWriteAs and formatOutput pinned in a 100-case consistency matrix, structural complement to pipe_ops_consistency_test.dart.
  • --explain warns on filter, filter_values, and filter_keys predicates that are provably empty: missing field on a known-closed shape, or a non-boolean predicate shape.
  • Parse errors now report line:column with a gutter-prefixed excerpt and caret; multi-line queries get one line of context on either side. Empty-expression message replaced the 30-token expected-list with parse error: expression is empty. --explain path goes through parseAst so both CLI modes get the same diagnostic.

Three MCP server error sites fixed to use e.message instead of $e (was doubling QueryError: over the CLI's Error: prefix). .pubignore added to keep the published archive at 133 KB (was 3 MB from shipping a stale local lam-mcp AOT; per-platform binaries are built in CI on tag push).

Breaking

  • Pipelines that relied on silent CSV garbage now raise OutputShapeError.
  • Writer output for heterogeneous-keyed list-of-maps is lossless where it was previously lossy.
  • requirementFor(OutputFormat.csv) and requirementFor(OutputFormat.tsv) return MustBeFlatList instead of MustBeList.

Full details in CHANGELOG 0.8.0 section.

Verification

  • 1105 tests green (+134 new)
  • dart analyze --fatal-infos clean
  • dart format --set-exit-if-changed clean
  • dart doc --dry-run 0 warnings
  • pana 160/160
  • dart pub publish --dry-run 0 warnings

Test plan

  • CI green on all platforms
  • Manual smoke: lam --to csv '.dependencies | as(csv) | as(toml) | as(csv)' /tmp/data.yaml raises OutputShapeError (previously emitted garbage)
  • Manual smoke: lam --explain '.users | filter(.missing)' users.json shows the predicate warning
  • Manual smoke: multi-line query with a typo produces a jq-style excerpt with caret
  • After merge and tag, confirm .github/workflows/release.yml builds per-platform lam-mcp and publishes to MCP registry

hakimjonas added 13 commits May 2, 2026 01:52
Opens the 0.8.0 branch with a full handover document covering:

- The concrete CSV round-trip bug driving the element-level shape
  fix (with reproduction, stage-by-stage trace, and the three
  responsible layers identified).
- Option A vs. Option B design decision for element-level shape
  constraints (recommends A: richer ShapeRequirement subclasses).
- Work items in dependency order: driving test, stricter
  MustBeList, defensive _toCsv assertion, consistency matrix
  extension, inner-expression shape validation, parser error
  recovery / multi-line diagnostics.
- Uncommitted work in sibling repos (rem validator rewrite,
  arda-web lambe 0.7.1 cutover) that needs to be decided on
  separately.
- Starting state and launch checklist for the next session.

No code changes yet. Branch base: main at 49b2ad4 (0.7.1 merged).
MustBeList.accepts checks only that the root is a list; it does not
walk element shapes. A list of maps whose cells hold nested lists or
maps slips past the check, and _toCsv falls back to Dart's default
toString() for the cell — producing output like
items,"[{key: rumil, value: ^0.6.0}, ...]".

Pins: crafted list-of-maps with a list-valued cell (and map-valued
cell, and list-of-lists with a nested-list element) must raise
OutputShapeError under both csv and tsv; the end-to-end
.dependencies | as(csv) | as(toml) | as(csv) pipeline must either
succeed with scalar-only cells or raise OutputShapeError — not
silently emit Dart toString() garbage. Legal scalar-cell shapes
continue to round-trip.

7 failing, 4 passing against the current tree.
CSV and TSV project rows onto a flat grid of scalar cells. The old
MustBeList requirement only checked that the root was a list and
never walked the element shape, so a list of maps holding nested
lists or maps slipped past the check. The writer then fell back to
Dart's default toString() and silently emitted garbage like
items,"[{key: rumil, value: ^0.6.0}, ...]" for the real-world chain
.dependencies | as(csv) | as(toml) | as(csv).

MustBeFlatList walks the element shape. It accepts SList<scalar>,
SList<SList<scalar>>, and SList<SMap<k:scalar, ...>>, and defers to
SAny wherever the shape is unknown (the checker cannot prove
incompatibility). requirementFor(csv|tsv) now returns the stricter
class; MustBeList stays in the hierarchy for any future consumer
that wants the permissive form.

The writer also grows a defensive _scalarCell guard. The shape
check is the primary line of defense, but element shapes can
collapse to SAny (empty or heterogeneous lists) and bypass it; in
that case the guard throws QueryError rather than stringify a
non-scalar cell. The handover file calls for QueryError and not
OutputShapeError here because by the time we reach the writer the
shape check has already reported Writable — reaching here means
the shape language was too lossy to prove the mismatch.

MustBeFlatList is exported from the lambe barrel. One shape_check
test that asserted `required is MustBeList` now asserts the
stricter class. Full suite green, 973 tests, zero analyzer issues,
formatter clean.
Adds shape_output_consistency_test.dart, a structural complement to
pipe_ops_consistency_test.dart. The pipe-op matrix pins evaluator
vs. spec; this one pins writer vs. shape check. Together they
prevent a whole class of drift.

For every representative value and every OutputFormat, the test
runs canWriteAs and formatOutput and asserts they agree: Writable
must not produce OutputShapeError; NotWritable must. The writer is
allowed to raise plain QueryError from the defensive _scalarCell
guard (for cases where the shape language is too lossy to prove
the mismatch), but silent stringification — what the original
0.7.1 bug produced — is strictly forbidden. A second group pins
that invariant directly on the two offender shapes.

88 new cases. 1061 tests green, analyzer clean, formatter clean.
filter evaluates `evaluate(predicate, item) == true`. Strict equality
against `true` means that any predicate whose value is not a boolean
is silently rejected — the filter produces an empty list regardless
of input. Without static feedback, a typo like filter(.missing) or a
predicate that forgot its comparison (filter(.name) instead of
filter(.name == "Alice")) looks fine at parse time and silently
produces an empty result at runtime.

explain() now walks each stage before inferring it and flags two
patterns that can only produce an empty filter:

- Missing-field paths. filter(.missing) over an element shape that
  is a known SMap without `missing` — at runtime the field access
  returns null, which never equals true. The check walks Field /
  Access chains segment by segment, so .users | filter(.addr.nope)
  resolves the chain against the element's `addr` subshape.
- Non-boolean predicate shapes. If the inferred predicate shape is
  any concrete non-bool (SNum, SString, SList, SMap, SNull), the
  filter is provably empty. SBool and SAny are left alone: either
  might be true.

The same rules apply to filter_values (predicate sees element values
of the input map) and filter_keys (predicate sees keys, always
strings). sort_by / map / unique_by are intentionally out of scope —
they do not apply a `== true` test, so their predicates have
different correctness rules.

New ExplainWarning type is exported through lib/lambe.dart and
rendered by renderExplain between the stage table and the
writability summary. 12 new tests pin the warning on every affected
op and on both warning modes, and pin the absence of warnings on
valid predicates, SAny input, and non-filter ops. 1073 tests green,
analyzer clean, dart doc clean.
Parse errors now report `line:column` instead of just `column`, and
the source excerpt is a proper context block: the offending line
prefixed with its line number, the caret under the offending column,
and (for multi-line queries) one line of context on either side. The
gutter width adapts to the query's line count. A 1-line query still
produces the existing single-line excerpt with a gutter prefix.

Previously the renderer built one big block by concatenating the
caret line after the full expression, ignoring newlines. For any
multi-line query that placed the caret at column N of the final
visible line regardless of which line the error was actually on —
unusable for heredoc-authored queries or anything coming out of an
editor. `Location.line` was always available on Rumil's ParseError;
the old code just didn't read it.

The CLI's --explain path used to bail out with "Error: failed to
parse query", skipping the rich diagnostic that --to had access to.
It now goes through parseAst so both paths surface the same message.

Also drops the `QueryError: ` prefix-doubling in bin/lam.dart: three
callsites used `'Error: $e'` which invokes QueryError.toString and
added a second `QueryError:` on top of the CLI's `Error:` prefix.
Those now use `e.message`, matching the rest of the file.

Test matrix pins the renderer: single-line + multi-line, error-at-
line-1 / line-mid / line-last, gutter width scaling with line count,
and the did-you-mean hint for mistyped pipe ops. Also pins the
QueryError.message / toString contract so downstream callers know
which to use. 1081 tests green, analyzer clean, dart doc clean.
First review pass on the 0.8.0 work surfaced seven issues. None of
them affect behavior the tests already pin, but they affect what a
reader sees when they open the public API or hit an error.

- formatOutput's docstring in lib/src/output.dart still said CSV/TSV
  accepts "a list of maps or list of lists" with no mention of the
  scalar-cell constraint. Now describes the three accepted element
  shapes and names both error paths (OutputShapeError from the shape
  check, QueryError from the defensive writer guard).
- _scalarCell's error message used cell.runtimeType, so a user saw
  "got _GrowableList" instead of "got list". Now maps List/Map to
  "list"/"map" up front and only falls back to runtimeType for the
  unreachable else branch.
- _predicateWarning hard-coded "element shape" for all three ops.
  For filter_values the predicate sees a value, for filter_keys it
  sees a key. Now takes the domain name as an argument so each op
  reads naturally.
- Two test files had stale docstrings: one referenced a
  runQueryAsString API that never existed (draft rot), one claimed
  "remediation" is offered for the non-scalar-cell case (the handover
  deliberately chose not to emit one). Both corrected.
- Leftover inline comment in csv_element_shape_test.dart removed.
- Attribution fix in shape_output_consistency_test.dart: the silent-
  stringification bug predates 0.7.1 by two releases, not "shipped
  in 0.7.1".

Two new tests pin the polish: the new descriptive cell-kind message,
and the value-domain wording for filter_values warnings. 1083 tests
green, analyzer clean, dart doc clean, pana 160/160.
Two gaps surfaced on the second review pass. Neither indicates a real
bug, but both described behavior more loosely than the tests in the
rest of the file.

- The renderExplain ordering test asserted that Warning and Writable
  lines both appear in the output but said nothing about their
  position. "Between stages and writability" was only the test name,
  not what was pinned. Now compares indexOf values so the stage row,
  the warning block, and the Writable line are pinned in that order.
- The parse-error renderer handles \r\n line endings (regex split on
  \r?\n), but no test exercised that path. Added a direct
  Windows-line-ending case that also pins the CR character does not
  leak into the excerpt.
Third review pass turned up one real user-facing bug and one doc
inaccuracy. Both surface-level, both pinned by existing tests now
that the code is correct.

bin/mcp_server.dart had three `Error: $e` callsites that invoked
QueryError.toString(), producing `Error: QueryError: parse error at
line ...` — the same prefix-doubling the CLI avoided. MCP callers
(typically LLM agents) would see the `QueryError:` token as part of
the human-readable error, which is noise for agents that already
know the error came from the query tool. Now uses `e.message`
consistently with bin/lam.dart.

MustBeList's docstring still claimed it was "used where downstream
code handles arbitrary element shapes", but after 0.8.0 there is no
such downstream code — csv/tsv use the stricter MustBeFlatList, and
no other consumer exists. Rewritten to describe the class honestly:
the generic list-root requirement, retained for future format
additions whose serialization tolerates any element shape.
The writer took headers from the first row only, so
[{a: 1}, {b: 2}] produced "a\n1\n" — the b column was silently
dropped and row 2 rendered as an empty line. Pre-existed 0.8.0 but
belongs to the same class as the non-scalar-cell bug we just
fixed: the writer producing something other than what the input
structurally carries. Leaving it unfixed would have forced a 0.8.1
within weeks of the release.

_toCsv now unions keys across all rows in first-seen insertion
order. Each row renders a cell per header, with empty string when
the row doesn't have that key. Matches pandas.DataFrame.to_csv and
Python's csv.DictWriter(extrasaction='ignore') semantics:
presentation is lossless for scalar-celled data regardless of
whether every row carries every key.

Null values and absent keys both render as empty cells. This
intentional — the CSV format has no distinct "null" token, so the
distinction can't be preserved round-trip anyway. Callers that
need the distinction should use JSON.

Two rows added to shape_output_consistency_test.dart
(disjoint-keys, overlapping-keys) so the matrix covers the new
behavior. Seven new cases in csv_element_shape_test.dart pin the
behavior directly, including a regression guard for the
homogeneous case. 1102 tests green, analyzer clean, pana 160/160.
Running \`lam '' file.json\` surfaced the parser's full list of
expected tokens — every pipe-op name plus 14 punctuation entries,
roughly 30 items — and buried the actual problem (the query is
empty) under the noise. Pre-existed 0.8.0 but became more visible
once the parse-error renderer got the rest of its jq-style
treatment in this release.

_formatParseErrors now short-circuits on expression.trim().isEmpty
with a single line: \`parse error: expression is empty\`. No
expected-list, no line/column (neither meaningful for an empty
input), no caret.

Three tests pin the shape for genuinely-empty, whitespace-only,
and newline-only input. 1105 tests green.
- pubspec: 0.7.1 → 0.8.0
- _version.dart regenerated via tool/gen_version.dart
- doc/lam.1.md front matter: "Lambë 0.7.1" → "0.8.0",
  "April 2026" → "May 2026"; lam.1 regenerated via tool/manpage.dart
- README / doc/getting-started REPL banner bumped to v0.8.0
- manpage_test.dart: TH date check updated to "May 2026"
- CHANGELOG: new 0.8.0 entry with Added / Changed / Breaking sections
- ROADMAP: 0.8.0 marked shipped; 0.9.0 candidates refreshed with the
  new classes surfaced during the 0.8.0 review passes (runtime-failure
  warnings, CSV cell-flattening policy)

.pubignore added to keep the pub archive focused. Previously the
archive shipped a 7MB stale x86_64 AOT of lam-mcp from an April
developer build, plus any locally-generated doc/api/ HTML and any
benchmark artifacts lying around. None of that reaches users:
`dart pub global activate lambe` compiles the MCP server from source
via Dart VM snapshots, and per-platform lam-mcp binaries are built
in CI on tag push (see .github/workflows/release.yml) and published
as GitHub release assets referenced by SHA256 in the MCP registry.
Dropping the stale binary takes the archive from 3 MB to 133 KB.

1105 tests green, analyzer clean, dart doc clean, pana 160/160,
dart pub publish --dry-run clean (bar the uncommitted-changes
warning, which this commit resolves).
…e helpers

Removes em dashes from prose added in this release (they stay in
existing section-heading style like `## 0.8.0 — shipped` to match the
0.7.x entries). Flattens editorial phrasing in CHANGELOG and ROADMAP:
"This is the release's core behavior change", "every behavior change
swapped a silent failure for a loud one", "is the most
identity-defining candidate in the pool" — all cut. The entries
describe what changed and what users need to do about it.

Two private helpers picked up docstrings: `_describeCellKind` in
lib/src/output.dart and `_isScalar` in lib/src/shape/check.dart. The
analyzer only enforces public_member_api_docs on public members, so
these weren't required, but the helpers are non-obvious (why is SAny
"scalar"?) and the one-liner makes the intent explicit.

No behavior change. 1105 tests, analyzer clean, dart doc clean,
pana 160/160, dart pub publish --dry-run 0 warnings.
@hakimjonas hakimjonas merged commit 3029dec into main May 2, 2026
3 checks passed
@hakimjonas hakimjonas deleted the feature/0.8.0 branch May 2, 2026 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant