
Unify examples, templates, and add verbatim golden comparison #35

Merged
marklubin merged 12 commits into main from text-golden-comparison on Feb 12, 2026

@marklubin marklubin commented Feb 11, 2026


Summary

Overhauls the examples/templates system and adds verbatim stdout/stderr golden comparison to catch UX regressions.

Examples & Templates

  • Rename fixtures/ to sources/ in examples 01 and 02, update all path references
  • Example 01 (chatbot-export-synthesis): new case.py, anonymized ChatGPT/Claude export data, recorded cassettes for pipeline_monthly.py
  • Example 03 (team-report): new README
  • .env.example added to all 3 examples
  • Delete src/synix/templates/init/, replaced by per-example templates synced from examples/
  • scripts/sync-templates copies user-facing files from examples to templates
  • scripts/new-example scaffolds new example directory structure
  • synix init --template / --list rewritten to use per-example templates
  • scripts/prerelease rewritten with sync-templates + example verification
  • CI demo-smoke matrix for all 3 examples

Verbatim Golden Comparison

  • Text golden comparison of stdout/stderr for every demo step catches formatting, wording, panel layout, and table structure regressions that JSON-only comparison misses
  • _normalize_output() replaces dynamic values (timing, LLM stats, paths, API key lines, artifact counts) with stable placeholders
  • _show_text_diff() shows unified diff capped at 15 lines, color-coded
  • --update-goldens writes text goldens (.stdout.txt / .stderr.txt) alongside existing JSON goldens
  • Missing golden = warning, not failure (first-run safe)
  • Pin COLUMNS=120 and NO_COLOR=1 in subprocess env for consistent Rich output across machines
  • Sort lineage tree parent IDs for deterministic provenance display
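As a rough illustration of the normalization idea above, here is a minimal sketch. The real `_normalize_output()` is not shown in this PR page, so the patterns, the `<TIME>` and `<PATH>` placeholder names, and the function name `normalize_output` are assumptions; only `<MATERIALIZED>`, `<HASH>`, and `<N>` are placeholder names that appear in the commit messages.

```python
import re

# Hypothetical pattern table: each volatile value is replaced by a stable
# placeholder so that two runs of the same demo step compare equal.
_PATTERNS = [
    (re.compile(r"\b\d+(?:\.\d+)?\s?(?:ms|s)\b"), "<TIME>"),            # e.g. 3.4s, 350ms
    (re.compile(r"\b[0-9a-f]{12,64}\b"), "<HASH>"),                      # hex digests
    (re.compile(r"(?:/[\w.\-]+){2,}"), "<PATH>"),                        # absolute paths
    (re.compile(r"\b(?:cached|built|materialized)\b"), "<MATERIALIZED>"),
    (re.compile(r"\b\d+ (artifacts?|indexed)\b"), r"<N> \1"),            # counts
]


def normalize_output(text: str) -> str:
    """Apply every pattern in order and return the normalized text."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


fresh = normalize_output("built 12 artifacts in 3.4s")
incremental = normalize_output("cached 7 artifacts in 0.2s")
print(fresh == incremental)
```

With patterns like these, a fresh CI build and an incremental local run normalize to the same line, which is the property the goldens depend on.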

Dev Workflow

  • uv run release: full pre-push check suite (sync templates, ruff fix, ruff check, pytest, verify all demos)
  • uv run verify-demos: standalone demo verification for fast UX feedback
  • Updated CLAUDE.md contributing section

Codebase Formatting

  • Full ruff format pass across src/ and tests/

Test plan

  • 777 tests pass
  • All 3 demos pass text + JSON golden comparison
  • Full release check green
  • Template-example sync test passes
  • Init --list and --template tests pass

- Rename fixtures/ -> sources/ in examples 01 and 02
- Update all path references (pipeline.py, fix_source.py, tape.tape)
- Simplify core layer config (remove Opus, add upgrade comment)
- Update topical pipeline with technical topic list
- Add .env.example to all 3 examples
- Rewrite README for example 01, create README for example 03
- Create case.py for example 01 (plan/build/search/rebuild demo)
- Generate anonymized ChatGPT/Claude export data for example 01
- Record cassettes for example 01 (pipeline_monthly.py)
- Delete old src/synix/templates/init/
- Create scripts/sync-templates (copies user-facing files from examples)
- Create scripts/new-example (scaffolds new example structure)
- Rewrite init_commands.py with --template and --list options
- Rewrite scripts/prerelease with sync-templates + example verification
- Add demo-smoke CI matrix for all 3 examples
- Create tests/unit/test_templates.py (template-example sync tests)
- Add --template/--list assertions to test_init_cli.py
- Sync templates from examples (769 tests passing)
Catch UX regressions (formatting, wording, panel layout) by comparing
normalized stdout/stderr for every demo step against golden files.

- Pin COLUMNS=120 and NO_COLOR=1 in subprocess env for consistent output
- _normalize_output() replaces timing, LLM stats, paths, API key lines,
  and verify counts with stable placeholders
- _show_text_diff() displays unified diff capped at 15 lines
- --update-goldens writes text goldens alongside existing JSON goldens
- Missing golden = warning, not failure (first-run safe)
- Sort lineage tree parent IDs for deterministic output
- Add uv run release and uv run verify-demos entry points
- Update CLAUDE.md contributing section
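The environment pinning and the capped diff described in this commit could look roughly like the sketch below. The helper names `run_step` and `text_diff` are hypothetical (the PR's own helper is `_show_text_diff()`, whose body is not shown here), and the color coding is omitted.

```python
import difflib
import os
import subprocess
import sys

MAX_DIFF_LINES = 15  # cap taken from the commit message


def run_step(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run one demo step with a fixed terminal width and colors disabled."""
    env = {**os.environ, "COLUMNS": "120", "NO_COLOR": "1"}
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


def text_diff(expected: str, actual: str, limit: int = MAX_DIFF_LINES) -> list[str]:
    """Return a unified diff of golden vs. actual text, truncated to `limit` lines."""
    lines = list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="golden",
            tofile="actual",
            lineterm="",
        )
    )
    if len(lines) > limit:
        omitted = len(lines) - limit
        lines = lines[:limit] + [f"... ({omitted} more diff lines)"]
    return lines


if __name__ == "__main__":
    result = run_step([sys.executable, "-c", "print('hello')"])
    for line in text_diff("hello\n", result.stdout):
        print(line)
```

Pinning the environment in the subprocess rather than in the caller keeps the developer's own terminal settings out of the recorded output.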
@marklubin marklubin changed the title from "Add verbatim stdout/stderr golden comparison to demo run" to "Unify examples, templates, and add verbatim golden comparison" on Feb 11, 2026
Add normalization for built/cached/new/rebuild status words and
materialization state so goldens match on both fresh CI builds
and incremental local runs.
- Normalize standalone "cached" status word
- Normalize plan "Estimated:" summary line
- Normalize Build Summary table cell numbers
- Fix release() ordering: format before sync-templates
- Combined pattern for cached/new/materialized + indexed count
- Standalone "cached" now normalizes to <MATERIALIZED> (matches materialized)
- Added <N> indexed normalization for search projection plan lines
- Added test for search projection status normalization
- Regenerated all goldens with unified placeholders
- Add cassette miss key hash normalization (<HASH> placeholder)
- Normalize standalone 'built' same as 'cached' → <MATERIALIZED>
- Add tests for both new normalization patterns
- Regenerate all goldens
Collapsed whitespace around <MATERIALIZED> in table cells so
'cached' (6 chars) and 'built' (5 chars) produce identical output.
Goldens were previously generated from cached local state. CI starts
from a clean checkout, producing different output (fresh LLM calls via
cassette replay, different artifact counts, validator results). Delete
build dirs before regenerating to match CI's fresh-build scenario.

Also normalize whitespace around <MATERIALIZED> in table cells.
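To make the whitespace issue concrete: a table cell padded for `cached` (6 characters) carries one fewer trailing space than the same cell padded for `built` (5 characters), so replacing the word alone still leaves the two lines different. A minimal sketch of the collapse, assuming a hypothetical helper name and ASCII table borders:

```python
import re


def collapse_status_cells(text: str) -> str:
    """Replace status words, then squeeze the padding around the placeholder."""
    text = re.sub(r"\b(?:cached|built)\b", "<MATERIALIZED>", text)
    # Any run of spaces or tabs on either side becomes exactly one space,
    # so the original word length no longer affects the cell width.
    return re.sub(r"[ \t]*<MATERIALIZED>[ \t]*", " <MATERIALIZED> ", text)


row_cached = "| episodes | cached |"
row_built = "| episodes | built  |"
print(collapse_status_cells(row_cached) == collapse_status_cells(row_built))
```

This sketch also adds a space before a placeholder at the start of a line, which a real implementation might want to avoid.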
Comment thread .github/workflows/ci.yml Outdated
runs-on: ubuntu-latest
strategy:
  matrix:
    example: [examples/01-chatbot-export-synthesis, examples/02-tv-returns, examples/03-team-report]

Owner Author

Can we just have it enumerate the subdirs of examples/ so we don't have to keep updating this list?
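One way to do the enumeration is a small discovery step that prints the subdirectories as a JSON array. The sketch below assumes that every non-hidden subdirectory of `examples/` is a runnable example; the function name is hypothetical and this is not necessarily how the follow-up commit implemented it.

```python
import json
from pathlib import Path


def discover_examples(root: str = "examples") -> list[str]:
    """Return the example directories under `root`, sorted for stable CI ordering."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        p.as_posix()
        for p in base.iterdir()
        if p.is_dir() and not p.name.startswith((".", "_"))
    )


if __name__ == "__main__":
    # Emit a JSON array that a later job can consume as its matrix.
    print(json.dumps(discover_examples()))
```

In GitHub Actions, a setup job could expose this JSON as a job output and the smoke job could read it with the `fromJSON` expression function to build its matrix.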


## Sample Data

Place your ChatGPT and Claude exports into `sources/`:

Owner Author

Can you add the specific instructions, or a reference to the instructions, for how to get these files from those platforms? Give the full list of commands they need to run once they get the extracts in their email and download them.

Comment on lines +50 to +52
**ChatGPT:** Settings -> Data Controls -> Export data. Download the zip, extract `conversations.json`, and copy it into `sources/`.

**Claude:** Settings -> Export data. Download and copy the JSON file into `sources/`.

Owner Author

If there's a deep link we can use, use it. Give the commands for extraction.
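For the extraction step, a cross-platform alternative to shell commands is a short Python helper. This is only a sketch: the archive path is whatever the user downloaded, the function name is hypothetical, and the only detail taken from the README is that the ChatGPT zip contains `conversations.json` and that it belongs in `sources/`.

```python
import zipfile
from pathlib import Path


def extract_conversations(
    export_zip: str,
    dest_dir: str = "sources",
    member: str = "conversations.json",
) -> Path:
    """Copy `member` out of the export archive into `dest_dir` and return its path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(export_zip) as zf:
        # Match on the base name in case the archive nests files in a folder.
        matches = [name for name in zf.namelist() if Path(name).name == member]
        if not matches:
            raise FileNotFoundError(f"{member} not found in {export_zip}")
        target = dest / member
        target.write_bytes(zf.read(matches[0]))
    return target
```

Calling `extract_conversations("path/to/export.zip")` from the example directory would leave `sources/conversations.json` in place for the pipeline.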

Comment thread examples/03-team-report/README.md Outdated
synix search 'hiking'
```

## Use Your Own Data

Owner Author

Suggest an example for them to try: create their own bio profile and add themselves to the team, change the project, and show how to make those changes to the pipeline.

Comment thread examples/03-team-report/README.md Outdated
Comment on lines +33 to +35
synix build pipeline.py
synix validate pipeline.py
synix search 'hiking'

Owner Author

Use uvx for all synix commands when they appear in a customer-facing context rather than internal dev. Add this to CLAUDE.md and fix it everywhere.

Comment thread scripts/new-example Outdated
\`\`\`
README

echo "Created example at $dir"

Owner Author

Rename "example" to "template". We don't want to expose the cassette features as part of the external-facing product spec; they are for internal use only.

Comment thread scripts/sync-templates
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

Owner Author

Why do we need this step? Why can't we just have a single SoT?

@@ -0,0 +1,11 @@
# Synix uses the API key matching your pipeline's provider setting.

Owner Author

Can we just define this .env.example in a single place and then copy it when we create a project from a template? I don't want to have to edit all of these each time we change what we support.
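A sketch of what that could look like at project-creation time follows. The function name, its parameters, and the directory layout are assumptions for illustration; the actual `synix init` code is not shown on this page.

```python
import shutil
from pathlib import Path


def create_project(template_dir: Path, project_dir: Path, shared_env: Path) -> Path:
    """Copy a template into a new project and add the single shared .env.example."""
    # copytree creates project_dir and fails if it already exists,
    # which avoids silently overwriting an existing project.
    shutil.copytree(template_dir, project_dir)
    shutil.copy2(shared_env, project_dir / ".env.example")
    return project_dir
```

Because the shared file is copied only at creation time, changing the list of supported providers means editing one file rather than one per template.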

# ---------------------------------------------------------------------------

@register_transform("demo_load_product_offers")
class DemoLoadProductOffersTransform(BaseTransform):

Owner Author

Let's keep pipeline.py clean for all these examples and put any custom transforms for the examples in their own distinct submodule that gets spun up with project creation, e.g. src/{my-project}/transforms.

- Dynamic CI matrix: discover examples/ subdirs at runtime
- Use `uvx synix` in all customer-facing docs (READMEs, init output)
- README 01: add ChatGPT/Claude export instructions with deep links
- README 03: add try-it-yourself guide with concrete next steps
- Rename scripts/new-example → new-template, remove dev-only scaffolding
- Single .env.example SoT at templates root (remove per-template copies)
- Separate custom transforms into transforms.py (example 02)
- Add explanatory comment to sync-templates
- Regenerate goldens and sync templates
Single naming convention: everything is "templates". The repo-root
templates/ dir is the source of truth; src/synix/templates/ remains
the curated subset bundled in the wheel (sync-templates copies
user-facing files only).

Updated: CI workflow, scripts, pyproject.toml, READMEs, pipeline
comments, test paths, CLAUDE.md, .gitignore.
Regenerated all goldens from clean state.
- Use `uvx synix` consistently in all customer-facing commands
- Remove inaccurate max_length validator reference (it's a custom
  validator in the template, not built-in)
- Fix "all commands default to pipeline.py" — only build/plan/
  validate/fix/clean take a pipeline path
@marklubin marklubin merged commit 156922c into main Feb 12, 2026
14 checks passed