Status: v0.8.1 shipped 2026-06-03. Tagged v0.8.1 on
ArkinLaboratory/beril-presentation-maker-skill. Operator environment
the skill expects: a BERIL deployment with .claude/skills/, a .env
with CBORG_API_KEY, and an optional GOOGLE_AI_STUDIO_API_KEY for
image-gen.
Reader: production engineer inheriting maintenance + ops for this skill. This doc tells you the surface you're taking on, what's stable enough to depend on, what's known-deferred, and where the support seams are.
For end-user docs see README.md / TUTORIAL.md / HUB_INSTALL.md. For interop schemas see CONTRACT.md. For design rationale see SPEC.md / LAYOUT.md / DECISIONS.md. For per-release narrative see RELEASE_NOTES.md.
A pipx-installable Python package that drops a Claude Code skill
into a BERIL deployment under .claude/skills/beril-presentation-maker/.
The skill drafts KBase-branded scientific presentations (talks +
posters) from BERDL analysis projects, running a 14-17 stage LLM
pipeline against the project's REPORT.md, RESEARCH_PLAN.md,
figures/, and notebooks/.
Shape (immutable across v0.x):
- Python package at src/beril_presentation_maker/ ships skill data (prompts, tools, references, tests/fixtures) as package data via importlib.resources. beril-presentation-maker install-skill <BERIL_ROOT> copies shipped data into <BERIL_ROOT>/.claude/skills/beril-presentation-maker/.
- Bash orchestrator (tools/presentation_maker.sh) sequences the pipeline stages; each stage invokes claude -p for an LLM pass (substory_design, slide_compose, etc.) or runs a Python tool (curate_figures, citation_pool, validate_presentation).
- Per-draft output lives under <BERIL_ROOT>/projects/<id>/talks/draft_N/ in a 4-zone layout: deliverable/ (audience-facing pptx + speaker notes), narrative/ (decisions + throughline), working/ (slide_spec.json + intermediate fragments), audit/ (logs, cost, review outputs).
- Default invocation: beril-presentation-maker draft <project_id>. Resumes: --resume-from <stage> --draft-dir <draft_N>.
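The install-skill step in the first bullet amounts to an idempotent directory copy. A minimal sketch, assuming the shipped data is already available as a plain directory on disk (the real package resolves it through importlib.resources); `copy_skill_data` and its parameters are hypothetical names, not the package's API:

```python
import shutil
from pathlib import Path

SKILL_NAME = "beril-presentation-maker"


def copy_skill_data(shipped_dir, beril_root):
    """Copy shipped skill data into <beril_root>/.claude/skills/<skill>/.

    Hypothetical helper: re-running it overwrites files in place, so the
    target looks the same after one call or many.
    """
    target = Path(beril_root) / ".claude" / "skills" / SKILL_NAME
    target.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok=True (Python 3.8+) lets the copy land on an existing tree.
    shutil.copytree(shipped_dir, target, dirs_exist_ok=True)
    return target
```

Note that a copy of this kind does not delete files that a previous version shipped and a newer one dropped; whether the real command prunes stale files is not stated here.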
Cross-skill dependencies:
| Skill | Version | Optional? | What this skill consumes |
|---|---|---|---|
| beril-adversarial | v0.7.0.8+ | Optional (--no-adversarial to skip) | adversarial review --type presentation for the review-rewrite loop |
| beril-atlas | (operational) | Optional | Observability; this skill doesn't depend on atlas at runtime |
| beril-paper-writer | v1.0.0+ | Optional | If a sibling papers/draft_N/ exists, citation_pool reuses its citations (D-009 "reuse-from-paper") |
The only external services are Claude (via Anthropic or the CBORG gateway) for LLM passes and Google AI Studio or CBORG-Gemini for image-gen.
The following surface is stable across v0.x and intended to remain stable through v1.0:
- beril-presentation-maker install-skill <BERIL_ROOT>: idempotent skill data deployment.
- beril-presentation-maker configure: environment + dependency verification (advisory; doesn't block).
- beril-presentation-maker draft <project_id>: full pipeline. Flag set: --mode {talk-30,talk-15,talk-45,lightning-5,poster-h,poster-v}, --tier {STRONG,THIN,EXPLORATORY}, --prompts-version {v1,v2,v3,v3.1,v3.2,v3.3}, --architecture-pipeline {v0_3,v0_4}, --resume-from <stage>, --draft-dir <path>, --auto-advance, --no-images, --auto-approve-images, --max-image-cost-usd, --max-image-approvals, --max-revise-cost-usd, --max-revisions, --revise-severity-floor {P0,P1,P2}, --visual-qa, --no-visual-qa, --image-provider {auto,cbio,google-ai-studio}, --no-adversarial, --model <claude-model>, --no-stream.
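For hub automation it can help to assemble the draft command programmatically and reject unknown choices before spending any LLM budget. The sketch below only uses flags and values listed above; the wrapper `build_draft_argv` and its keyword names are hypothetical:

```python
ALLOWED_MODES = {"talk-30", "talk-15", "talk-45", "lightning-5", "poster-h", "poster-v"}
ALLOWED_TIERS = {"STRONG", "THIN", "EXPLORATORY"}


def build_draft_argv(project_id, mode="talk-30", tier="STRONG",
                     batch=False, max_revise_cost_usd=None):
    """Return an argv list for `beril-presentation-maker draft` (hypothetical wrapper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown --mode: {mode}")
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"unknown --tier: {tier}")
    argv = ["beril-presentation-maker", "draft", project_id,
            "--mode", mode, "--tier", tier]
    if batch:
        # Skip the interactive gates for unattended runs.
        argv += ["--auto-advance", "--auto-approve-images"]
    if max_revise_cost_usd is not None:
        argv += ["--max-revise-cost-usd", str(max_revise_cost_usd)]
    return argv
```

The returned list can be handed to `subprocess.run(argv, check=True)` from the BERIL root.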
<BERIL_ROOT>/projects/<id>/talks/draft_N/
├── deliverable/ # audience-facing
│ ├── draft.pptx
│ └── speaker-notes.md
├── narrative/ # decision artifacts
│ ├── 00_throughline.md
│ ├── 02_substories.md
│ └── ...
├── working/ # machine-readable intermediates
│ ├── slide_spec.json # the canonical structured deck
│ ├── citation_pool.json
│ ├── 03_slides/ # per-substory compose fragments
│ ├── 04_speaker_notes/ # per-substory speaker notes
│ └── 05_images/ # AI-generated images + manifest
└── audit/ # logs + cost + review
  ├── adversarial_review.{json,md}
  ├── review_cascade.{json,md}
  ├── content_overflow.json # v0.8.0+
  ├── layout_overlaps.json # v0.8.0+
  ├── visual_qa.{json,md}
  ├── visual_qa_final.{json,md} # v0.8.0+
  ├── revise_loop_metadata.json
  ├── state.json
  └── snapshots/ # pre-revise spec backups
This contract has been stable since v0.3.1 (3+ months of operational
use). The audit/*.json artifacts are versioned schemas (see
CONTRACT.md §3 for shape pins).
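A quick structural check of a draft directory against the layout above can be sketched as follows. The zone names and file paths come from the tree; the function name and the choice of which files count as required are assumptions for illustration:

```python
from pathlib import Path

ZONES = ("deliverable", "narrative", "working", "audit")
# Assumed minimum set; the tree lists more artifacts than these.
EXPECTED_FILES = (
    "deliverable/draft.pptx",
    "deliverable/speaker-notes.md",
    "working/slide_spec.json",
    "audit/state.json",
)


def missing_from_draft(draft_dir):
    """Return the zones and files that are absent from draft_dir."""
    draft = Path(draft_dir)
    missing = [zone + "/" for zone in ZONES if not (draft / zone).is_dir()]
    missing += [rel for rel in EXPECTED_FILES if not (draft / rel).is_file()]
    return missing
```

An empty list means the four zones and the assumed core files are present; it says nothing about their contents.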
- slide_spec.v1: the canonical deck representation. 17-layout closed vocabulary. Stable.
- adversarial-review-presentation.v3: consumer schema from beril-adversarial-skill. Pinned to v3 with central_objection rename + citation_reality routing.
- compose-fragment.v1 / compose-fragment.v2: per-substory composer fragments. v2 is the v0.4 fused-notes shape.
- layout-overlaps.v1: deterministic overlap detector output (v0.8.0 G.10-A).
- content-overflow.v1: renderer-emitted overflow findings (v0.8.0 G.10-C).
- review-cascade.v1: tiered review aggregator output.
Adding fields to any of these is non-breaking. Removing or renaming keys, or changing semantics, requires a schema-version bump + dual support window.
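Reading the rule as "additive changes are safe, removals and renames are not", the key-level part of the check can be sketched as below. This is an illustration over flat key sets, not the project's validator, and `needs_version_bump` is a hypothetical name; semantic changes cannot be detected this way and still need human review:

```python
def needs_version_bump(old_keys, new_keys):
    """True when any key of the old shape is missing from the new shape.

    A rename shows up as one removed key plus one added key, so it also
    returns True. Purely additive changes return False.
    """
    removed = set(old_keys) - set(new_keys)
    return bool(removed)
```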
These are documented gaps the production team should know about. None are blockers; all are documented as v0.9+ work.
- Pre-v0.8.0 saved fragments may contain a --- artifact: working/03_slides/deck_close.json from old drafts can have forward_call: "---" (an extractor bug from REPORT.md HR capture). v0.8.0's extractor fix prevents this on new drafts; old drafts may need a one-line hand-fix to forward_call.
These have been deferred across multiple v0.x cycles because the load-bearing complaint kept shifting elsewhere. Each is real work but none has been the operator's top complaint:
- Per-arc figure clustering (v0.7 carry). The relevant_figure_not_used finding fires but doesn't enforce ARC-level figure placement.
- Composer-side cross_tenant grounding omissions (v0.7 carry). Advisory soft-warnings; the composer can be nudged to enumerate every K-BERDL DB explicitly.
- Retraction-aware composer / discarded_results.md filter (v0.5.1 carry). No recurrence in v0.6+ reads; defer until it surfaces again.
- Compression / mode-budget heuristics (v0.5.1 carry). Decks can overshoot the mode-N slide budget; an advisory soft-warning surfaces this but there is no auto-compress path.
These would be material new work, not incremental:
- LLM-driven layout patches (Tier G.10 scoping discussion). Defer until deterministic G.10-A overlap findings show what the geometric resolver can't handle. As of v0.8.1, most overlaps resolve via content_overflow → revise loop; the LLM-patch path may not be needed.
- Real font-metrics text measurement (replacing the heuristic _AVG_GLYPH_WIDTH_RATIO). Would require fonttools + PIL + actual font files in the wheel. Defer until heuristic miscalibration is the load-bearing complaint.
- Multi-language support (the master is Oxygen-family + the glyph-width calibration is tuned for English). Out of scope for v0.x.
| Stage | Cost | Wall-clock |
|---|---|---|
| Setup (plan + throughline + substory_design + curate + citation_pool + cross_tenant + intro) | $1.50-2.50 | 15-25 min |
| Compose (slide_compose × N substories) | $0.50-1.50 | 4-8 min/substory; parallel on v0.4 |
| QA prep + deck_close + speaker_notes | $0.80-1.20 | 5-10 min |
| Image gen (multi-provider) | $0.10-0.20 | 30-90s per image |
| Merge + assemble + cascade + adversarial | $0.50-1.00 | 2-5 min |
| Revise loop (1st pass) | $0-5 (capped) | 5-15 min |
| Visual QA + 2nd revise pass | $0.50-3 | 5-15 min |
| Total | $3-12 | 45-90 min |
The --max-revise-cost-usd cap (default $5) is the primary cost
control; the revise loop short-circuits when it hits this OR
--max-revisions (default 6). Hub batch use should set
--auto-advance --auto-approve-images to skip interactive gates.
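The interaction of the two caps can be illustrated with a small simulation. It assumes the budget is checked before each revise pass, which is one plausible reading of "short-circuits"; the function names are hypothetical and the defaults mirror the documented ones ($5, 6 revisions):

```python
def revise_budget_exhausted(spent_usd, revisions_done,
                            max_revise_cost_usd=5.0, max_revisions=6):
    """True once either cap has been reached."""
    return spent_usd >= max_revise_cost_usd or revisions_done >= max_revisions


def run_revise_loop(pass_costs_usd, max_revise_cost_usd=5.0, max_revisions=6):
    """Simulate the loop over a list of per-pass costs; return (passes, spend)."""
    spent, done = 0.0, 0
    for cost in pass_costs_usd:
        if revise_budget_exhausted(spent, done, max_revise_cost_usd, max_revisions):
            break
        spent += cost
        done += 1
    return done, spent
```

Under this check-before-pass assumption the last pass can push spend past the cap (three $2 passes end at $6), so budget alerts should allow for roughly one pass of overshoot.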
- ~1.5GB pipx venv (python-pptx + Pillow + nbformat + tenacity).
- ~2-3MB per draft (slide_spec.json + audit logs).
- ~500KB per AI-generated image (PNG, ~1024×768 typical).
- Runs cleanly on Python 3.10+ (tested on 3.10, 3.12, 3.14).
- No GPU required.
- LibreOffice (optional) for PDF export + visual-QA. The skill falls back to PPTX-only without it.
- Draft directories are allocated atomically (projects/<id>/talks/draft_N picks the next N via filesystem-level race-safe alloc).
- Multiple parallel runs on the same project get distinct draft_N/ directories, so there is no cross-contamination.
- No global state in the pipx install; per-user / per-hub-user isolation by filesystem layout.
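One common way to get race-safe allocation of this kind is to rely on directory creation being atomic: whichever process creates draft_N first wins, and the loser moves on to N+1. The sketch below shows that technique; it is not taken from the skill's source, and both the function name and the assumption that numbering starts at 1 are illustrative:

```python
import os
from pathlib import Path


def allocate_draft_dir(talks_dir, max_tries=10000):
    """Create and return the next free draft_N directory under talks_dir."""
    talks = Path(talks_dir)
    talks.mkdir(parents=True, exist_ok=True)
    n = 1  # assumed starting index
    for _ in range(max_tries):
        candidate = talks / f"draft_{n}"
        try:
            # os.mkdir either creates the directory or raises; two runs
            # can never both succeed for the same N.
            os.mkdir(candidate)
            return candidate
        except FileExistsError:
            n += 1
    raise RuntimeError(f"no free draft slot under {talks}")
```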
- Pipeline halt: the orchestrator writes audit/state.json on every stage transition; --resume-from <stage> --draft-dir <path> picks up where it stopped.
- Adversarial CLI missing: falls back to "skip review loop" with a warning. Operator can re-run with --no-adversarial.
- Image-gen failure: the per-image probe + provider auto-discovery catches most cases. Falls back to no-image rendering. Cost cap prevents runaway.
- LLM output malformed (rare): REPAIR_MODE-style bounded retry inside each stage. After 3 retries the stage fails + writes partial output to audit/stages/<stage>/.
- slide_spec validation failure: the validator emits structured ValidatorIssue findings with severity. Hard errors halt assembly; soft-warnings surface in audit/presentation_validation.json + the assembler banner but don't block.
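The bounded-retry behaviour for malformed LLM output can be sketched as below. This is an illustration, not the orchestrator's code: it assumes a stage yields text that must parse as JSON, reads "3 retries" as three attempts after the first, and uses a hypothetical file name (`partial_output.txt`) for the partial output:

```python
import json
from pathlib import Path


def run_stage_with_retries(stage_fn, stage_name, audit_dir, max_retries=3):
    """Call stage_fn() until its output parses as JSON or retries run out."""
    last_output = ""
    for _attempt in range(1 + max_retries):
        last_output = stage_fn()
        try:
            return json.loads(last_output)
        except json.JSONDecodeError:
            continue  # malformed output: try again
    # Out of retries: keep the last raw output for triage, then fail.
    out_dir = Path(audit_dir) / "stages" / stage_name
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "partial_output.txt").write_text(last_output)
    raise RuntimeError(f"stage {stage_name} failed after {max_retries} retries")
```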
- Deployment to the BERIL hub: pipx install from tag + install-skill invocation per host.
- Per-tenant configuration: .env setup, CBORG_API_KEY / GOOGLE_AI_STUDIO_API_KEY provisioning.
- Operational monitoring: wall-clock anomalies, cost drift, failure-mode trending (via audit/runs/run-N/summary.json).
- First-line user support: tutorial pointers, common configuration issues. HUB_INSTALL.md covers the troubleshooting basics.
- Incident triage: when the orchestrator halts, the audit artifacts under <draft_N>/audit/ are the forensic surface. No special tooling needed: JSON files + stderr logs.
- Version pinning: when to upgrade the pipx-installed version. Recommend pinning to the latest v0.x.y tag; bump deliberately per release-notes review.
- Skill code maintenance: bug fixes, schema migrations, cross-skill compat (especially when adversarial-skill upgrades).
- Prompt engineering: v3.x prompt-stack iteration, new layouts, content-discipline rules. Versioned via --prompts-version for safe rollout.
- Image-gen provider integration: Google AI Studio + CBORG multi-provider layer, calibration cost-caps.
- New layout authoring: additions to the 17-layout vocabulary require master-template work + validator updates + per-layout fill handler.
- Cost-model recalibration: image-gen worst-case + token estimates, drift from real-world hub use.
- New audit artifact requests: production team flags observability needs; vendor adds the artifact + schema-versions it. Recent example: audit/content_overflow.json (v0.8.0 G.10-C) emerged from operator-visible regression patterns.
- Production incident → reproducer: production team forwards the failing draft directory; vendor reproduces + fixes. The 4-zone layout makes this clean.
- Patch releases (v0.x.y) when a Tier-I read surfaces actionable carries. Roughly quarterly so far; faster when active development is happening.
- Minor releases (v0.x.0) introduce new pipeline stages, prompt-stack versions, or layout additions. Documented in RELEASE_NOTES.md before release.
- Major release (v1.0.0) is the production-handoff milestone (this document). Bumps imply a deliberate re-evaluation of stable surface.
- All 17-layout fill handlers are stable + tested.
- The 4-zone draft layout has been load-bearing since v0.3.1.
- 1967 unit tests passing as of v0.8.1.
- Multi-project tested: lanthanide_methylotrophy_atlas, ibd_phage_targeting, functional_dark_matter, conservation_vs_fitness, phb_granule_ecology, amr_pangenome_atlas (and earlier projects through the v0.3-v0.7 cycles).
- Cross-skill integration: paper-writer reuse path, adversarial v3 schema, atlas observability all proven on hub deployments.
- Multi-user web wrap (the Agent SDK probe path). This skill is a CLI-style invocation; multi-tenant web service is out of scope.
- Non-English projects (calibration is English-only).
- Non-KBase brand templates (the master is KBase-branded; brand swap requires master rebuild).
For production incidents:
- Capture the failing <draft_N>/audit/ directory.
- File issue at github.com/ArkinLaboratory/beril-presentation-maker-skill/issues with the draft dir + reproduction command + invocation flags.
- Vendor turn-around: P0 (next-day reproduction); P1 (next patch release); P2 (next minor release).
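Capturing the audit directory for an issue can be as simple as archiving it. A minimal sketch using only the standard library; the helper name and the archive naming scheme are assumptions, and the archive should be reviewed for secrets before it is attached anywhere public:

```python
import shutil
from pathlib import Path


def bundle_audit_dir(draft_dir, out_dir):
    """Write <out_dir>/<draft_name>_audit.tar.gz containing audit/ and return its path."""
    draft = Path(draft_dir)
    audit = draft / "audit"
    if not audit.is_dir():
        raise FileNotFoundError(audit)
    base = Path(out_dir) / f"{draft.name}_audit"
    # make_archive appends the ".tar.gz" suffix and returns the final path.
    archive = shutil.make_archive(str(base), "gztar",
                                  root_dir=str(draft), base_dir="audit")
    return Path(archive)
```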
For feature requests:
- Open issue with the operational need + a sample project that demonstrates the gap.
- Vendor triages against the v0.9+ deferred list (§3).
After deploying to a hub:
# 1. Install verified
beril-presentation-maker --version # should print 0.8.1+
# 2. BERIL deployment verified
beril-presentation-maker configure
# Expects: CBORG_API_KEY present, claude on PATH, python-pptx + Pillow
# importable, BERIL_ROOT/.claude/skills/ present.
# 3. Smoke run (small project, no adversarial, capped image cost)
cd "$BERIL_ROOT"
beril-presentation-maker draft <small_project_id> \
  --tier STRONG --mode talk-30 \
  --auto-advance \
  --no-adversarial \
  --auto-approve-images \
  --max-image-cost-usd 0.20

Expected:
- Wall clock 15-25 min, cost $2-4 (no adversarial).
- <draft_dir>/deliverable/draft.pptx exists + non-empty (>200KB).
- <draft_dir>/audit/state.json shows all stages reached "complete".
- Open the .pptx in PowerPoint/Keynote + verify title + intro + per-substory slides + acknowledgments render.
If any of these fail, escalate per §6.
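The file-level parts of the smoke check can be automated. The sketch below is hedged in one important way: the shape of state.json is not specified in this document, so it assumes a hypothetical `{"stages": {"<name>": "<status>"}}` mapping; adjust the parsing to the real schema in CONTRACT.md before relying on it. Opening the deck in PowerPoint/Keynote remains a manual step.

```python
import json
from pathlib import Path

MIN_PPTX_BYTES = 200 * 1024  # the ">200KB" threshold from the checklist


def smoke_check(draft_dir):
    """Return a list of problems found in a finished smoke-run draft."""
    draft = Path(draft_dir)
    problems = []

    pptx = draft / "deliverable" / "draft.pptx"
    if not pptx.is_file():
        problems.append("draft.pptx missing")
    elif pptx.stat().st_size <= MIN_PPTX_BYTES:
        problems.append("draft.pptx smaller than expected")

    state_path = draft / "audit" / "state.json"
    if not state_path.is_file():
        problems.append("state.json missing")
    else:
        # Assumed shape: {"stages": {"plan": "complete", ...}}
        stages = json.loads(state_path.read_text()).get("stages", {})
        incomplete = sorted(name for name, status in stages.items()
                            if status != "complete")
        if incomplete:
            problems.append("incomplete stages: " + ", ".join(incomplete))
    return problems
```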
This document is the v0.8.1 handoff surface. Update it when material changes affect the production-team-owned interface (CLI flags, file layout, schemas, dependencies), and bump it alongside RELEASE_NOTES.md when those surfaces drift.