A skill-specific guide for reading, iterating on, and hand-editing presentation drafts produced by beril-presentation-maker. Assumes you've already installed the four BERIL plug-in skills and have a project in BERIL ready to draft from.
For install + configure + first-time setup, start with the cross-skill PARTICIPANT-RUNBOOK.md (covers all 4 plug-in skills end-to-end). This tutorial is the presentation-maker-specific layer that sits on top.
For operator hub deployment, see HUB_INSTALL.md. For consumer integration with the adversarial reviewer, see CONTRACT.md.
Audience: Researchers using /beril-presentation-maker in Claude Code on the BERIL hub who have already run a draft and want to make the most of it (read the output, iterate cheaply, hand-edit the .pptx, troubleshoot).
Time: 5 minutes to read.
Wall-clock per draft on the hub (verified 2026-05-06 against ibd_phage_targeting talk-45 STRONG end-to-end):
| Mode + Tier | Wall-clock | Cost | Notes |
|---|---|---|---|
| talk-30 STRONG, full pipeline | 90-150 min | $8-15 | Stages 1-2 are ~30 min (plan-heavy); stages 3-10 dominate; image-gen + adversarial + revise add ~30 min. |
| talk-45 STRONG, full pipeline | 150-220 min (2.5-3.7h) | $12-20 | Larger deck → more LLM calls per stage. Per-substory fan-out (slide_compose + speaker_notes) is the bottleneck. |
| talk-30 STRONG, no image-gen, no adversarial | 60-90 min | $4-8 | Cheapest mode. Skip image-gen (--no-images) and adversarial (--no-adversarial). |
| lightning-5, any tier | 25-50 min | $1-3 | 5-6 slides; minimal pipeline. |
Stage-1 (plan) alone takes 8-15 min on dense projects with figure-rich REPORT.md / RESEARCH_PLAN.md (it does method extraction + tier scoring + notebook classification + figure inventory in one big LLM call). Don't kill the run if it's "too quiet" for 15 minutes after launch.
Iteration is materially cheaper than fresh runs. The pipeline's prompt-cache reuses ~1M tokens of project context per stage; first-run pays a ~$2-3 cache-creation cost that subsequent runs skip. A revise-loop iteration on the same draft is $0.50-2 and 5-15 min per pass — much cheaper than a full re-run.
Per-revise-loop iteration: $0.50-2 / 5-15 minutes per pass.
Per-image cost: ~$0.03 (multi-provider mix; v0.4 M5b adds
Google AI Studio + CBORG-Gemini path). v0.8.0 D-088 widens
image-gen scope to claim_evidence slides (≥3 bullets) +
technical-specificity judge; default cap --max-image-approvals 4.
Expect 2-4 generated illustrations per talk-30 STRONG draft,
adding ~$0.06-0.12 in image-gen cost plus ~30-60s of latency.
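The image-gen arithmetic above is easy to sanity-check. A minimal sketch, using only the ~$0.03 per-image figure and the 2-4 image range quoted above; the function name is illustrative and not part of the skill.

```python
# Figure quoted in this tutorial; treat it as an estimate, not billing data.
PER_IMAGE_USD = 0.03

def image_gen_cost(n_images: int, per_image_usd: float = PER_IMAGE_USD) -> float:
    """Estimated image-gen spend for one draft, rounded to cents."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return round(n_images * per_image_usd, 2)

low, high = image_gen_cost(2), image_gen_cost(4)
print(f"2-4 images: ${low:.2f}-${high:.2f}")
```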
v0.8.0 known limitations (planned for v0.9+ — set expectations now):
- Hand-editing remains useful, mostly cosmetic now. v0.5 D-072 register discipline + v0.6 D-081 figure-utilization contract + v0.8 G.10 deterministic layout pass close most of the v0.3.6 hand-edit list. Remaining hand-fixes are stylistic (operator preference: tighter title wording, slide reordering for talk style). See Known Limitations for the surviving checklist.
- The revise loop is a budgeted feedback loop, not a guarantee. Adversarial review identifies findings; the revise loop rewrites the slot per revise_slide.v1. Both phases cap at --max-revisions (default 6) and --max-revise-cost-usd (default $5.00). If a finding's class is in SURFACE_ONLY (e.g., throughline, central_objection, citation_reality, unbacked_quantitative) the loop surfaces it but doesn't auto-fix.
- Wall-clock varies widely on the hub. Expect 60-120 min for talk-30 STRONG, 90-150 min for talk-45 STRONG. Stage-1 (plan) alone can take 8-10 min on dense projects. Don't kill the run if it's "too quiet" for 10 minutes.
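To make the "budgeted, not guaranteed" behavior concrete, here is a rough sketch of how a loop with both caps could triage findings. It is not the skill's implementation: the `Finding` shape, the per-rewrite cost estimate, and the `weak_punchline` class in the usage note are hypothetical, and the `SURFACE_ONLY` set contains only the examples named above, so it may be incomplete.

```python
from dataclasses import dataclass

# Finding classes this tutorial names as surface-only (reported, never auto-fixed).
SURFACE_ONLY = {"throughline", "central_objection", "citation_reality", "unbacked_quantitative"}

@dataclass
class Finding:
    slide_id: int
    finding_class: str
    est_cost_usd: float  # hypothetical per-rewrite estimate

def plan_revisions(findings, max_revisions=6, max_cost_usd=5.00):
    """Split findings into (to_fix, surfaced) while respecting both caps."""
    to_fix, surfaced, spent = [], [], 0.0
    for f in findings:
        if f.finding_class in SURFACE_ONLY:
            surfaced.append(f)
            continue
        over_count = len(to_fix) >= max_revisions
        over_cost = spent + f.est_cost_usd > max_cost_usd
        if over_count or over_cost:
            surfaced.append(f)  # budget exhausted: report instead of rewriting
            continue
        to_fix.append(f)
        spent += f.est_cost_usd
    return to_fix, surfaced
```

With three fixable findings at an assumed $2 each, only the first two fit under the $5.00 default; the third is surfaced alongside any `throughline` finding.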
After beril-presentation-maker draft <project_id> finishes, your project has a new talks/draft_N/ directory with a four-zone layout:
talks/draft_N/
├── deliverable/               # what you actually share with humans
│   ├── draft.pptx             # ← the slide deck (open in PowerPoint / Keynote / LibreOffice)
│   └── speaker-notes.md       # ← speaker notes as readable markdown
├── narrative/                 # the *decisions* the writer made, before slides
│   ├── REPORT-skim.md         # ← which parts of REPORT.md the writer read
│   ├── throughline.md         # ← the chosen scientific throughline (1 of N candidates)
│   └── substory-design.md     # ← partition into substories (acts of the talk)
├── working/                   # machine-readable internals (you'll mostly skip)
│   ├── slide_spec.json        # ← THE structured slide spec; useful for hand-edits
│   ├── citation_pool.md       # ← references the writer found
│   └── stage_metadata.json    # ← per-stage cost + timing
└── audit/                     # everything the writer did and why
    ├── stages/                # ← per-stage logs (one dir per stage; 17 on talk-30 STRONG)
    ├── runs/                  # ← per-run summaries, cost, adversarial review output
    └── manual-edits/          # ← (only created after you hand-edit the .pptx)
Two files are worth opening; the rest is for debugging.
The deliverable/draft.pptx is the obvious one. The non-obvious one is working/slide_spec.json — read on.
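Before opening anything, it can help to confirm the draft directory is complete. A small sketch that checks for the paths shown in the tree above; the helper name is made up for illustration, and the skill does not ship such a check as far as this tutorial states.

```python
from pathlib import Path

# Paths taken from the directory tree above. audit/manual-edits/ is omitted
# because it only appears after a hand-edit.
EXPECTED = [
    "deliverable/draft.pptx",
    "deliverable/speaker-notes.md",
    "narrative/throughline.md",
    "narrative/substory-design.md",
    "working/slide_spec.json",
    "audit/stages",
    "audit/runs",
]

def missing_artifacts(draft_dir):
    """Return the expected relative paths that do not exist under draft_dir."""
    root = Path(draft_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_artifacts("talks/draft_1")` on a finished draft should return an empty list; a non-empty result usually means the pipeline halted early.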
The slide_spec is the single source of truth for what's on each slide before it gets rendered into PowerPoint XML. Useful when:
- A slide is structurally wrong and you want to understand what the writer was trying to do.
- You want to revise one slide cheaply (/beril-presentation-maker-continue reads slide_spec, edits one slide, re-assembles).
- You want to hand-edit slide_spec.json directly and re-run assemble (advanced; see "Iteration" below).
Top-level shape:
{
"schema_version": "1.0",
"project_id": "my_phylo_study",
"mode": "talk-30",
"tier": "STRONG",
"throughline": { "id": "TL2", "punchline": "...", "tier_evidence": "STRONG" },
"substories": [ { "id": "S1", "punchline": "...", "slide_ids": [3,4,5] }, ... ],
"slides": [ ... ]
}

Each slide has a layout (one of 16; see below), an id, an optional substory_id mapping it to a substory, and a layout-specific content block. The validator_status field on each slide records whether the post-checkers (quantitative grounding, figure manifest, etc.) flagged anything during the pipeline.
The slide_spec validator (tools/slide_spec.py) is hand-rolled and authoritative. If you edit slide_spec.json by hand, run python3 src/beril_presentation_maker/skill/tools/slide_spec.py validate <path> first — broken specs fail loud at assemble time and you'll waste a $5-20 re-run.
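For a quick read-only overview of a spec, something like the following may be enough. It relies only on the `slides`, `layout`, `id`, and `substory_id` fields described above, makes no assumption about the rest of the schema, and is not a substitute for the validator.

```python
import json
from collections import Counter

def load_spec(path: str) -> dict:
    """Read slide_spec.json from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def summarize_spec(spec: dict) -> dict:
    """Count slides per layout and list slide ids per substory."""
    slides = spec.get("slides", [])
    layouts = Counter(s.get("layout", "<missing>") for s in slides)
    by_substory = {}
    for s in slides:
        # Slides without a substory_id are grouped under None.
        by_substory.setdefault(s.get("substory_id"), []).append(s.get("id"))
    return {"n_slides": len(slides), "layouts": dict(layouts), "by_substory": by_substory}
```

Typical use would be `summarize_spec(load_spec("talks/draft_1/working/slide_spec.json"))`, which shows at a glance whether one layout dominates the deck.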
The presentation maker speaks a vocabulary of 16 slide layouts. Each layout has its own content schema, its own master-template geometry, and its own author rules (in prompts/slide_compose.v1.md).
| Layout | When the writer chooses it | Key content fields |
|---|---|---|
| title | Slide 1 (always) | title, presenter, date, optional subtitle/affiliation/venue |
| section_divider | Substory transitions | punchline, optional substory_number |
| big_idea | Opening claim or key transition; pull-quote feel | title; optional supporting_graphic for banner+image mode |
| big_number | Headline statistic (27M genomes, 90% accuracy) | headline, subtitle, optional sub_pointer / source_footer |
| claim_evidence | The workhorse data slide | title, 1-3 bullets, optional figure + figure_caption + citations |
| two_column_compare | Before/after, control/treatment | left_col_{title,content}, right_col_{title,content} |
| data_figure | One figure dominates the slide | title, figure path, caption (≤280 chars), optional data_source |
| data_table | Ranked top-N or comparison matrix | title, columns (2-6), rows (1-12), optional caption/footnote/highlight_rows |
| workflow_diagram | Methods walkthrough as boxes-and-arrows | title, diagram (nodes + edges), 3 step_captions |
| methods_summary | Parameters/tools list | title, 5-10 bullets, optional tools_versions |
| concept_illustration | AI-generated illustration as a slide | title, image_path, image_prompt, style, provenance |
| cross_tenant_integration | Pulling data from multiple BERDL tenants | title, optional tenant_list / kberdl_db_list / data_flow_diagram |
| implications | "What changes if this is true" | title, 1-3 bullets ({claim, evidence_pointer}) |
| acknowledgments | Last-but-one slide | contributors list, optional funder_logos / tenant_attribution |
| references | Final slide | refs_short (≤8 entries), optional ai_disclosure |
| qa_anticipated | Anticipated Q&A | question, answer_summary, evidence_pointer, optional answer_detail |
Layout selection happens in slide_compose.v1.md based on the substory plan + figure inventory + tier. The writer rarely makes blatant mis-fits but does have known biases — see "Troubleshooting" below.
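The size limits in the layout table can be checked by hand before a hand-edited spec goes to the real validator. The sketch below transcribes those limits; the exact field names inside each content block (`bullets`, `caption`, `columns`, `rows`, `refs_short`) are assumptions read off the table, and tools/slide_spec.py remains authoritative.

```python
# Limits transcribed from the layout table: (field, min_len, max_len).
LIMITS = {
    "claim_evidence": [("bullets", 1, 3)],
    "implications": [("bullets", 1, 3)],
    "methods_summary": [("bullets", 5, 10)],
    "data_figure": [("caption", 0, 280)],
    "data_table": [("columns", 2, 6), ("rows", 1, 12)],
    "references": [("refs_short", 0, 8)],
}

def precheck(layout: str, content: dict) -> list:
    """Return human-readable problems; an empty list means nothing was caught."""
    problems = []
    for field, lo, hi in LIMITS.get(layout, []):
        if field not in content:
            continue  # absent fields are left to the real validator
        n = len(content[field])
        if not lo <= n <= hi:
            problems.append(f"{layout}.{field}: length {n} outside {lo}-{hi}")
    return problems
```

An empty result only means these particular limits hold; it says nothing about required fields or figure paths.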
concept_illustration slides use AI-generated illustrations (CBORG-Gemini, ~$0.014/image). The pipeline's image_gen stage:
- Decides which slides need illustrations (deterministic Python; per-slide).
- Authors an image prompt for each (LLM call against ai_image_prompt.v1.md).
- Approves each image with you, one at a time (by default, interactively).
- Generates approved images and binds them into the slide_spec via the merge stage.
The per-image approval prompt shows you the slide context, the prompt the writer wrote, the estimated cost, and your options:
Slide 7: concept_illustration — "Genome ring opener for Substory 1"
Prompt: "scientific illustration in KBase brand palette, ..."
Style: scientific_illustration | Channel: A | Cost estimate: $0.014
[A]pprove and generate / [R]e-roll prompt / [E]dit prompt / [S]kip slide / [D]efer slide
For non-interactive runs (auto-advance + auto-approve), pass --auto-approve-images --max-image-cost-usd 0.20. The cap is per-image; pipeline halts gracefully if any single image exceeds the cap. Without --auto-approve-images, you'll be prompted; with it but without --max-image-cost-usd, the pipeline assumes infinite budget. Use both for hands-off runs; use neither for full control.
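If you launch drafts from a script, a small wrapper can enforce the "use both flags" advice above. This is a sketch: the flag names come from this tutorial, but attaching them after `draft <project_id>` and the order shown are assumptions, and the function only builds an argument list without running anything.

```python
def build_draft_argv(project_id, auto_images=False, max_image_cost_usd=None,
                     auto_advance=False, no_images=False, no_adversarial=False):
    """Compose an argv list for `beril-presentation-maker draft` (not executed here)."""
    if auto_images and max_image_cost_usd is None:
        # The tutorial warns this combination means an unbounded image budget.
        raise ValueError("pass max_image_cost_usd together with auto_images")
    argv = ["beril-presentation-maker", "draft", project_id]
    if auto_advance:
        argv.append("--auto-advance")
    if no_images:
        argv.append("--no-images")
    if no_adversarial:
        argv.append("--no-adversarial")
    if auto_images:
        argv.append("--auto-approve-images")
    if max_image_cost_usd is not None:
        argv += ["--max-image-cost-usd", f"{max_image_cost_usd:.2f}"]
    return argv
```

The resulting list can be handed to `subprocess.run` once you have confirmed the flag placement against your installed version.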
If a generated image looks wrong post-hoc, run /beril-presentation-maker-continue against the draft and use the revise loop to re-roll specific slides — much cheaper than re-running the whole pipeline.
After a draft completes, you have three iteration paths in increasing cost order:
| Path | Cost | When |
|---|---|---|
| Hand-edit .pptx in PowerPoint | $0 | Cosmetic / formatting / typo fixes. Manual edits preserved under audit/manual-edits/ on subsequent revise passes (see below). |
| Revise specific slides (/beril-presentation-maker-continue with adversarial review) | $0.50-2 per pass | Slide content is structurally wrong (weak punchline, missing claim, fabricated quantitative). The revise loop runs adversarial review, picks the worst N findings, re-authors only the affected slides. |
| Full re-run from scratch (beril-presentation-maker draft) | $5-20 | Throughline was wrong (most expensive failure mode). Substory partition is fundamentally off. Tier was mis-detected. Use sparingly; the throughline-pick stage halts for you to confirm before any deep cost is paid. |
The revise loop is the right tool for most "this slide is bad" cases. The full re-run is the right tool only when the story is wrong, not when individual slides are.
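The cost gap between the two paid paths can be worked out from the ranges in the table. A minimal sketch using only those quoted ranges; the helper names are illustrative.

```python
# (low, high) USD ranges quoted in the iteration table.
REVISE_PASS = (0.50, 2.00)
FULL_RERUN = (5.00, 20.00)

def total_range(per_unit, n):
    """Scale a (low, high) cost range by n repetitions."""
    low, high = per_unit
    return (round(low * n, 2), round(high * n, 2))

def worst_case_passes_within_cheapest_rerun():
    """How many worst-case revise passes fit inside the cheapest full re-run."""
    return int(FULL_RERUN[0] // REVISE_PASS[1])

print(total_range(REVISE_PASS, 3))
print(worst_case_passes_within_cheapest_rerun())
```

Even at the top of the revise range, a couple of passes cost less than the cheapest full re-run, which is why the revise loop is the default recommendation.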
The .pptx is yours. Open it, edit freely. Preserve and ship semantics:
- Manual edits are detected via hash-guard. On the next /beril-presentation-maker-continue or revise pass, the assembler diffs the on-disk .pptx against what it would render from slide_spec.json. Diverging slides are preserved: they are copied to audit/manual-edits/draft_N/ and the revise loop skips them. You won't lose work.
- The slide_spec.json does NOT update from your edits. If you want the structured spec to reflect your hand-edit (e.g., you fixed a punchline by hand and want the next draft to inherit it), you'll need to update slide_spec.json manually. The hand-edit detection is one-way: .pptx wins, but only for that draft.
- Best practice for the May 7 event: finish the pipeline first, then hand-edit only the deliverable .pptx. Don't try to edit slide_spec.json + re-assemble unless you know what you're doing; it's a faster cycle but easier to break.
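The hash-guard idea can be illustrated in a few lines. This is a conceptual sketch only: how the assembler actually serializes and compares slides is not described here, so the per-slide byte maps below are a stand-in chosen for illustration.

```python
import hashlib

def slide_digest(slide_bytes: bytes) -> str:
    """SHA-256 hex digest of one slide's serialized content."""
    return hashlib.sha256(slide_bytes).hexdigest()

def diverging_slides(rendered: dict, on_disk: dict) -> list:
    """Slide ids whose on-disk bytes differ from what the spec would render.

    Both arguments map slide id -> bytes. Slides present on only one side
    count as diverging too.
    """
    ids = sorted(set(rendered) | set(on_disk))
    return [
        i for i in ids
        if i not in rendered
        or i not in on_disk
        or slide_digest(rendered[i]) != slide_digest(on_disk[i])
    ]
```

In this picture, the ids returned are the slides a revise pass would leave alone and copy into audit/manual-edits/.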
Content-level issues (slide writing, not layout — surface across most drafts):
- Process-detail bleed in slide content. The writer often cites internal artifacts (notebook names like NB04h.ipynb, REPORT.md §Pillar 2, file paths like data/nb09b.tsv, analysis-layer abbreviations like A16/H3c/L13) where peer-readable evidence belongs. Find/replace these with cohort name + sample size + primary author/year before showing the deck publicly. Verbatim example: "REPORT.md §Pillar 2 opener #6; NB04h_hmp2_external_replication.ipynb" should become "Lloyd-Price 2019, HMP2 cohort, n=1,627."
- Titles often read as category labels, not claims. "Five-layer pipeline" should become "Ecotype-stratified meta-analysis on 8,489 samples yields 6 CD pathobionts." Rewriting titles as claims is a high-value, low-cost hand-edit pass.
- Defensive caveats embedded mid-result. Hedges like "is an upper bound" / "qualitatively robust" / "pending validation" buried in result bullets read as apologies for fragile science. Front-load these as design choices in the methods slide or limitations slide instead.
- Concept illustrations may be absent on figure-rich projects. The image-gen decision layer currently defers most candidate slides to an LLM-judgment layer that hasn't shipped (v0.3.6 → v0.4.x). For talks needing conceptual visuals (mechanism cartoons, framework diagrams, cocktail-strategy schematics), hand-add 1-2 illustration slides post-draft.
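The process-detail bleed described in the list above is easy to hunt for mechanically before the hand-edit pass. A sketch, assuming you paste in slide or speaker-note text yourself; the regular expressions are guesses at the naming patterns quoted above, not rules the pipeline uses.

```python
import re

# Patterns for the internal artifacts named above. The analysis-layer
# abbreviations (A16, H3c, L13) are left out: a pattern loose enough to catch
# them would also flag ordinary gene and strain names.
ARTIFACT_PATTERNS = {
    "notebook": re.compile(r"\bNB\d+\w*\.ipynb\b"),
    "report_section": re.compile(r"\bREPORT\.md\b"),
    "data_path": re.compile(r"\bdata/\S+\.tsv\b"),
}

def find_process_bleed(text: str) -> list:
    """Return (kind, matched_text) pairs for each internal-artifact mention."""
    hits = []
    for kind, pattern in ARTIFACT_PATTERNS.items():
        hits.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return hits
```

Running it over deliverable/speaker-notes.md gives a checklist of strings to replace with cohort name, sample size, and author/year.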
Layout-level issues (cosmetic, hand-fixable in PowerPoint):
- Title slide: long throughline punchlines (>200 chars) may render visually cramped. Adjust font size or shorten in the title slide directly.
- References slide: refs_short is currently capped at 8; longer bibliographies are truncated upstream. To show more references, copy from working/citation_pool.md and split into a second references slide manually.
- qa_anticipated slide: long synthesis-style questions may overflow the question region. Shorten the question or split across two slides.
- cross_tenant_integration slide: currently renders as a flat list when no data_flow_diagram is provided. If that's how yours looks, consider adding a hand-drawn diagram in PowerPoint to convey the integration pattern.
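When splitting a long bibliography across several references slides by hand, chunking the list at the 8-entry cap keeps each slide within the limit. A small sketch; it assumes you have already extracted the references from working/citation_pool.md into a plain list of strings, since that file's format is not specified here.

```python
def split_references(refs: list, per_slide: int = 8) -> list:
    """Chunk a reference list into per-slide groups, preserving order."""
    if per_slide < 1:
        raise ValueError("per_slide must be at least 1")
    return [refs[i:i + per_slide] for i in range(0, len(refs), per_slide)]

# Eleven placeholder entries split into one full slide and one partial slide.
groups = split_references([f"ref {n}" for n in range(1, 12)])
print([len(g) for g in groups])
```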
Content issues are tracked at workspace tasks #87-#90; layout issues at #74-#77. All slated for v0.4.x. Until then, the deck is a first draft requiring a hand-edit pass before public presentation.
For cross-skill troubleshooting (pipx install, configure, BERIL_ROOT detection, schema mismatch errors), see PARTICIPANT-RUNBOOK §Appendix A. The items below are presentation-maker-specific.
Pipeline halts cleanly after stage 2 with <draft_dir>/.handoff.json written. This is the expected v0.3.6+ behavior at the throughline-pick gate. Bash exits rc=0 (clean halt, NOT failure) and writes a handoff JSON listing the candidates. To resume, read <draft_dir>/.handoff.json (or working/00_throughline_candidates.md for the full evidence map), pick a candidate, and run beril-presentation-maker continue <draft_dir> --pick TLN. The slash command (/beril-presentation-maker) does this two-stage flow automatically (it reads the handoff, renders the candidates inline as its own message, and resumes on your pick); if you're invoking the CLI directly from a terminal, you do it by hand. For unattended runs (CI, hub-batch), pass --auto-advance to skip the gate entirely and auto-pick TL1.
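For terminal users resuming by hand, the two steps (inspect the handoff, then continue with a pick) might be scripted roughly as follows. The handoff file's internal schema is not documented here, so the sketch loads it without interpreting any keys; the command shape follows the `continue <draft_dir> --pick TLN` form given above, and nothing is executed.

```python
import json
import re
from pathlib import Path

def read_handoff(draft_dir):
    """Load <draft_dir>/.handoff.json as-is; no keys are assumed."""
    path = Path(draft_dir) / ".handoff.json"
    return json.loads(path.read_text(encoding="utf-8"))

def build_continue_argv(draft_dir, pick):
    """Compose `beril-presentation-maker continue <draft_dir> --pick TLN`."""
    if not re.fullmatch(r"TL[1-9]\d*", pick):
        raise ValueError(f"expected a throughline id like TL2, got {pick!r}")
    return ["beril-presentation-maker", "continue", str(draft_dir), "--pick", pick]
```

Printing `read_handoff(draft_dir)` first lets you see the candidates before choosing which id to pass.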
Pipeline halts at "Pick a throughline (TL1 / TL2 / TL3):" with exit code 1 in a Claude Code background task — pre-v0.3.6 only. Versions before v0.3.6 used a TTY-blocking read </dev/tty at the gate, which fails 100% in TTY-less contexts (Claude Code on the hub auto-backgrounds bash). Upgrade to v0.3.6+ — the TTY block is gone and replaced with the halt-and-handoff pattern above.
Image-gen approval prompts you don't want. You're running interactively but each slide's image-gen approval is breaking your flow. Pass --auto-approve-images --max-image-cost-usd 0.20 to make image-gen non-interactive while still capping cost.
Image-gen budget exceeded mid-pipeline. The pipeline halts gracefully at the next image-gen call after the cap. Resume options: (a) raise the cap and continue, (b) skip remaining concept_illustration slides (they fall back to image-free big_idea layouts), (c) accept the partial draft (the deliverable from a partial pipeline is usable).
concept_illustration selection seems too eager / too sparse. Known bias in the writer's layout selection (tracked as #65). For too-eager: the revise loop can downgrade specific concept_illustration slides to big_idea or claim_evidence. For too-sparse: the writer is conservative on illustrations by design when the project's REPORT.md is figure-rich; consider hand-adding a slide if you really want one.
Caption for a data_figure slide is rejected as too long. Validator caps at 280 chars (v0.3.5+). The error message tells you the actual length; trim the caption, move citations to data_source, or split insight across multiple slides. The revise_slide.v1 prompt knows this rule.
Adversarial review finds the same issue across 3+ slides. That's a deck-level pattern, not a per-slide fix. Run the revise loop once; if the pattern persists across iterations, the throughline or substory partition is the actual root cause — consider a full re-run with --no-adversarial first to get a clean draft, then iterate.
Wall-clock wildly exceeds 60 minutes. Image-gen retries on rate-limit are the most common cause; check audit/stages/image_gen/ logs. Adversarial review on a long deck (35+ slides for talk-45) can also push 90+ minutes. Both are normal under load; not an error condition.
- Cross-skill setup, BERIL workflow, cohort cheat-sheets, recovery patterns → PARTICIPANT-RUNBOOK.md.
- Hub deployment for operators → HUB_INSTALL.md.
- The 14 pipeline stages in detail (with prompts) → SPEC.md §5.
- Runtime contracts (file paths, draft layout, exit codes) → LAYOUT.md.
- Cross-skill interop pinning (presentation-maker as adversarial consumer) → CONTRACT.md.
- Version history → RELEASE_NOTES.md.
- Adversarial reviewer for presentations (the v3 schema this skill consumes) → github.com/kbaseincubator/beril-adversarial-skill/blob/main/TUTORIAL.md.