PPT Master is a solo-maintained open source project, driven by priority rather than fixed timelines. This roadmap is here to align expectations: what's already shipped, what's under ongoing maintenance and evolution, and what's intentionally out of scope. Priorities shift with user feedback and real usage signals — no committed delivery windows.
Where we are: AI generates SVG from scratch → converts to DrawingML for natively editable PPTX. The core axis is pixel-fidelity across four renderers (PowerPoint / Keynote / LibreOffice / WPS) + real native shapes. Every direction below serves that axis.
Structural capability growth over the past two months. Single flags and incremental polish go to the commit log.
- Direct export to natively editable PPTX: `svg_to_pptx` adds glow / rotate / text-decoration / stroke-linejoin; the full SVG → DrawingML chain becomes usable
- Chart / layout template JSON indexes ship, AI selection path connected
- Source-less generation: `topic-research` workflow supports "topic only, no source files"
- PPTX export step-change: SVG clipPath → DrawingML picture geometry, marker → native arrows, output consolidated to `exports/`
- Chart library expands to 70 templates + three icon libraries (simple-icons / phosphor-duotone / brand-logo)
- `spec_lock.md` machine-readable contract: Strategist locks the spec, Executor re-reads it before every page, so cross-page consistency gets a real guarantee
- Per-element animation on by default + recorded narration / video export (`workflows/generate-audio.md`)
- Live Preview enters the main pipeline (`workflows/live-preview.md`): browser preview, click elements to write annotations, say "apply my annotations" and the AI rewrites that region (built on @WodenJay's PR #85)
- Replicate any PPTX as a template (`workflows/create-template.md`): PPTX → SVG reverse + OOXML theme / master / layout / asset extraction
- AI image three-dimension system (rendering × palette × type) + Strategist h.5 lock; downstream consumes a fixed contract
- AI image `hero_page` dual-track: local insert + full-canvas hero image coexist
- Brand identity preset subsystem (`workflows/create-brand.md`): extract and reuse brand palette / typography / logo / voice
- Visual self-review workflow (`workflows/visual-review.md`): rubric-based per-page check of AI-generated SVGs
- AI image Type concept boundary clarification: Type is now narrowed to "the internal geometric skeleton of a local infographic block" (11 real skeletons); the four pseudo-types (hero / background / portrait / typography) fold back into `page_role: hero_page` plus four composition primitives (single-subject / portrait / typographic / atmospheric); hero_page text layering rule (visual keywords embedded, editable text via SVG overlay)
- Brutalist AI newspaper example deck shipped (`examples/ppt169_brutalist_ai_newspaper_2026/`): first of the three P0 capability-backing demos; wall-to-wall small type + irregular columns + halftone monochrome + single-spot red + real native shapes; 10-page editorial annual report stressing text-position precision and cross-page consistency
- Kubernetes Blueprint example deck shipped (`examples/ppt169_kubernetes_blueprint_2026/`): second of the three P0 capability-backing demos; isometric technical-drawing aesthetic + blueprint cyan/amber palette + hand-authored SVG geometry (no raster images) + custom drawing-in animation; 10-page Kubernetes architecture walkthrough stressing geometric shape generalization and chart-structure extensibility
- AI image `custom` escape hatch: `rendering` / `palette` / hero composition each accept `custom` + a one-paragraph `*_behavior` prose, replacing the false "default to vector-illustration / cool-corporate" fallback; end-to-end contract spans `image-renderings/_index.md` §1.5, `image-palettes/_index.md` §2, Strategist h.5 hard-rule (≤1 custom per dimension; one candidate may carry both), spec_lock fields, and Image_Generator Step 2 consumption branch
- Template architecture three-kind consolidation (`docs/templates-architecture.md`): brand / layout / deck split into three independent dirs with per-kind schemas + segment-level fusion + git-style conflict resolution; SKILL.md Step 3 dispatches per `kind`, trigger rule remains "explicit path only"
- Pattern fill PPTX safety net: `svg_quality_checker.py` now warns on `<pattern>` without `data-pptx-pattern` (silent fallback to `ltUpDiag`) and errors on values outside OOXML `ST_PresetPatternVal` (schema-failed PPTX that won't open); `shared-standards.md §7` documents the closed preset enum and the required `<rect fill="<bg>"/>` child convention
- LaTeX math formula rendering shipped (`scripts/latex_render.py`): Strategist locks one of three policies (`mixed` / `render-all` / `text-only`) inside the Typography confirmation and writes an explicit `images/formula_manifest.json`; the renderer walks a codecogs → quicklatex → mathpad → wikimedia fallback chain and emits transparent PNGs that land in §VIII as `Acquire Via: formula` / `Status: Rendered` rows; formula-heavy decks (academic / engineering / educational) finally have a native rendering route. Formula selection is a Strategist decision: the renderer never scans source files for `$...$` markers
- Live preview direct editing, L1 / L2 / L3 (`workflows/live-preview.md`): the browser editor gains deterministic in-place edits with no AI round-trip, covering text content (L1), presentation attributes like fill / stroke / font-size (L2), and on-canvas geometry (L3): drag a selected element to move it, arrow-key nudge (`Shift` = 10px), multi-select, plus a right-click overlap picker for stacked shapes. Edits stage with `Ctrl+Z` undo + coalescing and write to `svg_output/` on Apply changes; moves persist through finalize / export (moved text frames, promoted multi-line tspans, repositioned icons all reproduce in the PPTX). Re-export stays chat-driven; on-canvas resize handles are not yet implemented (resize via the geometry inputs)
- Replicate any PPTX's design → refill content route (`workflows/template-fill-pptx.md`): when a user supplies an existing `.pptx` plus new material / a topic and asks to "reuse this deck's design / fill the content back in", this standalone workflow edits the PPTX directly and never enters the SVG generation pipeline. Output stays natively editable (it reuses the original slides' shapes / layouts, not a screenshot refill); it isolates private parts on reuse, exposes chart data, and runs capacity checks. Trigger follows the template rule (only on an explicit ask to reuse an existing deck), and it deliberately does not reflow / add pages / swap images (that's the from-scratch main route). Distinguished from Non-goals #53 below
- Three executors retired → mode + visual-style dual catalogs (`references/modes/` + `references/visual-styles/`): the old three `executor-*.md` (general / consultant / consultant-top) entangled domain · audience · persuasion · narrative on one axis; split into two orthogonal catalogs (following the `image-renderings` pattern: flat dir + `_index` + on-demand read + Strategist locks one). mode = narrative skeleton (`pyramid` / `narrative` / `instructional` / `showcase`; consultant + top merge into pyramid since their narrative core is identical); visual-style = SVG layout aesthetic (`swiss-minimal` / `editorial` / `soft-rounded` / `dark-tech`, each paired with an image-rendering, zero HEX; color truth stays in confirmation e + image-palettes). Strategist `§d` locks `mode` + `visual_style` independently into `spec_lock`; Executor loads the two locked files; any mode × any style. Render coordinates stay in `templates/charts/`
- Prompt constraint-strength decoupling (`docs/rules/prompt-style.md` §4): three explicit strength tiers, rule (`Hard rule` / `Forbidden`) / default (`Default — … may override`) / reference (`Reference — not a constraint`), plus an "objective failure vs taste" test and a checker boundary, so the model can tell "must keep vs may deviate" at a glance; the visual-style catalog is Reference-strength throughout
- visual-style catalog grows to 18, aligned with image-renderings + examples reclaimed: first four distilled from the examples library (`brutalist` / `blueprint` / `memphis` / `zine`), then six filling in the hand-drawn / textured renderings that have a layout twin (`sketch-notes` / `ink-notes` / `chalkboard` / `paper-cut` / `vintage-poster` / `pixel-art`), then four more reclaimed from still-uncovered example aesthetics: `ink-wash` (rice-paper whitespace, from 藏拙 / 李子柒) · `glassmorphism` (dark frosted-glass + gradient light, from glassmorphism_demo, split out of soft-rounded) · `photo-editorial` (full-bleed photography dominates, text captions, from Pritzker / fashion_weekly) · `data-journalism` (Bloomberg/Economist multi-column micro-charts + sidebars, from global_ai_capital). The catalog is regrouped into five families (corporate-product / editorial-publication / expressive-print / hand-drawn-brush / specialty). The test: a rendering earns a visual-style twin only when it defines a whole-page layout language, not merely how an inserted image looks, so photo-led composition gets `photo-editorial` (paired with corporate-photo), while purely atmospheric renderings (nature / warm-scene / fantasy-animation) stay imagery-only and just pair with a layout style. Zero HEX, Reference strength throughout
- mode catalog grows to 5, adding `briefing`, which fills the "neutral information delivery" cell: no thesis, no story, no teaching, no spectacle; topic titles, even weight, complete and scannable, for status updates / reference decks / catalogs / meeting packs / FAQs that inform without arguing. The five now partition presentation intent more nearly MECE: persuade (pyramid) · tell a story (narrative) · teach (instructional) · impress (showcase) · simply inform (briefing). `_index` gains a `briefing` vs `pyramid` tie-breaker ("if you'd have to invent a thesis to fit pyramid, it's briefing"). Five presets plus a `custom` escape hatch for a bespoke direction none of them captures (a special cadence, a multi-mode fusion, a particular posture), either user-requested or Strategist-recommended, confirmed like every lock; a deck always locks one value, and a fusion is one `custom` describing the acts. The only thing to avoid is defaulting to `custom` when a preset genuinely fits. Mirrors the truth-precedence rule that a user-authored outline or direction overrides the mode
- mode / visual-style system validated on real decks + four calibration tightenings landed: the 5 modes + 18 visual-styles + the `custom` escape hatch were exercised on five covering decks (briefing×data-journalism / narrative×photo-editorial / instructional×chalkboard / showcase×glassmorphism / custom×zine, with the narrative deck going through the AI-image branch). Results: zero selection misses (all four Close-calls tie-breakers fired and held against the real grey-zone pulls), discipline fully honored (zero HEX / Reference strength / whole-page layout language), the `custom` mechanism works (a `mode_behavior` prose paragraph carried a 10-page generation and reads as plain language at confirmation, not a bare token), mode ⟂ visual_style holds (any combination without bleed, including a positive check of the "keynote/launch = mode not style" route), export 5/5 decks × every page with 0 failures. Tightened four spots from the real signal: `strategist §e` now locks the full neutral tier set the visual_style implies up front (killing the mid-deck color top-ups seen in three consecutive decks), `executor-base §1` re-skins chart/layout template gradients-shadows-fills to the deck's visual_style (templates supply structure, not skin; mirror templates stay verbatim per §1.1), `briefing §1` makes `core_message` state coverage rather than a claim (briefing-only; global §IX assertion semantics stay for narrative/instructional/pyramid), and `svg_quality_checker` fixes font-family drift false positives (delimiter-matched capture + stack normalization) and drops the font-size ceiling for showcase mode and poster visual-styles
- Optional spec-review checkpoint shipped (`workflows/refine-spec.md`): an opt-in pause after the Eight Confirmations. When the user explicitly asks (default OFF), the Strategist produces the full `design_spec.md` + `spec_lock.md`, then stops so the user can revise any part of the spec (outline / color / typography / layout / image strategy / page rhythm) before generation, keeping both files in sync. Same shape as the split-mode note: it never fires on its own, the default pipeline is unchanged, and it surfaces as one opt-in line inside the Eight Confirmations. Review lenses (logical clarity / information density / focus / register / emotional resonance / chapter balance + the design dimensions) give a direction only, never a numeric threshold (`Reference` strength). Prompted by @cuberoocp in issue #173
- Interactive visual Eight Confirmations page (Step 4) (`scripts/confirm_ui/server.py`, field schema `scripts/docs/confirm_ui.md`): the Eight Confirmations move from chat-only to a browser page that auto-launches by default at Step 4. Enumerable fields (canvas / mode / visual_style / icons / image usage / formula & generation policy) list common options from `catalogs.json`, while generative fields present candidates: color swatches, live font previews (CJK / Latin previewed independently), and AI-image rendering × palette picks; plus live swatch feedback on custom HEX, a live combined color × typography preview, and a body-size range hinted from the chosen canvas. It shares port `5050` with the Step 6 Live Preview (they never run at once: the page auto-`--shutdown`s at the end of Step 4 to free the port; if the port is busy the launcher auto-advances to the next free one). On principle chat stays the canonical channel and the page is a convenience layer: the page's `result.json` is authoritative over the recommendations and is consumed in place downstream (image plan / `image_strategy` / fonts / the split and refine-spec toggles all read from it), and any failure to open / timeout / headless host degrades losslessly to the chat-summary fallback
- A batch of source-conversion fidelity gains, so less information is dropped as source material enters the pipeline: `doc_to_md` converts Word OMML / Office Math equations to inline LaTeX, `pdf_to_md` recognizes `Figure N |` pipe-delimited captions, and `ppt_to_md` preserves hyperlinks already in the source deck (run-level external `[text](url)` / slide-internal jumps `[text](#slide-N)` / shape-level clicks, with dangerous-scheme filtering and anchor-text Markdown escaping) and transcribes native chart data into Markdown tables (values survive the conversion instead of collapsing to a single picture). Caption recognition is based on @suay1113's PR #191; hyperlink preservation is distilled from @ZhaoZuohong's PR #155
- Content-faithful PPT beautification / re-layout shipped (`workflows/beautify-pptx.md`): the mirror of `template-fill`. Template-fill reuses a deck's design with new content; beautify keeps the content and redoes the layout. Given an existing PPTX, every text string is preserved verbatim (nothing added / removed / reworded); it extracts and inherits the source deck's visual identity (palette / fonts, `theme` or `observed` offered as two confirm-page candidates) and redoes only layout / hierarchy / whitespace, strict 1:1 page count and order, with charts / tables regenerated natively from extracted data (values frozen) and source pictures re-laid-out. Technically still "generate a native, editable PPTX from scratch" (`ppt_to_md` extracts content → main pipeline → a brand-new deck), not a patch over the original, so it stays clear of Non-goals #53. Adds `beautify_identity.py` / `beautify_inventory.py`; the confirm page is seeded from the source for the user to review. Honest v1 ceiling: it does not relieve information overload (a crowded page improves within itself; true re-pagination is the main pipeline), does not guarantee coordinate-level paste-back, and combo / dual-axis / waterfall charts lose the un-captured plots
- Multi-deck PPTX intake + `analysis/` source-name prefixing: a main-pipeline project can now combine several source decks. Each writes its own `<stem>.identity.json` / `<stem>.slide_library.json`, with every deck's digest inlined under `decks[]` in the single index `source_profile.json` (preserving the "Strategist must-read `source_profile.json`" single-entry contract: one entry for a one-deck project, several for a combined one; re-importing the same stem replaces its entry). `beautify` / `template-fill` stay single-deck (1:1) and read their own `<stem>.*` artifacts
- Material divergence (a free-text field under §c Audience): a main-pipeline content-strategy question added as a free-text box under the audience field. The user states in their own words how closely to follow the source vs how freely to reshape it (blank = balanced default). Deliberately not a fixed set of options, not recommended from analyzing the source, and not coupled to page count; it is purely the user's stated intent. However freely they ask, facts stay sourced: reorganize / reframe / expand / connect what is in the source, never import facts from outside it (that is the `topic-research` job). The Strategist reads the prose when authoring the §IX outline and records it in `design_spec §I`; it is not written to `spec_lock` (the Executor never reads it). `mode` and divergence are orthogonal. Beautify / template-fill freeze content and do not surface this field
- A batch of default-behavior and intake standardizations: per-element entrance animation now defaults off (page transition `fade` only; element builds are opt-in via `-a auto` / `animations.json`), removing the auto-cascade "AI deck" tell; per-project `icons/` copies chosen icons into the project at selection time and embeds project-first; `analysis/` is established as the canonical must-read layer for machine-extracted facts (PPTX intake bundle + `image_analysis.csv`); the main pipeline treats a source deck's identity (palette / fonts / layout) as reference, not constraint (inherit or redesign by Strategist judgment, defaulting to fresh design); the confirm page gains a custom color input
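
The mode and visual-style locks described in the list above can be illustrated with a small validator. This is a minimal sketch, not the project's checker: the function name `validate_lock` and the flat dict layout are assumptions, while the five mode names, the 18 visual-style names, the `custom` escape hatch, and the `mode_behavior` prose field are taken from the entries above.

```python
# Hypothetical sketch: validate the mode / visual_style pair a Strategist locks.
# The dict layout and function name are assumptions, not the real spec_lock schema.

MODES = {"pyramid", "narrative", "instructional", "showcase", "briefing"}

VISUAL_STYLES = {
    "swiss-minimal", "editorial", "soft-rounded", "dark-tech",
    "brutalist", "blueprint", "memphis", "zine",
    "sketch-notes", "ink-notes", "chalkboard", "paper-cut",
    "vintage-poster", "pixel-art",
    "ink-wash", "glassmorphism", "photo-editorial", "data-journalism",
}


def validate_lock(lock: dict) -> list[str]:
    """Return a list of problems; an empty list means the lock looks consistent."""
    problems = []
    mode = lock.get("mode")
    style = lock.get("visual_style")

    if mode == "custom":
        # A custom mode must carry a prose description, not a bare token.
        if not str(lock.get("mode_behavior", "")).strip():
            problems.append("mode is 'custom' but mode_behavior prose is missing")
    elif mode not in MODES:
        problems.append(f"unknown mode: {mode!r}")

    # Mode and visual_style are checked independently: any mode pairs with any style.
    # Assumption: only the 18 catalog names are accepted for visual_style here.
    if style not in VISUAL_STYLES:
        problems.append(f"unknown visual_style: {style!r}")

    return problems


if __name__ == "__main__":
    print(validate_lock({"mode": "briefing", "visual_style": "data-journalism"}))
    print(validate_lock({"mode": "custom", "visual_style": "zine"}))
```

Checking the two fields separately mirrors the orthogonality claim: no rule in the sketch looks at the mode and the style together.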
Directions actively underway or up next, with no committed timeline.
- Real-usage calibration of multi-deck intake and material divergence (just landed): multi-deck combined intake (`<stem>` prefixing + a `decks[]` merge index) and the material-divergence free-text field (under §c audience) both shipped (see "2026-06" above); next is calibration from real usage. Same-stem collisions across decks currently let the later one overwrite the former, and whether dedupe / numbering is needed awaits signal; whether the Strategist reads the free-text intent accurately, and whether the "facts stay sourced" line holds when the user asks for free reshaping, awaits real generation. Neither gets a mechanical threshold added pre-emptively
- Otherwise, the mode / visual-style validation and calibration is settled (see "2026-06" above): the structure (5 modes + 18 visual-styles + custom) is locked, the four adjacent pairs are consolidated into one Close-calls table, and the four calibration tightenings have landed. Future direction is driven by real usage signal and feedback; long-running improvements are under "Ongoing maintenance" below, and evaluated-out directions under "Non-goals"
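
The same-stem behavior mentioned above (a later import overwrites the earlier entry in `decks[]`) can be pictured with a short sketch. It assumes each entry is a dict keyed by a `stem` field; the function name `upsert_deck` and the `slides` field are hypothetical and only illustrate the replace-on-collision rule, not the real `source_profile.json` layout.

```python
# Hypothetical sketch of the decks[] merge index: one entry per source deck,
# keyed by stem. Field names other than "decks" are illustrative assumptions.


def upsert_deck(profile: dict, stem: str, digest: dict) -> dict:
    """Insert a deck digest, or replace the existing entry with the same stem."""
    decks = profile.setdefault("decks", [])
    entry = {"stem": stem, **digest}
    for i, existing in enumerate(decks):
        if existing.get("stem") == stem:
            decks[i] = entry  # same stem: the later import overwrites the earlier one
            return profile
    decks.append(entry)
    return profile


if __name__ == "__main__":
    profile = {}
    upsert_deck(profile, "q1_review", {"slides": 12})
    upsert_deck(profile, "roadmap", {"slides": 8})
    upsert_deck(profile, "q1_review", {"slides": 14})  # re-import replaces in place
    print([d["stem"] for d in profile["decks"]])
```

Dedupe or numbering, if real usage calls for it, would change only the collision branch.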
Long-running improvements with no committed timeline. Only real directions are listed; specific fixes / single flags go to the commit log.
- Prompt slimming — compress per-role prompt token footprint and improve cache hit rate without sacrificing quality, for indirect cost / speed gains. Complements "Pure speed optimization" below: indirect optimization yes, quality-sacrificing speedups no.
The directions below come up repeatedly and have been evaluated as not on the path. Listing them is not a value judgment on the underlying need — they simply don't fit this project's main route. If you specifically need these capabilities, consider other tools or forking.
PPT Master's main route is "AI generates SVG from scratch → DrawingML", with the whole pipeline built around full control of every shape / text / layout. "Parse existing PPTX placeholders + only refill text" is a different product shape requiring handling of arbitrary master / theme / placeholder systems — orthogonal to where this architecture invests.
The basic need is actually simple: if you just need "replace Excel data into fixed positions in a PPT template", have the AI write a few lines of python-pptx. You don't need this pipeline.
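
For illustration, the "few lines of python-pptx" could look roughly like the sketch below. It assumes a template whose text boxes contain `{{key}}` tokens and an Excel sheet with keys in column A and values in column B; the token convention, the helper names, and the file names are all assumptions made for this example, and the script needs `python-pptx` and `openpyxl` installed.

```python
# Sketch only: requires `pip install python-pptx openpyxl`.
# The {{key}} placeholder convention and helper names are illustrative assumptions.
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_text(text: str, values: dict) -> str:
    """Replace {{key}} tokens with values[key]; unknown keys are left untouched."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


def read_values(xlsx_path: str) -> dict:
    """Read key/value pairs from columns A and B of the active sheet."""
    from openpyxl import load_workbook  # lazy import: fill_text works without it

    ws = load_workbook(xlsx_path, data_only=True).active
    rows = ws.iter_rows(min_col=1, max_col=2, values_only=True)
    return {str(key): value for key, value in rows if key is not None}


def fill_deck(template_path: str, out_path: str, values: dict) -> None:
    """Rewrite placeholder text run by run so existing run formatting is kept."""
    from pptx import Presentation  # lazy import, same reason as above

    prs = Presentation(template_path)
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    run.text = fill_text(run.text, values)
    prs.save(out_path)
```

A call such as `fill_deck("template.pptx", "report.pptx", read_values("data.xlsx"))` would produce the filled deck. The sketch skips tables and grouped shapes, and a token that PowerPoint has split across two runs will not match.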
vs. the `template-fill-pptx` route: "reuse a deck's own design and refill it with new content" is a supported capability (see "2026-06" above), with natively-editable output. What's out of scope here is the other shape: parsing an arbitrary third-party template's master / theme / placeholder system to do text-only substitution. Different investment points; don't conflate them.
Pixel-fidelity across the four renderers (PowerPoint / Keynote / LibreOffice / WPS) is the project's spine. Switching to native PowerPoint charts breaks that — the same PPTX renders different chart layouts across renderers. Charts as SVG is by design, not a capability gap.
If you need data-driven native Excel charts, pick a different tool or manually replace charts in PowerPoint post-export — this project won't build that path in.
Issue: #111
pip + requirements.txt is the only official install path because it works in every Python environment with no extra learning cost. uv is a fine tool, but making it default raises the bar for new users. If you personally prefer uv, use it in your fork — it won't affect the main line.
Issue: #97
In the cost / speed / quality triangle this project picks quality. ~20 minutes for a high-quality PPTX is the current reasonable point.
Will do: indirect improvements via prompt slimming / cache hit rate. Won't do: trading quality for "throw a few pages together" speed.
If you are speed-sensitive and quality-tolerant, Gamma or similar AI tools are a better fit.
The product form is firmly a chat-driven AI IDE skill (Claude Code / Cursor / VS Code + Copilot / Codebuddy).
Won't do: standalone CLI (ppm-style), SaaS web service, Electron shell. Any "make it run independently of chat" proposal will be declined. Chat is the interaction core, not a wrapper.
- Issues: github.com/hugohe3/ppt-master/issues — bugs / proposals
- Discussions: github.com/hugohe3/ppt-master/discussions — usage / experience sharing
- Email: heyug3@gmail.com
Before proposing a new direction, scan the Non-goals above. If your request falls there, it's unlikely to land — but we're happy to discuss other paths to your underlying need.