Roadmap

English | 中文

PPT Master is a solo-maintained open source project, driven by priority rather than fixed timelines. This roadmap is here to align expectations: what's already shipped, what's under ongoing maintenance and evolution, and what's intentionally out of scope. Priorities shift with user feedback and real usage signals — no committed delivery windows.

Where we are: AI generates SVG from scratch → converts to DrawingML for natively editable PPTX. The core axis is pixel-fidelity across four renderers (PowerPoint / Keynote / LibreOffice / WPS) + real native shapes. Every direction below serves that axis.

Recent capability evolution

The past two months' structural capability growth. Single flags / incremental polish go to the commit log.

2026-03 — Native PPTX route takes shape

Direct export to natively editable PPTX — svg_to_pptx adds glow / rotate / text-decoration / stroke-linejoin; the full SVG → DrawingML chain becomes usable
Chart / layout template JSON indexes ship, AI selection path connected

2026-04 — Pipeline at scale

Source-less generation: topic-research workflow supports "topic only, no source files"
PPTX export step-change: SVG clipPath → DrawingML picture geometry, marker → native arrows, output consolidated to exports/
Chart library expands to 70 templates + three icon libraries (simple-icons / phosphor-duotone / brand-logo)
spec_lock.md machine-readable contract: Strategist locks the spec, Executor re-reads it before every page — cross-page consistency gets a real guarantee
Per-element animation on by default + recorded narration / video export (workflows/generate-audio.md)

2026-05 — Visual editing + AI image systematization

Live Preview enters the main pipeline (workflows/live-preview.md) — browser preview, click elements to write annotations, say "apply my annotations" and the AI rewrites that region (built on @WodenJay's PR #85)
Replicate any PPTX as a template (workflows/create-template.md) — PPTX → SVG reverse + OOXML theme / master / layout / asset extraction
AI image three-dimension system rendering × palette × type + Strategist h.5 lock, downstream consumes a fixed contract
AI image hero_page dual-track — local insert + full-canvas hero image coexist
Brand identity preset subsystem (workflows/create-brand.md) — extract and reuse brand palette / typography / logo / voice
Visual self-review workflow (workflows/visual-review.md) — rubric-based per-page check of AI-generated SVGs
AI image: Type concept boundary clarification — Type is now narrowed to "the internal geometric skeleton of a local infographic block" (11 real skeletons); the four pseudo-types (hero / background / portrait / typography) fold back into page_role: hero_page plus four composition primitives (single-subject / portrait / typographic / atmospheric); hero_page text layering rule (visual keywords embedded, editable text via SVG overlay)
Brutalist AI newspaper example deck shipped (examples/ppt169_brutalist_ai_newspaper_2026/) — first of the three P0 capability-backing demos: wall-to-wall small type + irregular columns + halftone monochrome + single-spot red + real native shapes; 10-page editorial annual report stressing text-position precision and cross-page consistency
Kubernetes Blueprint example deck shipped (examples/ppt169_kubernetes_blueprint_2026/) — second of the three P0 capability-backing demos: isometric technical-drawing aesthetic + blueprint cyan/amber palette + hand-authored SVG geometry (no raster images) + custom drawing-in animation; 10-page Kubernetes architecture walkthrough stressing geometric shape generalization and chart-structure extensibility
AI image custom escape hatch — rendering / palette / hero composition each accept custom + a one-paragraph *_behavior prose, replacing the false "default to vector-illustration / cool-corporate" fallback; end-to-end contract spans image-renderings/_index.md §1.5, image-palettes/_index.md §2, Strategist h.5 hard-rule (≤1 custom per dimension; one candidate may carry both), spec_lock fields, and Image_Generator Step 2 consumption branch
Template architecture: three-kind consolidation (docs/templates-architecture.md) — brand / layout / deck split into three independent dirs with per-kind schemas + segment-level fusion + git-style conflict resolution; SKILL.md Step 3 dispatches per kind, trigger rule remains "explicit path only"
Pattern fill PPTX safety net — svg_quality_checker.py now warns on <pattern> without data-pptx-pattern (silent fallback to ltUpDiag) and errors on values outside OOXML ST_PresetPatternVal (schema-failed PPTX that won't open); shared-standards.md §7 documents the closed preset enum and the required <rect fill="<bg>"/> child convention
LaTeX math formula rendering shipped (scripts/latex_render.py) — Strategist locks one of three policies (mixed / render-all / text-only) inside the Typography confirmation and writes an explicit images/formula_manifest.json; the renderer walks a codecogs → quicklatex → mathpad → wikimedia fallback chain and emits transparent PNGs that land in §VIII as Acquire Via: formula / Status: Rendered rows; formula-heavy decks (academic / engineering / educational) finally have a native rendering route. Formula selection is a Strategist decision — the renderer never scans source files for $...$ markers
Live preview direct editing — L1 / L2 / L3 (workflows/live-preview.md) — the browser editor gains deterministic in-place edits with no AI round-trip: text content (L1), presentation attributes like fill / stroke / font-size (L2), and on-canvas geometry (L3) — drag a selected element to move it, arrow-key nudge (Shift = 10px), multi-select, plus a right-click overlap picker for stacked shapes. Edits stage with Ctrl+Z undo + coalescing and write to svg_output/ on Apply changes; moves persist through finalize / export (moved text frames, promoted multi-line tspans, repositioned icons all reproduce in the PPTX). Re-export stays chat-driven; on-canvas resize handles are not yet implemented (resize via the geometry inputs)

2026-06 — Mode & visual-style dual catalogs + PPTX intake & content-strategy expansion

Replicate any PPTX's design → refill content route (workflows/template-fill-pptx.md) — when a user supplies an existing .pptx plus new material / a topic and asks to "reuse this deck's design / fill the content back in", this standalone workflow edits the PPTX directly and never enters the SVG generation pipeline. Output stays natively editable (it reuses the original slides' shapes / layouts, not a screenshot refill); it isolates private parts on reuse, exposes chart data, and runs capacity checks. Trigger follows the template rule — only on an explicit ask to reuse an existing deck — and it deliberately does not reflow / add pages / swap images (that's the from-scratch main route). Distinguished from Non-goals #53 below
Three executors retired → mode + visual-style dual catalogs (references/modes/ + references/visual-styles/) — the old three executor-*.md (general / consultant / consultant-top) entangled domain · audience · persuasion · narrative on one axis; split into two orthogonal catalogs (following the image-renderings pattern: flat dir + _index + on-demand read + Strategist locks one). mode = narrative skeleton (pyramid / narrative / instructional / showcase; consultant + top merge into pyramid since their narrative core is identical); visual-style = SVG layout aesthetic (swiss-minimal / editorial / soft-rounded / dark-tech, each paired with an image-rendering, zero HEX — color truth stays in confirmation e + image-palettes). Strategist §d locks mode + visual_style independently into spec_lock; Executor loads the two locked files; any mode × any style. Render coordinates stay in templates/charts/
Prompt constraint-strength decoupling (docs/rules/prompt-style.md §4) — three explicit strength tiers — rule (Hard rule / Forbidden) / default (Default — … may override) / reference (Reference — not a constraint) — plus an "objective failure vs taste" test and a checker boundary, so the model can tell "must keep vs may deviate" at a glance; the visual-style catalog is Reference-strength throughout
visual-style catalog grows to 18, aligned with image-renderings + examples reclaimed — first four distilled from the examples library (brutalist / blueprint / memphis / zine), then six filling in the hand-drawn / textured renderings that have a layout twin (sketch-notes / ink-notes / chalkboard / paper-cut / vintage-poster / pixel-art), then four more reclaimed from still-uncovered example aesthetics: ink-wash (rice-paper whitespace, from 藏拙 / 李子柒) · glassmorphism (dark frosted-glass + gradient light, from glassmorphism_demo, split out of soft-rounded) · photo-editorial (full-bleed photography dominates, text captions, from Pritzker / fashion_weekly) · data-journalism (Bloomberg/Economist multi-column micro-charts + sidebars, from global_ai_capital). The catalog is regrouped into five families (corporate-product / editorial-publication / expressive-print / hand-drawn-brush / specialty). The test: a rendering earns a visual-style twin only when it defines a whole-page layout language, not merely how an inserted image looks — so photo-led composition gets photo-editorial (paired with corporate-photo), while purely atmospheric renderings (nature / warm-scene / fantasy-animation) stay imagery-only and just pair with a layout style. Zero HEX, Reference strength throughout
mode catalog grows to 5: add briefing — fills the "neutral information delivery" cell: no thesis, no story, no teaching, no spectacle — topic titles, even weight, complete and scannable, for status updates / reference decks / catalogs / meeting packs / FAQs that inform without arguing. The five now partition presentation intent more nearly MECE: persuade (pyramid) · tell a story (narrative) · teach (instructional) · impress (showcase) · simply inform (briefing). _index gains a briefing vs pyramid tie-breaker ("if you'd have to invent a thesis to fit pyramid, it's briefing"). Five presets plus a custom escape hatch for a bespoke direction none of them captures (a special cadence, a multi-mode fusion, a particular posture) — either user-requested or Strategist-recommended, confirmed like every lock; a deck always locks one value, and a fusion is one custom describing the acts. The only thing to avoid is defaulting to custom when a preset genuinely fits. Mirrors the truth-precedence rule that a user-authored outline or direction overrides the mode
mode / visual-style system validated on real decks + four calibration tightenings landed — the 5 modes + 18 visual-styles + the custom escape hatch were exercised on five covering decks (briefing×data-journalism / narrative×photo-editorial / instructional×chalkboard / showcase×glassmorphism / custom×zine, with the narrative deck going through the AI-image branch): zero selection misses (all four Close-calls tie-breakers fired and held against the real grey-zone pulls), discipline fully honored (zero HEX / Reference strength / whole-page layout language), the custom mechanism works (a mode_behavior prose paragraph carried a 10-page generation and reads as plain language at confirmation, not a bare token), mode ⟂ visual_style holds (any combination without bleed, including a positive check of the "keynote/launch = mode not style" route), export 5/5 decks × every page with 0 failures. Tightened four spots from the real signal: strategist §e now locks the full neutral tier set the visual_style implies up front (killing the mid-deck color top-ups seen in three consecutive decks), executor-base §1 re-skins chart/layout template gradients-shadows-fills to the deck's visual_style (templates supply structure, not skin; mirror templates stay verbatim per §1.1), briefing §1 makes core_message state coverage rather than a claim (briefing-only; global §IX assertion semantics stay for narrative/instructional/pyramid), and svg_quality_checker fixes font-family drift false positives (delimiter-matched capture + stack normalization) and drops the font-size ceiling for showcase mode and poster visual-styles
Optional spec-review checkpoint shipped (workflows/refine-spec.md) — an opt-in pause after the Eight Confirmations: when the user explicitly asks (default OFF), the Strategist produces the full design_spec.md + spec_lock.md, then stops so the user can revise any part of the spec (outline / color / typography / layout / image strategy / page rhythm) before generation, keeping both files in sync. Same shape as the split-mode note — it never fires on its own, the default pipeline is unchanged, and it surfaces as one opt-in line inside the Eight Confirmations. Review lenses (logical clarity / information density / focus / register / emotional resonance / chapter balance + the design dimensions) give a direction only, never a numeric threshold (Reference strength). Prompted by @cuberoocp in issue #173
Interactive visual Eight Confirmations page (Step 4) (scripts/confirm_ui/server.py, field schema scripts/docs/confirm_ui.md) — the Eight Confirmations move from chat-only to a browser page that auto-launches by default at Step 4: enumerable fields (canvas / mode / visual_style / icons / image usage / formula & generation policy) list common options from catalogs.json, while generative fields present candidates — color swatches, live font previews (CJK / Latin previewed independently), and AI-image rendering × palette picks; plus live swatch feedback on custom HEX, a live combined color × typography preview, and a body-size range hinted from the chosen canvas. It shares port 5050 with the Step 6 Live Preview (they never run at once — the page auto---shutdowns at the end of Step 4 to free the port; if the port is busy the launcher auto-advances to the next free one). On principle chat stays the canonical channel and the page is a convenience layer: the page's result.json is authoritative over the recommendations and is consumed in place downstream (image plan / image_strategy / fonts / the split and refine-spec toggles all read from it), and any failure to open / timeout / headless host degrades losslessly to the chat-summary fallback
A batch of source-conversion fidelity gains — less information dropped as source material enters the pipeline: doc_to_md converts Word OMML / Office Math equations to inline LaTeX, pdf_to_md recognizes Figure N | pipe-delimited captions, and ppt_to_md preserves hyperlinks already in the source deck (run-level external [text](url) / slide-internal jumps [text](#slide-N) / shape-level clicks, with dangerous-scheme filtering and anchor-text Markdown escaping) and transcribes native chart data into Markdown tables (values survive the conversion instead of collapsing to a single picture). Caption recognition is based on @suay1113's PR #191; hyperlink preservation is distilled from @ZhaoZuohong's PR #155
Content-faithful PPT beautification / re-layout shipped (workflows/beautify-pptx.md) — mirror of template-fill: template-fill reuses a deck's design with new content, beautify keeps the content and redoes the layout. Given an existing PPTX, every text string is preserved verbatim (nothing added / removed / reworded); it extracts and inherits the source deck's visual identity (palette / fonts, theme or observed offered as two confirm-page candidates) and redoes only layout / hierarchy / whitespace, strict 1:1 page count and order, with charts / tables regenerated natively from extracted data (values frozen) and source pictures re-laid-out. Technically still "generate a native, editable PPTX from scratch" (ppt_to_md extracts content → main pipeline → a brand-new deck), not a patch over the original, so it stays clear of Non-goals #53. Adds beautify_identity.py / beautify_inventory.py; the confirm page is seeded from the source for the user to review. Honest v1 ceiling: it does not relieve information overload (a crowded page improves within itself; true re-pagination is the main pipeline), does not guarantee coordinate-level paste-back, and combo / dual-axis / waterfall charts lose the un-captured plots
Multi-deck PPTX intake + analysis/ source-name prefixing — a main-pipeline project can now combine several source decks: each writes its own <stem>.identity.json / <stem>.slide_library.json, with every deck's digest inlined under decks[] in the single index source_profile.json (preserving the "Strategist must-read source_profile.json" single-entry contract — one entry for a one-deck project, several for a combined one; re-importing the same stem replaces its entry). beautify / template-fill stay single-deck (1:1) and read their own <stem>.* artifacts
Material divergence (a free-text field under §c Audience) — a main-pipeline content-strategy question added as a free-text box under the audience field: the user states in their own words how closely to follow the source vs how freely to reshape it (blank = balanced default). Deliberately not a fixed set of options, not recommended from analyzing the source, and not coupled to page count — it is purely the user's stated intent. However freely they ask, facts stay sourced — reorganize / reframe / expand / connect what is in the source, never import facts from outside it (that is the topic-research job). The Strategist reads the prose when authoring the §IX outline and records it in design_spec §I; it is not written to spec_lock (the Executor never reads it). mode and divergence are orthogonal. Beautify / template-fill freeze content and do not surface this field
A batch of default-behavior and intake standardizations — per-element entrance animation now defaults off (page transition fade only; element builds are opt-in via -a auto / animations.json), removing the auto-cascade "AI deck" tell; per-project icons/ copies chosen icons into the project at selection time and embeds project-first; analysis/ is established as the canonical must-read layer for machine-extracted facts (PPTX intake bundle + image_analysis.csv); the main pipeline treats a source deck's identity (palette / fonts / layout) as reference, not constraint (inherit or redesign by Strategist judgment, defaulting to fresh design); the confirm page gains a custom color input

In progress / Next

Directions actively underway or up next, with no committed timeline.

Real-usage calibration of multi-deck intake and material divergence (just landed) — multi-deck combined intake (<stem> prefixing + a decks[] merge index) and the material-divergence free-text field (under §c audience) both shipped (see "2026-06" above); next is calibration from real usage: same-stem collisions across decks currently let the later one overwrite the former — whether dedupe / numbering is needed awaits signal; and whether the Strategist reads the free-text intent accurately, and whether the "facts stay sourced" line holds when the user asks for free reshaping, awaits real generation. Neither gets a mechanical threshold added pre-emptively
Otherwise: the mode / visual-style validation and calibration is settled (see "2026-06" above) — the structure (5 modes + 18 visual-styles + custom) is locked, the four adjacent pairs are consolidated into one Close-calls table, and the four calibration tightenings have landed. Future direction is driven by real usage signal and feedback; long-running improvements are under "Ongoing maintenance" below, and evaluated-out directions under "Non-goals"

Ongoing maintenance directions

Long-running improvements with no committed timeline. Only real directions are listed; specific fixes / single flags go to the commit log.

Prompt slimming — compress per-role prompt token footprint and improve cache hit rate without sacrificing quality, for indirect cost / speed gains. Complements "Pure speed optimization" below: indirect optimization yes, quality-sacrificing speedups no.

Non-goals

The directions below come up repeatedly and have been evaluated as not on the path. Listing them is not a value judgment on the underlying need — they simply don't fit this project's main route. If you specifically need these capabilities, consider other tools or forking.

Read arbitrary PPTX templates → fill text only

Issues: #53, #118

PPT Master's main route is "AI generates SVG from scratch → DrawingML", with the whole pipeline built around full control of every shape / text / layout. "Parse existing PPTX placeholders + only refill text" is a different product shape requiring handling of arbitrary master / theme / placeholder systems — orthogonal to where this architecture invests.

The basic need is actually simple: if you just need "replace Excel data into fixed positions in a PPT template", have the AI write a few lines of python-pptx. You don't need this pipeline.

vs. the template-fill-pptx route: "reuse a deck's own design and refill it with new content" is a supported capability (see "2026-06" above), with natively-editable output. What's out of scope here is the other shape — parsing an arbitrary third-party template's master / theme / placeholder system to do text-only substitution. Different investment points; don't conflate them.

Switch to native PowerPoint charts (Excel-native chart)

Issues: #99, #100-class

Pixel-fidelity across the four renderers (PowerPoint / Keynote / LibreOffice / WPS) is the project's spine. Switching to native PowerPoint charts breaks that — the same PPTX renders different chart layouts across renderers. Charts as SVG is by design, not a capability gap.

If you need data-driven native Excel charts, pick a different tool or manually replace charts in PowerPoint post-export — this project won't build that path in.

uv as default / required dependency

Issue: #111

pip + requirements.txt is the only official install path because it works in every Python environment with no extra learning cost. uv is a fine tool, but making it default raises the bar for new users. If you personally prefer uv, use it in your fork — it won't affect the main line.

Pure speed optimization

Issue: #97

In the cost / speed / quality triangle this project picks quality. ~20 minutes for a high-quality PPTX is the current reasonable point.

Will do: indirect improvements via prompt slimming / cache hit rate. Won't do: trading quality for "throw a few pages together" speed.

If speed-sensitive and quality-tolerant, Gamma / similar AI tools are a better fit.

CLI / SaaS / desktop app form factors

The product form is firmly chat-driven AI IDE skill (Claude Code / Cursor / VS Code + Copilot / Codebuddy).

Won't do: standalone CLI (ppm-style), SaaS web service, Electron shell. Any "make it run independently of chat" proposal will be declined. Chat is the interaction core, not a wrapper.

Feedback channels

Issues: github.com/hugohe3/ppt-master/issues — bugs / proposals
Discussions: github.com/hugohe3/ppt-master/discussions — usage / experience sharing
Email: heyug3@gmail.com

Before proposing a new direction, scan the Non-goals above. If your request falls there, it's unlikely to land — but we're happy to discuss other paths to your underlying need.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Roadmap

Recent capability evolution

2026-03 — Native PPTX route takes shape

2026-04 — Pipeline at scale

2026-05 — Visual editing + AI image systematization

2026-06 — Mode & visual-style dual catalogs + PPTX intake & content-strategy expansion

In progress / Next

Ongoing maintenance directions

Non-goals

Read arbitrary PPTX templates → fill text only

Switch to native PowerPoint charts (Excel-native chart)

uv as default / required dependency

Pure speed optimization

CLI / SaaS / desktop app form factors

Feedback channels

Uh oh!

FilesExpand file tree

roadmap.md

Latest commit

History

roadmap.md

File metadata and controls

Roadmap

Recent capability evolution

2026-03 — Native PPTX route takes shape

2026-04 — Pipeline at scale

2026-05 — Visual editing + AI image systematization

2026-06 — Mode & visual-style dual catalogs + PPTX intake & content-strategy expansion

In progress / Next

Ongoing maintenance directions

Non-goals

Read arbitrary PPTX templates → fill text only

Switch to native PowerPoint charts (Excel-native chart)

uv as default / required dependency

Pure speed optimization

CLI / SaaS / desktop app form factors

Feedback channels