Skip to content

Latest commit

 

History

History
139 lines (86 loc) · 25.2 KB

File metadata and controls

139 lines (86 loc) · 25.2 KB

Roadmap

English | 中文


PPT Master is a solo-maintained open source project, driven by priority rather than fixed timelines. This roadmap is here to align expectations: what's already shipped, what's under ongoing maintenance and evolution, and what's intentionally out of scope. Priorities shift with user feedback and real usage signals — no committed delivery windows.

Where we are: AI generates SVG from scratch → converts to DrawingML for natively editable PPTX. The core axis is pixel-fidelity across four renderers (PowerPoint / Keynote / LibreOffice / WPS) + real native shapes. Every direction below serves that axis.


Recent capability evolution

The past two months' structural capability growth. Single flags / incremental polish go to the commit log.

2026-03 — Native PPTX route takes shape

  • Direct export to natively editable PPTXsvg_to_pptx adds glow / rotate / text-decoration / stroke-linejoin; the full SVG → DrawingML chain becomes usable
  • Chart / layout template JSON indexes ship, AI selection path connected

2026-04 — Pipeline at scale

  • Source-less generation: topic-research workflow supports "topic only, no source files"
  • PPTX export step-change: SVG clipPath → DrawingML picture geometry, marker → native arrows, output consolidated to exports/
  • Chart library expands to 70 templates + three icon libraries (simple-icons / phosphor-duotone / brand-logo)
  • spec_lock.md machine-readable contract: Strategist locks the spec, Executor re-reads it before every page — cross-page consistency gets a real guarantee
  • Per-element animation on by default + recorded narration / video export (workflows/generate-audio.md)

2026-05 — Visual editing + AI image systematization

  • Live Preview enters the main pipeline (workflows/live-preview.md) — browser preview, click elements to write annotations, say "apply my annotations" and the AI rewrites that region (built on @WodenJay's PR #85)
  • Replicate any PPTX as a template (workflows/create-template.md) — PPTX → SVG reverse + OOXML theme / master / layout / asset extraction
  • AI image three-dimension system rendering × palette × type + Strategist h.5 lock, downstream consumes a fixed contract
  • AI image hero_page dual-track — local insert + full-canvas hero image coexist
  • Brand identity preset subsystem (workflows/create-brand.md) — extract and reuse brand palette / typography / logo / voice
  • Visual self-review workflow (workflows/visual-review.md) — rubric-based per-page check of AI-generated SVGs
  • AI image: Type concept boundary clarification — Type is now narrowed to "the internal geometric skeleton of a local infographic block" (11 real skeletons); the four pseudo-types (hero / background / portrait / typography) fold back into page_role: hero_page plus four composition primitives (single-subject / portrait / typographic / atmospheric); hero_page text layering rule (visual keywords embedded, editable text via SVG overlay)
  • Brutalist AI newspaper example deck shipped (examples/ppt169_brutalist_ai_newspaper_2026/) — first of the three P0 capability-backing demos: wall-to-wall small type + irregular columns + halftone monochrome + single-spot red + real native shapes; 10-page editorial annual report stressing text-position precision and cross-page consistency
  • Kubernetes Blueprint example deck shipped (examples/ppt169_kubernetes_blueprint_2026/) — second of the three P0 capability-backing demos: isometric technical-drawing aesthetic + blueprint cyan/amber palette + hand-authored SVG geometry (no raster images) + custom drawing-in animation; 10-page Kubernetes architecture walkthrough stressing geometric shape generalization and chart-structure extensibility
  • AI image custom escape hatchrendering / palette / hero composition each accept custom + a one-paragraph *_behavior prose, replacing the false "default to vector-illustration / cool-corporate" fallback; end-to-end contract spans image-renderings/_index.md §1.5, image-palettes/_index.md §2, Strategist h.5 hard-rule (≤1 custom per dimension; one candidate may carry both), spec_lock fields, and Image_Generator Step 2 consumption branch
  • Template architecture: three-kind consolidation (docs/templates-architecture.md) — brand / layout / deck split into three independent dirs with per-kind schemas + segment-level fusion + git-style conflict resolution; SKILL.md Step 3 dispatches per kind, trigger rule remains "explicit path only"
  • Pattern fill PPTX safety netsvg_quality_checker.py now warns on <pattern> without data-pptx-pattern (silent fallback to ltUpDiag) and errors on values outside OOXML ST_PresetPatternVal (schema-failed PPTX that won't open); shared-standards.md §7 documents the closed preset enum and the required <rect fill="<bg>"/> child convention
  • LaTeX math formula rendering shipped (scripts/latex_render.py) — Strategist locks one of three policies (mixed / render-all / text-only) inside the Typography confirmation and writes an explicit images/formula_manifest.json; the renderer walks a codecogs → quicklatex → mathpad → wikimedia fallback chain and emits transparent PNGs that land in §VIII as Acquire Via: formula / Status: Rendered rows; formula-heavy decks (academic / engineering / educational) finally have a native rendering route. Formula selection is a Strategist decision — the renderer never scans source files for $...$ markers
  • Live preview direct editing — L1 / L2 / L3 (workflows/live-preview.md) — the browser editor gains deterministic in-place edits with no AI round-trip: text content (L1), presentation attributes like fill / stroke / font-size (L2), and on-canvas geometry (L3) — drag a selected element to move it, arrow-key nudge (Shift = 10px), multi-select, plus a right-click overlap picker for stacked shapes. Edits stage with Ctrl+Z undo + coalescing and write to svg_output/ on Apply changes; moves persist through finalize / export (moved text frames, promoted multi-line tspans, repositioned icons all reproduce in the PPTX). Re-export stays chat-driven; on-canvas resize handles are not yet implemented (resize via the geometry inputs)

2026-06 — Mode & visual-style dual catalogs + PPTX intake & content-strategy expansion

  • Replicate any PPTX's design → refill content route (workflows/template-fill-pptx.md) — when a user supplies an existing .pptx plus new material / a topic and asks to "reuse this deck's design / fill the content back in", this standalone workflow edits the PPTX directly and never enters the SVG generation pipeline. Output stays natively editable (it reuses the original slides' shapes / layouts, not a screenshot refill); it isolates private parts on reuse, exposes chart data, and runs capacity checks. Trigger follows the template rule — only on an explicit ask to reuse an existing deck — and it deliberately does not reflow / add pages / swap images (that's the from-scratch main route). Distinguished from Non-goals #53 below

  • Three executors retired → mode + visual-style dual catalogs (references/modes/ + references/visual-styles/) — the old three executor-*.md (general / consultant / consultant-top) entangled domain · audience · persuasion · narrative on one axis; split into two orthogonal catalogs (following the image-renderings pattern: flat dir + _index + on-demand read + Strategist locks one). mode = narrative skeleton (pyramid / narrative / instructional / showcase; consultant + top merge into pyramid since their narrative core is identical); visual-style = SVG layout aesthetic (swiss-minimal / editorial / soft-rounded / dark-tech, each paired with an image-rendering, zero HEX — color truth stays in confirmation e + image-palettes). Strategist §d locks mode + visual_style independently into spec_lock; Executor loads the two locked files; any mode × any style. Render coordinates stay in templates/charts/

  • Prompt constraint-strength decoupling (docs/rules/prompt-style.md §4) — three explicit strength tiers — rule (Hard rule / Forbidden) / default (Default — … may override) / reference (Reference — not a constraint) — plus an "objective failure vs taste" test and a checker boundary, so the model can tell "must keep vs may deviate" at a glance; the visual-style catalog is Reference-strength throughout

  • visual-style catalog grows to 18, aligned with image-renderings + examples reclaimed — first four distilled from the examples library (brutalist / blueprint / memphis / zine), then six filling in the hand-drawn / textured renderings that have a layout twin (sketch-notes / ink-notes / chalkboard / paper-cut / vintage-poster / pixel-art), then four more reclaimed from still-uncovered example aesthetics: ink-wash (rice-paper whitespace, from 藏拙 / 李子柒) · glassmorphism (dark frosted-glass + gradient light, from glassmorphism_demo, split out of soft-rounded) · photo-editorial (full-bleed photography dominates, text captions, from Pritzker / fashion_weekly) · data-journalism (Bloomberg/Economist multi-column micro-charts + sidebars, from global_ai_capital). The catalog is regrouped into five families (corporate-product / editorial-publication / expressive-print / hand-drawn-brush / specialty). The test: a rendering earns a visual-style twin only when it defines a whole-page layout language, not merely how an inserted image looks — so photo-led composition gets photo-editorial (paired with corporate-photo), while purely atmospheric renderings (nature / warm-scene / fantasy-animation) stay imagery-only and just pair with a layout style. Zero HEX, Reference strength throughout

  • mode catalog grows to 5: add briefing — fills the "neutral information delivery" cell: no thesis, no story, no teaching, no spectacle — topic titles, even weight, complete and scannable, for status updates / reference decks / catalogs / meeting packs / FAQs that inform without arguing. The five now partition presentation intent more nearly MECE: persuade (pyramid) · tell a story (narrative) · teach (instructional) · impress (showcase) · simply inform (briefing). _index gains a briefing vs pyramid tie-breaker ("if you'd have to invent a thesis to fit pyramid, it's briefing"). Five presets plus a custom escape hatch for a bespoke direction none of them captures (a special cadence, a multi-mode fusion, a particular posture) — either user-requested or Strategist-recommended, confirmed like every lock; a deck always locks one value, and a fusion is one custom describing the acts. The only thing to avoid is defaulting to custom when a preset genuinely fits. Mirrors the truth-precedence rule that a user-authored outline or direction overrides the mode

  • mode / visual-style system validated on real decks + four calibration tightenings landed — the 5 modes + 18 visual-styles + the custom escape hatch were exercised on five covering decks (briefing×data-journalism / narrative×photo-editorial / instructional×chalkboard / showcase×glassmorphism / custom×zine, with the narrative deck going through the AI-image branch): zero selection misses (all four Close-calls tie-breakers fired and held against the real grey-zone pulls), discipline fully honored (zero HEX / Reference strength / whole-page layout language), the custom mechanism works (a mode_behavior prose paragraph carried a 10-page generation and reads as plain language at confirmation, not a bare token), mode ⟂ visual_style holds (any combination without bleed, including a positive check of the "keynote/launch = mode not style" route), export 5/5 decks × every page with 0 failures. Tightened four spots from the real signal: strategist §e now locks the full neutral tier set the visual_style implies up front (killing the mid-deck color top-ups seen in three consecutive decks), executor-base §1 re-skins chart/layout template gradients-shadows-fills to the deck's visual_style (templates supply structure, not skin; mirror templates stay verbatim per §1.1), briefing §1 makes core_message state coverage rather than a claim (briefing-only; global §IX assertion semantics stay for narrative/instructional/pyramid), and svg_quality_checker fixes font-family drift false positives (delimiter-matched capture + stack normalization) and drops the font-size ceiling for showcase mode and poster visual-styles

  • Optional spec-review checkpoint shipped (workflows/refine-spec.md) — an opt-in pause after the Eight Confirmations: when the user explicitly asks (default OFF), the Strategist produces the full design_spec.md + spec_lock.md, then stops so the user can revise any part of the spec (outline / color / typography / layout / image strategy / page rhythm) before generation, keeping both files in sync. Same shape as the split-mode note — it never fires on its own, the default pipeline is unchanged, and it surfaces as one opt-in line inside the Eight Confirmations. Review lenses (logical clarity / information density / focus / register / emotional resonance / chapter balance + the design dimensions) give a direction only, never a numeric threshold (Reference strength). Prompted by @cuberoocp in issue #173

  • Interactive visual Eight Confirmations page (Step 4) (scripts/confirm_ui/server.py, field schema scripts/docs/confirm_ui.md) — the Eight Confirmations move from chat-only to a browser page that auto-launches by default at Step 4: enumerable fields (canvas / mode / visual_style / icons / image usage / formula & generation policy) list common options from catalogs.json, while generative fields present candidates — color swatches, live font previews (CJK / Latin previewed independently), and AI-image rendering × palette picks; plus live swatch feedback on custom HEX, a live combined color × typography preview, and a body-size range hinted from the chosen canvas. It shares port 5050 with the Step 6 Live Preview (they never run at once — the page auto---shutdowns at the end of Step 4 to free the port; if the port is busy the launcher auto-advances to the next free one). On principle chat stays the canonical channel and the page is a convenience layer: the page's result.json is authoritative over the recommendations and is consumed in place downstream (image plan / image_strategy / fonts / the split and refine-spec toggles all read from it), and any failure to open / timeout / headless host degrades losslessly to the chat-summary fallback

  • A batch of source-conversion fidelity gains — less information dropped as source material enters the pipeline: doc_to_md converts Word OMML / Office Math equations to inline LaTeX, pdf_to_md recognizes Figure N | pipe-delimited captions, and ppt_to_md preserves hyperlinks already in the source deck (run-level external [text](url) / slide-internal jumps [text](#slide-N) / shape-level clicks, with dangerous-scheme filtering and anchor-text Markdown escaping) and transcribes native chart data into Markdown tables (values survive the conversion instead of collapsing to a single picture). Caption recognition is based on @suay1113's PR #191; hyperlink preservation is distilled from @ZhaoZuohong's PR #155

  • Content-faithful PPT beautification / re-layout shipped (workflows/beautify-pptx.md) — mirror of template-fill: template-fill reuses a deck's design with new content, beautify keeps the content and redoes the layout. Given an existing PPTX, every text string is preserved verbatim (nothing added / removed / reworded); it extracts and inherits the source deck's visual identity (palette / fonts, theme or observed offered as two confirm-page candidates) and redoes only layout / hierarchy / whitespace, strict 1:1 page count and order, with charts / tables regenerated natively from extracted data (values frozen) and source pictures re-laid-out. Technically still "generate a native, editable PPTX from scratch" (ppt_to_md extracts content → main pipeline → a brand-new deck), not a patch over the original, so it stays clear of Non-goals #53. Adds beautify_identity.py / beautify_inventory.py; the confirm page is seeded from the source for the user to review. Honest v1 ceiling: it does not relieve information overload (a crowded page improves within itself; true re-pagination is the main pipeline), does not guarantee coordinate-level paste-back, and combo / dual-axis / waterfall charts lose the un-captured plots

  • Multi-deck PPTX intake + analysis/ source-name prefixing — a main-pipeline project can now combine several source decks: each writes its own <stem>.identity.json / <stem>.slide_library.json, with every deck's digest inlined under decks[] in the single index source_profile.json (preserving the "Strategist must-read source_profile.json" single-entry contract — one entry for a one-deck project, several for a combined one; re-importing the same stem replaces its entry). beautify / template-fill stay single-deck (1:1) and read their own <stem>.* artifacts

  • Material divergence (a free-text field under §c Audience) — a main-pipeline content-strategy question added as a free-text box under the audience field: the user states in their own words how closely to follow the source vs how freely to reshape it (blank = balanced default). Deliberately not a fixed set of options, not recommended from analyzing the source, and not coupled to page count — it is purely the user's stated intent. However freely they ask, facts stay sourced — reorganize / reframe / expand / connect what is in the source, never import facts from outside it (that is the topic-research job). The Strategist reads the prose when authoring the §IX outline and records it in design_spec §I; it is not written to spec_lock (the Executor never reads it). mode and divergence are orthogonal. Beautify / template-fill freeze content and do not surface this field

  • A batch of default-behavior and intake standardizations — per-element entrance animation now defaults off (page transition fade only; element builds are opt-in via -a auto / animations.json), removing the auto-cascade "AI deck" tell; per-project icons/ copies chosen icons into the project at selection time and embeds project-first; analysis/ is established as the canonical must-read layer for machine-extracted facts (PPTX intake bundle + image_analysis.csv); the main pipeline treats a source deck's identity (palette / fonts / layout) as reference, not constraint (inherit or redesign by Strategist judgment, defaulting to fresh design); the confirm page gains a custom color input


In progress / Next

Directions actively underway or up next, with no committed timeline.

  • Real-usage calibration of multi-deck intake and material divergence (just landed) — multi-deck combined intake (<stem> prefixing + a decks[] merge index) and the material-divergence free-text field (under §c audience) both shipped (see "2026-06" above); next is calibration from real usage: same-stem collisions across decks currently let the later one overwrite the former — whether dedupe / numbering is needed awaits signal; and whether the Strategist reads the free-text intent accurately, and whether the "facts stay sourced" line holds when the user asks for free reshaping, awaits real generation. Neither gets a mechanical threshold added pre-emptively
  • Otherwise: the mode / visual-style validation and calibration is settled (see "2026-06" above) — the structure (5 modes + 18 visual-styles + custom) is locked, the four adjacent pairs are consolidated into one Close-calls table, and the four calibration tightenings have landed. Future direction is driven by real usage signal and feedback; long-running improvements are under "Ongoing maintenance" below, and evaluated-out directions under "Non-goals"

Ongoing maintenance directions

Long-running improvements with no committed timeline. Only real directions are listed; specific fixes / single flags go to the commit log.

  • Prompt slimming — compress per-role prompt token footprint and improve cache hit rate without sacrificing quality, for indirect cost / speed gains. Complements "Pure speed optimization" below: indirect optimization yes, quality-sacrificing speedups no.

Non-goals

The directions below come up repeatedly and have been evaluated as not on the path. Listing them is not a value judgment on the underlying need — they simply don't fit this project's main route. If you specifically need these capabilities, consider other tools or forking.

Read arbitrary PPTX templates → fill text only

Issues: #53, #118

PPT Master's main route is "AI generates SVG from scratch → DrawingML", with the whole pipeline built around full control of every shape / text / layout. "Parse existing PPTX placeholders + only refill text" is a different product shape requiring handling of arbitrary master / theme / placeholder systems — orthogonal to where this architecture invests.

The basic need is actually simple: if you just need "replace Excel data into fixed positions in a PPT template", have the AI write a few lines of python-pptx. You don't need this pipeline.

vs. the template-fill-pptx route: "reuse a deck's own design and refill it with new content" is a supported capability (see "2026-06" above), with natively-editable output. What's out of scope here is the other shape — parsing an arbitrary third-party template's master / theme / placeholder system to do text-only substitution. Different investment points; don't conflate them.

Switch to native PowerPoint charts (Excel-native chart)

Issues: #99, #100-class

Pixel-fidelity across the four renderers (PowerPoint / Keynote / LibreOffice / WPS) is the project's spine. Switching to native PowerPoint charts breaks that — the same PPTX renders different chart layouts across renderers. Charts as SVG is by design, not a capability gap.

If you need data-driven native Excel charts, pick a different tool or manually replace charts in PowerPoint post-export — this project won't build that path in.

uv as default / required dependency

Issue: #111

pip + requirements.txt is the only official install path because it works in every Python environment with no extra learning cost. uv is a fine tool, but making it default raises the bar for new users. If you personally prefer uv, use it in your fork — it won't affect the main line.

Pure speed optimization

Issue: #97

In the cost / speed / quality triangle this project picks quality. ~20 minutes for a high-quality PPTX is the current reasonable point.

Will do: indirect improvements via prompt slimming / cache hit rate. Won't do: trading quality for "throw a few pages together" speed.

If speed-sensitive and quality-tolerant, Gamma / similar AI tools are a better fit.

CLI / SaaS / desktop app form factors

The product form is firmly chat-driven AI IDE skill (Claude Code / Cursor / VS Code + Copilot / Codebuddy).

Won't do: standalone CLI (ppm-style), SaaS web service, Electron shell. Any "make it run independently of chat" proposal will be declined. Chat is the interaction core, not a wrapper.


Feedback channels

Before proposing a new direction, scan the Non-goals above. If your request falls there, it's unlikely to land — but we're happy to discuss other paths to your underlying need.