diff --git a/README.md b/README.md
index f5c20302c..c326e6014 100644
--- a/README.md
+++ b/README.md
@@ -191,6 +191,7 @@ LANGUAGE=en
 |--------|-------------|---------------|
 | 🖥️ **Gradio Web UI** | Interactive web interface for music generation | [Guide](./docs/en/GRADIO_GUIDE.md) |
 | 🧭 **UI Support Baseline** | Supported UI boundary and future UI parity checklist | [Guide](./docs/en/UI_SUPPORT.md) |
+| 🧩 **Next UI Requirements** | Product requirements and information architecture for a future beginner-friendly UI | [Guide](./docs/en/NEXT_UI_REQUIREMENTS.md) |
 | 🎛️ **VST3 Plugin** | Standalone VST3 plugin (C++/GGML) for DAW integration | [acestep.vst3](https://github.com/ace-step/acestep.vst3) |
 | 🐍 **Python API** | Programmatic access for integration | [Guide](./docs/en/INFERENCE.md) |
 | 🌐 **REST API** | HTTP-based async API for services | [Guide](./docs/en/API.md) |
diff --git a/docs/.vitepress/config.mts b/docs/.vitepress/config.mts
index 84be2090b..6a5159e44 100644
--- a/docs/.vitepress/config.mts
+++ b/docs/.vitepress/config.mts
@@ -154,6 +154,7 @@ function sidebarEN() {
       items: [
         { text: 'Gradio UI Guide', link: '/en/GRADIO_GUIDE' },
         { text: 'UI Support Baseline', link: '/en/UI_SUPPORT' },
+        { text: 'Next UI Requirements', link: '/en/NEXT_UI_REQUIREMENTS' },
         { text: 'CLI', link: '/en/CLI' },
         { text: "Musician's Guide", link: '/en/ace_step_musicians_guide' },
       ],
diff --git a/docs/en/NEXT_UI_REQUIREMENTS.md b/docs/en/NEXT_UI_REQUIREMENTS.md
new file mode 100644
index 000000000..273f6f784
--- /dev/null
+++ b/docs/en/NEXT_UI_REQUIREMENTS.md
@@ -0,0 +1,154 @@
+# Next UI Requirements
+
+This document defines product requirements and information architecture for a future
+beginner-friendly ACE-Step UI. It is intentionally framework-neutral: implementation should not
+begin until the supported workflows, feature parity expectations, and help model are agreed.
+
+## Goals
+
+- Help a new user generate a first useful song without understanding model internals.
+- Preserve the full capability coverage documented in [UI Support Baseline](UI_SUPPORT.md).
+- Keep expert controls available without making them part of the default path.
+- Make model readiness, hardware limits, errors, and recovery steps understandable.
+- Avoid reintroducing multiple competing product UIs.
+
+## Non-Goals
+
+- Do not replace the supported Gradio UI in this planning step.
+- Do not choose a frontend framework in this document.
+- Do not remove API, CLI, or Side-Step training workflows as part of new UI planning.
+- Do not expose unfinished flows as default-ready user features.
+
+## User Tiers
+
+| Tier | User intent | UI posture |
+|------|-------------|------------|
+| Beginner | Generate a song from a plain-language idea. | Guided defaults, minimal required choices, clear examples. |
+| Intermediate | Reuse a result, remix audio, repaint a section, control lyrics and structure. | Task-based flows with contextual controls. |
+| Advanced | Tune metadata, seeds, batch generation, adapters, source audio, and training data. | Collapsible expert sections with safe defaults. |
+| Expert | Control diffusion, LM behavior, audio codes, diagnostics, and edge workflows. | Complete access, but separated from beginner workflows. |
+| Admin | Configure models, devices, service mode, auth, ports, API mode, storage, and paths. | Settings-focused interface with explicit risk warnings. |
+
+## Primary Navigation
+
+| Area | Purpose | Default audience |
+|------|---------|------------------|
+| Create | First-run and everyday text-to-music generation. | Beginner |
+| Refine | Lyrics, metadata, seed, duration, language, and style iteration. | Beginner/Intermediate |
+| Edit Audio | Remix, repaint, extract, lego, and complete workflows. | Intermediate/Advanced |
+| Results | Listen, compare, save, download, restore params, score, LRC, and reuse outputs. | Beginner/Intermediate |
+| Train | Dataset builder, preprocessing, LoRA, and LoKr training. | Advanced |
+| Settings | Model readiness, hardware, API/service mode, auth, storage, and expert runtime options. | Admin |
+| Help | Task guides, examples, glossary, troubleshooting, and recovery steps. | All users |
+
+## First-Run Experience
+
+The first screen should answer four questions before asking for creative input:
+
+1. Is the required model available?
+2. Is the current hardware ready?
+3. What generation limits apply on this machine?
+4. What is the simplest safe action the user can take next?
+
+Requirements:
+
+- Show a plain-language readiness state: ready, loading, needs download, needs setup, or unsupported.
+- Explain whether the LM is optional, unavailable, or required for the selected action.
+- Prefer recommended hardware/model defaults and hide risky overrides.
+- Provide a single primary action, such as "Generate a Song", once the system is ready.
+- Surface OOM risk before generation when duration, batch size, LM choice, or model choice is unsafe.
+
+## Create Flow
+
+The default generation flow should be:
+
+1. Describe the song.
+2. Choose vocals or instrumental.
+3. Optionally add lyrics.
+4. Pick duration with an auto-safe default.
+5. Generate.
+6. Listen, save, or refine.
+
+Requirements:
+
+- The caption field should support examples and structured suggestions.
+- Lyrics should be optional and clearly separated from style description.
+- Metadata should default to auto unless the user opens refinement controls.
+- Simple mode should remain the default beginner path.
+- Custom mode capabilities should be available without requiring users to know the word "Custom".
+
+## Edit Audio Flow
+
+Editing should be presented as user tasks instead of internal task names.
+
+| User task | Current Gradio capability | UI requirement |
+|-----------|---------------------------|----------------|
+| Change the style of an existing song | Remix / cover | Explain structure preservation and strength. |
+| Regenerate part of a song | Repaint | Provide clear start/end controls and recovery if the range is invalid. |
+| Isolate an instrument or vocal | Extract | Gate by model support and explain source audio requirements. |
+| Add an instrument layer | Lego | Gate by model support and show track choices clearly. |
+| Complete a partial arrangement | Complete | Gate by model support and frame as arrangement completion. |
+
+The UI should disable unsupported tasks for the loaded model while explaining which model family
+enables them.
+
+## Results Flow
+
+Results should support iteration, not just playback.
+
+Requirements:
+
+- Show generated audio in a comparison-friendly layout.
+- Keep batch navigation understandable when more than one batch exists.
+- Make save/download actions obvious.
+- Preserve restore-params, send-to-remix, and send-to-repaint actions.
+- Keep scores, LRC, audio codes, and generation metadata discoverable in details.
+- Clearly show seed and key generation parameters for reproducibility.
+
+## Training Flow
+
+Training should not be part of the beginner first screen, but it must remain available.
+
+Requirements:
+
+- Separate dataset preparation from training execution.
+- Preserve dataset scan, preview, label, edit, save, and preprocessing steps.
+- Preserve LoRA and LoKr training flows with clear experimental labeling for LoKr.
+- Provide status, logs, and error recovery for missing files, invalid datasets, and interrupted runs.
+- Avoid hiding training behind unrelated generation settings.
+
+## Progressive Disclosure Rules
+
+- Beginner default: idea, vocals/instrumental, lyrics, duration, generate.
+- Refine: BPM, key, language, seed, batch size, audio format, metadata restore.
+- Advanced: model choices, LoRA, reference audio, source audio, repaint/remix strength.
+- Expert: audio codes, diffusion controls, LM controls, constrained decoding, DCW, timesteps.
+- Admin: service mode, API mode, auth, ports, allowed paths, download source, hardware overrides.
+
+Every advanced or expert section should include a short explanation of when to use it.
+
+## Help And Recovery
+
+Help content should be task-level, not only field-level.
+
+Required help coverage:
+
+- First generation walkthrough.
+- Caption vs lyrics explanation.
+- Instrumental and vocal-language guidance.
+- Remix vs repaint distinction.
+- Base-only mode explanation for Extract, Lego, and Complete.
+- LoRA and LoKr training prerequisites.
+- OOM recovery and safe hardware defaults.
+- Missing model/download recovery.
+- Invalid audio, invalid repaint range, and unsupported format recovery.
+- Seed and reproducibility explanation.
+
+## Implementation Constraints
+
+- Preserve Gradio feature parity unless a feature is explicitly deprecated in a separate issue.
+- Keep APIs framework-neutral so future UIs do not duplicate model loading or generation logic.
+- Do not expose unfinished UI routes by default.
+- Keep the supported Gradio UI stable while new UI work is experimental.
+- Treat any reusable service extraction as a separate implementation PR.
+
diff --git a/docs/en/UI_SUPPORT.md b/docs/en/UI_SUPPORT.md
index 7b99d96b9..26226ca30 100644
--- a/docs/en/UI_SUPPORT.md
+++ b/docs/en/UI_SUPPORT.md
@@ -62,6 +62,9 @@ friendlier defaults and progressive disclosure.
 
 ## Design Implications For The Next UI
 
+See [Next UI Requirements](NEXT_UI_REQUIREMENTS.md) for the product requirements and information
+architecture that build on this support baseline.
+
 - The first screen should prioritize one successful generation over exposing every parameter.
 - Advanced controls should remain complete, but grouped by intent: model setup, song structure,
   generation quality, LM behavior, source audio editing, and expert code controls.