README.md: 1 addition, 0 deletions
@@ -191,6 +191,7 @@ LANGUAGE=en
|--------|-------------|---------------|
| 🖥️ **Gradio Web UI** | Interactive web interface for music generation | [Guide](./docs/en/GRADIO_GUIDE.md) |
| 🧭 **UI Support Baseline** | Supported UI boundary and future UI parity checklist | [Guide](./docs/en/UI_SUPPORT.md) |
| 🧩 **Next UI Requirements** | Product requirements and information architecture for a future beginner-friendly UI | [Guide](./docs/en/NEXT_UI_REQUIREMENTS.md) |
| 🎛️ **VST3 Plugin** | Standalone VST3 plugin (C++/GGML) for DAW integration | [acestep.vst3](https://github.com/ace-step/acestep.vst3) |
| 🐍 **Python API** | Programmatic access for integration | [Guide](./docs/en/INFERENCE.md) |
| 🌐 **REST API** | HTTP-based async API for services | [Guide](./docs/en/API.md) |

docs/.vitepress/config.mts: 1 addition, 0 deletions
@@ -154,6 +154,7 @@ function sidebarEN() {
items: [
{ text: 'Gradio UI Guide', link: '/en/GRADIO_GUIDE' },
{ text: 'UI Support Baseline', link: '/en/UI_SUPPORT' },
{ text: 'Next UI Requirements', link: '/en/NEXT_UI_REQUIREMENTS' },
{ text: 'CLI', link: '/en/CLI' },
{ text: "Musician's Guide", link: '/en/ace_step_musicians_guide' },
],

docs/en/NEXT_UI_REQUIREMENTS.md: 154 additions, 0 deletions
@@ -0,0 +1,154 @@
# Next UI Requirements

This document defines product requirements and information architecture for a future
beginner-friendly ACE-Step UI. It is intentionally framework-neutral: implementation should not
begin until the supported workflows, feature parity expectations, and help model are agreed upon.

## Goals

- Help a new user generate a first useful song without understanding model internals.
- Preserve the full capability coverage documented in [UI Support Baseline](UI_SUPPORT.md).
- Keep expert controls available without making them part of the default path.
- Make model readiness, hardware limits, errors, and recovery steps understandable.
- Avoid reintroducing multiple competing product UIs.

## Non-Goals

- Do not replace the supported Gradio UI in this planning step.
- Do not choose a frontend framework in this document.
- Do not remove API, CLI, or Side-Step training workflows as part of new UI planning.
- Do not expose unfinished flows as default-ready user features.

## User Tiers

| Tier | User intent | UI posture |
|------|-------------|------------|
| Beginner | Generate a song from a plain-language idea. | Guided defaults, minimal required choices, clear examples. |
| Intermediate | Reuse a result, remix audio, repaint a section, control lyrics and structure. | Task-based flows with contextual controls. |
| Advanced | Tune metadata, seeds, batch generation, adapters, source audio, and training data. | Collapsible expert sections with safe defaults. |
| Expert | Control diffusion, LM behavior, audio codes, diagnostics, and edge workflows. | Complete access, but separated from beginner workflows. |
| Admin | Configure models, devices, service mode, auth, ports, API mode, storage, and paths. | Settings-focused interface with explicit risk warnings. |

## Primary Navigation

| Area | Purpose | Default audience |
|------|---------|------------------|
| Create | First-run and everyday text-to-music generation. | Beginner |
| Refine | Lyrics, metadata, seed, duration, language, and style iteration. | Beginner/Intermediate |
| Edit Audio | Remix, repaint, extract, lego, and complete workflows. | Intermediate/Advanced |
| Results | Listen, compare, save, download, restore parameters, view scores and LRC, and reuse outputs. | Beginner/Intermediate |
| Train | Dataset builder, preprocessing, LoRA, and LoKr training. | Advanced |
| Settings | Model readiness, hardware, API/service mode, auth, storage, and expert runtime options. | Admin |
| Help | Task guides, examples, glossary, troubleshooting, and recovery steps. | All users |

## First-Run Experience

The first screen should answer four questions before asking for creative input:

1. Is the required model available?
2. Is the current hardware ready?
3. What generation limits apply on this machine?
4. What is the simplest safe action the user can take next?

Requirements:

- Show a plain-language readiness state: ready, loading, needs download, needs setup, or unsupported.
- Explain whether the LM is optional, unavailable, or required for the selected action.
- Prefer recommended hardware/model defaults and hide risky overrides.
- Provide a single primary action, such as "Generate a Song", once the system is ready.
- Surface out-of-memory (OOM) risk before generation when the duration, batch size, LM choice, or model choice is unsafe for the current hardware.
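As a minimal sketch, the readiness states listed above could map to one plain-language message and at most one primary action. It is written in TypeScript only because the repository's docs config already uses it; it does not choose a UI framework. The `firstRunView` function, the `FirstRunView` shape, and all message and button wording are hypothetical; only the five state names come from the requirements.

```typescript
// Readiness states from the first-run requirements; copy and actions are illustrative.
type ReadinessState =
  | "ready"
  | "loading"
  | "needs-download"
  | "needs-setup"
  | "unsupported";

interface FirstRunView {
  message: string;       // plain-language status shown to the user
  action: string | null; // the single primary action, or null while waiting
  canGenerate: boolean;  // creative input is enabled only when ready
}

// Hypothetical copy table: one message and at most one next step per state.
const READINESS_COPY: Record<ReadinessState, Omit<FirstRunView, "canGenerate">> = {
  ready: { message: "Everything is ready.", action: "Generate a Song" },
  loading: { message: "The model is loading. This can take a minute.", action: null },
  "needs-download": { message: "The required model is not downloaded yet.", action: "Download Model" },
  "needs-setup": { message: "This machine needs setup before generating.", action: "Open Settings" },
  unsupported: { message: "This hardware cannot run the selected model.", action: "Open Troubleshooting" },
};

function firstRunView(state: ReadinessState): FirstRunView {
  return { ...READINESS_COPY[state], canGenerate: state === "ready" };
}
```

Using a `Record` keyed by the state union means the compiler flags any readiness state that lacks copy, which keeps the "single primary action" rule enforced as states are added.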

## Create Flow

The default generation flow should be:

1. Describe the song.
2. Choose vocals or instrumental.
3. Optionally add lyrics.
4. Pick a duration, starting from a safe automatic default.
5. Generate.
6. Listen, save, or refine.

Requirements:

- The caption field should support examples and structured suggestions.
- Lyrics should be optional and clearly separated from style description.
- Metadata should default to auto unless the user opens refinement controls.
- Simple mode should remain the default beginner path.
- Custom mode capabilities should be available without requiring users to know the word "Custom".
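The Create steps and defaults above can be sketched as a request builder in which everything the user has not refined stays `"auto"`. This is a framework-neutral illustration under stated assumptions: `CreateRequest`, `buildCreateRequest`, and every field name are hypothetical and do not describe a real ACE-Step API, and dropping lyrics for instrumental songs is a choice made only for this sketch.

```typescript
// Hypothetical request shape for the beginner Create flow.
// Field names are illustrative, not a real ACE-Step API.
type Auto = "auto";

interface CreateRequest {
  caption: string;                // step 1: describe the song (style, mood, genre)
  instrumental: boolean;          // step 2: vocals or instrumental
  lyrics?: string;                // step 3: optional, separate from the caption
  durationSeconds: number | Auto; // step 4: safe automatic default
  bpm: number | Auto;             // metadata stays "auto" unless refined
  key: string | Auto;
  language: string | Auto;
}

type Refinements = Partial<Pick<CreateRequest, "durationSeconds" | "bpm" | "key" | "language">>;

function buildCreateRequest(
  caption: string,
  instrumental: boolean,
  lyrics?: string,
  refinements: Refinements = {},
): CreateRequest {
  const idea = caption.trim();
  if (idea === "") {
    throw new Error("Describe the song before generating.");
  }
  const text = lyrics?.trim();
  return {
    caption: idea,
    instrumental,
    // Assumption for this sketch: lyrics are dropped for instrumental songs and when empty.
    lyrics: instrumental || !text ? undefined : text,
    durationSeconds: "auto",
    bpm: "auto",
    key: "auto",
    language: "auto",
    ...refinements, // only values the user explicitly opened and changed
  };
}
```

For example, `buildCreateRequest("lofi study beat", true, undefined, { bpm: 80 })` keeps the key, language, and duration on `"auto"` while honoring the one refined value.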

## Edit Audio Flow

Editing should be presented as user tasks instead of internal task names.

| User task | Current Gradio capability | UI requirement |
|-----------|---------------------------|----------------|
| Change the style of an existing song | Remix / cover | Explain how much of the original structure is preserved and what the strength setting controls. |
| Regenerate part of a song | Repaint | Provide clear start/end controls and recovery if the range is invalid. |
| Isolate an instrument or vocal | Extract | Gate by model support and explain source audio requirements. |
| Add an instrument layer | Lego | Gate by model support and show track choices clearly. |
| Complete a partial arrangement | Complete | Gate by model support and frame as arrangement completion. |

The UI should disable unsupported tasks for the loaded model while explaining which model family
enables them.
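A small sketch of the gating rule: each user-facing task maps to an internal task, and tasks the loaded model cannot run are disabled with a reason rather than hidden. The base-only set (Extract, Lego, Complete) follows the Help And Recovery section; the internal identifiers, the two-value `ModelFamily` type, `editTaskOptions`, and the reason text are hypothetical.

```typescript
// Internal task identifiers are illustrative stand-ins for the Gradio capabilities in the table.
type InternalTask = "cover" | "repaint" | "extract" | "lego" | "complete";

// Hypothetical model-family label: only "base" vs. everything else matters here.
type ModelFamily = "base" | "non-base";

const USER_TASKS: Record<InternalTask, string> = {
  cover: "Change the style of an existing song",
  repaint: "Regenerate part of a song",
  extract: "Isolate an instrument or vocal",
  lego: "Add an instrument layer",
  complete: "Complete a partial arrangement",
};

// Tasks that only a base model supports.
const BASE_ONLY: ReadonlySet<InternalTask> = new Set<InternalTask>(["extract", "lego", "complete"]);

interface TaskOption {
  task: InternalTask;
  label: string;    // user-task wording, never the internal name
  enabled: boolean;
  reason?: string;  // shown next to a disabled task
}

// Disable (rather than hide) unsupported tasks and say which family enables them.
function editTaskOptions(loaded: ModelFamily): TaskOption[] {
  return (Object.keys(USER_TASKS) as InternalTask[]).map((task) => {
    const enabled = loaded === "base" || !BASE_ONLY.has(task);
    return {
      task,
      label: USER_TASKS[task],
      enabled,
      reason: enabled ? undefined : "Requires a base model. Load a base model to enable this task.",
    };
  });
}
```

Keeping disabled tasks visible with a reason matches the requirement to explain which model family enables them, instead of making features silently disappear.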

## Results Flow

Results should support iteration, not just playback.

Requirements:

- Show generated audio in a comparison-friendly layout.
- Keep batch navigation understandable when more than one batch exists.
- Make save/download actions obvious.
- Preserve restore-params, send-to-remix, and send-to-repaint actions.
- Keep scores, LRC, audio codes, and generation metadata discoverable in details.
- Clearly show the seed and other main generation parameters for reproducibility.
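One way to keep results reproducible is to store the parameters next to each output. The sketch below is a hypothetical data model, not the current Gradio implementation: `ResultItem`, `GenerationParams`, and the three helper functions are illustrative names covering the seed label, restore-params, and batch navigation requirements.

```typescript
// Hypothetical per-result record supporting restore-params and reproducibility.
interface GenerationParams {
  caption: string;
  lyrics?: string;
  seed: number;
  durationSeconds: number;
  batchSize: number;
}

interface ResultItem {
  batchIndex: number; // zero-based batch this result came from
  audioPath: string;  // location of the saved audio file
  params: GenerationParams;
}

// Short label showing the seed and other main parameters next to each result.
function resultLabel(r: ResultItem): string {
  return `Batch ${r.batchIndex + 1}, seed ${r.params.seed}, ${r.params.durationSeconds}s`;
}

// Restore returns a copy so editing the form never mutates the stored result.
function restoreParams(r: ResultItem): GenerationParams {
  return { ...r.params };
}

// Group results for batch navigation, preserving generation order.
function groupByBatch(results: ResultItem[]): Map<number, ResultItem[]> {
  const groups = new Map<number, ResultItem[]>();
  for (const r of results) {
    const list = groups.get(r.batchIndex) ?? [];
    list.push(r);
    groups.set(r.batchIndex, list);
  }
  return groups;
}
```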

## Training Flow

Training should not be part of the beginner first screen, but it must remain available.

Requirements:

- Separate dataset preparation from training execution.
- Preserve dataset scan, preview, label, edit, save, and preprocessing steps.
- Preserve LoRA and LoKr training flows with clear experimental labeling for LoKr.
- Provide status, logs, and error recovery for missing files, invalid datasets, and interrupted runs.
- Avoid hiding training behind unrelated generation settings.
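Separating dataset preparation from training execution can be expressed as a pre-flight check that lists unmet preparation steps before a run starts. The step names follow the requirement above, but which steps are mandatory is an assumption of this sketch (here, preview and edit are treated as optional), and `trainingBlockers` and `adapterLabel` are hypothetical helpers.

```typescript
// Step names follow the dataset-preparation requirement; adapters follow the LoRA/LoKr flows.
type PrepStep = "scan" | "preview" | "label" | "edit" | "save" | "preprocess";
type Adapter = "lora" | "lokr";

// Assumption: these steps must finish before training; preview and edit are optional.
const REQUIRED_STEPS: PrepStep[] = ["scan", "label", "save", "preprocess"];

// Returns user-facing blockers; an empty list means training may start.
function trainingBlockers(done: ReadonlySet<PrepStep>): string[] {
  return REQUIRED_STEPS.filter((step) => !done.has(step)).map(
    (step) => `Finish the "${step}" step in dataset preparation before training.`,
  );
}

// LoKr is labeled experimental, as the requirements ask.
function adapterLabel(adapter: Adapter): string {
  return adapter === "lokr" ? "LoKr (experimental)" : "LoRA";
}
```

Returning every blocker at once, rather than failing on the first, lets the training screen show a complete recovery checklist for an invalid or half-prepared dataset.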

## Progressive Disclosure Rules

- Beginner default: idea, vocals/instrumental, lyrics, duration, generate.
- Refine: BPM, key, language, seed, batch size, audio format, metadata restore.
- Advanced: model choices, LoRA, reference audio, source audio, repaint/remix strength.
- Expert: audio codes, diffusion controls, LM controls, constrained decoding, DCW, timesteps.
- Admin: service mode, API mode, auth, ports, allowed paths, download source, hardware overrides.

Every advanced or expert section should include a short explanation of when to use it.
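The disclosure rules above form an ordered ladder for generation views plus a separate admin area. A minimal sketch of that model follows; the control identifiers are hypothetical and cover only a subset of the listed controls, and `visibleControls` is an illustrative name.

```typescript
type DisclosureLevel = "beginner" | "refine" | "advanced" | "expert" | "admin";

// Generation views expand in this order; admin controls live in Settings only.
const GENERATION_LEVELS: DisclosureLevel[] = ["beginner", "refine", "advanced", "expert"];

// A subset of the disclosure rules; control identifiers are hypothetical.
const CONTROL_LEVEL: Record<string, DisclosureLevel> = {
  idea: "beginner",
  vocalsOrInstrumental: "beginner",
  lyrics: "beginner",
  duration: "beginner",
  bpm: "refine",
  key: "refine",
  seed: "refine",
  batchSize: "refine",
  lora: "advanced",
  sourceAudio: "advanced",
  audioCodes: "expert",
  diffusionControls: "expert",
  port: "admin",
  auth: "admin",
};

// Controls shown once the user has expanded generation settings up to `level`.
function visibleControls(level: Exclude<DisclosureLevel, "admin">): string[] {
  const maxRank = GENERATION_LEVELS.indexOf(level);
  return Object.entries(CONTROL_LEVEL)
    .filter(([, l]) => l !== "admin" && GENERATION_LEVELS.indexOf(l) <= maxRank)
    .map(([name]) => name);
}
```

Excluding `"admin"` from the parameter type means no generation view can request admin controls, which keeps service and hardware settings out of the beginner path by construction.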

## Help And Recovery

Help content should be task-level, not only field-level.

Required help coverage:

- First generation walkthrough.
- Caption vs lyrics explanation.
- Instrumental and vocal-language guidance.
- Remix vs repaint distinction.
- Base-only mode explanation for Extract, Lego, and Complete.
- LoRA and LoKr training prerequisites.
- OOM recovery and safe hardware defaults.
- Missing model/download recovery.
- Invalid audio, invalid repaint range, and unsupported format recovery.
- Seed and reproducibility explanation.

## Implementation Constraints

- Preserve Gradio feature parity unless a feature is explicitly deprecated in a separate issue.
- Keep APIs framework-neutral so future UIs do not duplicate model loading or generation logic.
- Do not expose unfinished UI routes by default.
- Keep the supported Gradio UI stable while new UI work is experimental.
- Treat any reusable service extraction as a separate implementation PR.

docs/en/UI_SUPPORT.md: 3 additions, 0 deletions
@@ -62,6 +62,9 @@ friendlier defaults and progressive disclosure.

## Design Implications For The Next UI

See [Next UI Requirements](NEXT_UI_REQUIREMENTS.md) for the product requirements and information
architecture that build on this support baseline.

- The first screen should prioritize one successful generation over exposing every parameter.
- Advanced controls should remain complete, but grouped by intent: model setup, song structure,
generation quality, LM behavior, source audio editing, and expert code controls.