Merge branch 'invoke-ai:main' into rocm57
cbayle authored Jan 10, 2025
2 parents 9620c5d + d88b59c commit 5af90a7
Showing 16 changed files with 346 additions and 141 deletions.
7 changes: 6 additions & 1 deletion docs/configuration.md
Original file line number Diff line number Diff line change
@@ -39,7 +39,7 @@ It has two sections - one for internal use and one for user settings:

```yaml
# Internal metadata - do not edit:
schema_version: 4
schema_version: 4.0.2

# Put user settings here - see https://invoke-ai.github.io/InvokeAI/features/CONFIGURATION/:
host: 0.0.0.0 # serve the app on your local network
@@ -83,6 +83,10 @@ A subset of settings may be specified using CLI args:
- `--root`: specify the root directory
- `--config`: override the default `invokeai.yaml` file location

### Low-VRAM Mode

See the [Low-VRAM mode docs][low-vram] for details on enabling this feature.

### All Settings

Following the table are additional explanations for certain settings.
@@ -185,3 +189,4 @@ The `log_format` option provides several alternative formats:

[basic guide to yaml files]: https://circleci.com/blog/what-is-yaml-a-beginner-s-guide/
[Model Marketplace API Keys]: #model-marketplace-api-keys
[low-vram]: ./features/low-vram.md
Binary file added docs/features/cuda-sysmem-fallback.png
129 changes: 129 additions & 0 deletions docs/features/low-vram.md
@@ -0,0 +1,129 @@
---
title: Low-VRAM mode
---

As of v5.6.0, Invoke has a low-VRAM mode. It works on systems with dedicated GPUs (Nvidia GPUs on Windows/Linux and AMD GPUs on Linux).

This allows you to generate images even if your GPU doesn't have enough VRAM to hold full models. Most users should be able to run even the beefiest models - like the ~24GB unquantized FLUX dev model.

## Enabling Low-VRAM mode

To enable Low-VRAM mode, add this line to your `invokeai.yaml` configuration file, then restart Invoke:

```yaml
enable_partial_loading: true
```

**Windows users should also [disable the Nvidia sysmem fallback](#disabling-nvidia-sysmem-fallback-windows-only)**.

You can fine-tune the settings for best performance, or to resolve lingering out-of-memory errors (OOMs).

!!! tip "How to find `invokeai.yaml`"

    The `invokeai.yaml` configuration file lives in your install directory. To access it, run the **Invoke Community Edition** launcher and click the install location. This will open your install directory in a file explorer window.

    You'll see `invokeai.yaml` there and can edit it with any text editor. After making changes, restart Invoke.

    If you don't see `invokeai.yaml`, launch Invoke once. It will create the file on its first startup.

## Details and fine-tuning

Low-VRAM mode involves three features, each of which can be configured or fine-tuned:

- Partial model loading
- Dynamic RAM and VRAM cache sizes
- Working memory

Read on to learn about these features and understand how to fine-tune them for your system and use-cases.

### Partial model loading

Invoke's partial model loading works by streaming model "layers" between RAM and VRAM as they are needed.

When an operation needs layers that are not in VRAM, but there isn't enough room to load them, inactive layers are offloaded to RAM to make room.
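The streaming behavior can be pictured with a toy sketch. This is an illustration only, with hypothetical names - it is not Invoke's actual model manager:

```python
# Toy sketch of partial model loading (hypothetical, not Invoke's real code):
# layers stream into a fixed VRAM budget, and inactive layers are offloaded
# to RAM when there isn't enough room for the layers an operation needs.

class LayerCache:
    def __init__(self, vram_budget_gb: float):
        self.vram_budget_gb = vram_budget_gb
        self.in_vram: dict[str, float] = {}  # layer name -> size in GB
        self.in_ram: dict[str, float] = {}

    def vram_used(self) -> float:
        return sum(self.in_vram.values())

    def load_layer(self, name: str, size_gb: float) -> None:
        """Ensure `name` is resident in VRAM, offloading others if needed."""
        if name in self.in_vram:
            return  # already resident, nothing to stream
        # Offload inactive layers (oldest first) until the new layer fits.
        while self.in_vram and self.vram_used() + size_gb > self.vram_budget_gb:
            victim, victim_size = next(iter(self.in_vram.items()))
            del self.in_vram[victim]
            self.in_ram[victim] = victim_size  # moved back to RAM
        self.in_vram[name] = size_gb
        self.in_ram.pop(name, None)


cache = LayerCache(vram_budget_gb=8)
cache.load_layer("unet.block_1", 5)
cache.load_layer("unet.block_2", 4)  # block_1 is offloaded to RAM to make room
```

The key property is that the VRAM budget is never exceeded - layers simply shuttle between the two pools as operations demand them.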

#### Enabling partial model loading

As described above, you can enable partial model loading by adding this line to `invokeai.yaml`:

```yaml
enable_partial_loading: true
```

### Dynamic RAM and VRAM cache sizes

Loading models from disk is slow and can be a major bottleneck for performance. Invoke uses two model caches - RAM and VRAM - to reduce loading from disk to a minimum.

By default, Invoke manages these caches' sizes dynamically for best performance.

#### Fine-tuning cache sizes

Prior to v5.6.0, the cache sizes were static, and for best performance, many users needed to manually fine-tune the `ram` and `vram` settings in `invokeai.yaml`.

As of v5.6.0, the caches are dynamically sized. The `ram` and `vram` settings are no longer used, and new settings have been added to configure the caches.

**Most users will not need to fine-tune the cache sizes.**

However, if your GPU has enough VRAM to hold your models fully, you might get a performance boost by manually setting the cache sizes in `invokeai.yaml`:

```yaml
# Set the RAM cache size to as large as possible, leaving a few GB free for the rest of your system and Invoke.
# For example, if your system has 32GB RAM, 28GB is a good value.
max_cache_ram_gb: 28
# Set the VRAM cache size to be as large as possible while leaving enough room for the working memory of the tasks you will be doing.
# For example, on a 24GB GPU that will be running unquantized FLUX without any auxiliary models,
# 18GB is a good value.
max_cache_vram_gb: 18
```

!!! tip "Max safe value for `max_cache_vram_gb`"

    To determine the max safe value for `max_cache_vram_gb`, subtract `device_working_mem_gb` from your GPU's VRAM. As described below, the default for `device_working_mem_gb` is 3GB.

    For example, if you have a 12GB GPU, the max safe value for `max_cache_vram_gb` is `12GB - 3GB = 9GB`.

    If you had increased `device_working_mem_gb` to 4GB, then the max safe value for `max_cache_vram_gb` is `12GB - 4GB = 8GB`.
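Putting that arithmetic into practice, the second scenario could be written into `invokeai.yaml` like this (illustrative values for a 12GB GPU):

```yaml
# Illustrative values for a 12GB GPU with increased working memory:
device_working_mem_gb: 4
max_cache_vram_gb: 8 # 12GB total - 4GB working memory
```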

### Working memory

Invoke cannot use _all_ of your VRAM for model caching and loading. It requires some VRAM to use as working memory for various operations.

Invoke reserves 3GB VRAM as working memory by default, which is enough for most use-cases. However, it is possible to fine-tune this setting if you still get OOMs.

#### Fine-tuning working memory

You can increase the working memory size in `invokeai.yaml` to prevent OOMs:

```yaml
# The default is 3GB - bump it up to 4GB to prevent OOMs.
device_working_mem_gb: 4
```

!!! tip "Operations may request more working memory"

    For some operations, we can determine VRAM requirements in advance and allocate additional working memory to prevent OOMs.

    VAE decoding is one such operation. This operation converts the generation process's output into an image. For large image outputs, this might use more than the default working memory size of 3GB.

    During this decoding step, Invoke calculates how much VRAM will be required to decode and requests that much VRAM from the model manager. If the amount exceeds the working memory size, the model manager will offload cached model layers from VRAM until there's enough VRAM to decode.

    Once decoding completes, the model manager "reclaims" the extra VRAM allocated as working memory for future model loading operations.
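The request-and-reclaim cycle can be sketched as follows. This is a simplified illustration with hypothetical names, not Invoke's actual model manager:

```python
# Simplified sketch of the working-memory handshake (hypothetical names):
# an operation requests VRAM; cached layers are offloaded until the request
# fits; afterwards the cache is allowed to grow back into the freed VRAM.

class ModelManager:
    def __init__(self, total_vram_gb: float, working_mem_gb: float = 3):
        self.total_vram_gb = total_vram_gb
        self.working_mem_gb = working_mem_gb
        # The cache may use everything except the reserved working memory.
        self.cached_gb = total_vram_gb - working_mem_gb

    def request_working_memory(self, needed_gb: float) -> None:
        """Free VRAM for an operation needing `needed_gb` of scratch space."""
        free_gb = self.total_vram_gb - self.cached_gb
        if needed_gb > free_gb:
            # Offload cached model layers to RAM until the request fits.
            self.cached_gb -= needed_gb - free_gb

    def reclaim(self) -> None:
        """After the operation, let the cache grow back into the freed VRAM."""
        self.cached_gb = self.total_vram_gb - self.working_mem_gb


mgr = ModelManager(total_vram_gb=12, working_mem_gb=3)  # cache starts at 9GB
mgr.request_working_memory(5)  # a large VAE decode: cache shrinks to 7GB
mgr.reclaim()                  # decode done: cache may grow back to 9GB
```

Note that the reserved working memory is a floor, not a ceiling: operations can request more, at the temporary cost of cached layers.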

### Disabling Nvidia sysmem fallback (Windows only)

On Windows, Nvidia GPUs can fall back to system RAM when their VRAM fills up via the **sysmem fallback** feature. While this sounds like a good idea on the surface, in practice it causes massive slowdowns during generation.

It is strongly suggested to disable this feature:

- Open the **NVIDIA Control Panel** app.
- Expand **3D Settings** in the left panel.
- Click **Manage 3D Settings** in the left panel.
- Find **CUDA - Sysmem Fallback Policy** in the right panel and set it to **Prefer No Sysmem Fallback**.

![cuda-sysmem-fallback](./cuda-sysmem-fallback.png)

!!! tip "Invoke does the same thing, but better"

    If the sysmem fallback feature sounds familiar, that's because Invoke's partial model loading strategy is conceptually very similar - use VRAM when there's room, else fall back to RAM.

    Unfortunately, the Nvidia implementation is not optimized for applications like Invoke and does more harm than good.
2 changes: 1 addition & 1 deletion docs/installation/quick_start.md
@@ -54,7 +54,7 @@ If you have an existing Invoke installation, you can select it and let the launc
- Open the **Invoke-Installer-mac-arm64.dmg** file.
- Drag the launcher to **Applications**.
- Open a terminal.
- Run `xattr -cr /Applications/Invoke-Installer.app`.
- Run `xattr -d 'com.apple.quarantine' /Applications/Invoke\ Community\ Edition.app`.

You should now be able to run the launcher.

2 changes: 1 addition & 1 deletion docs/nodes/communityNodes.md
@@ -535,7 +535,7 @@ View:
**Node Link:** https://github.com/simonfuhrmann/invokeai-stereo

**Example Workflow and Output**
</br><img src="https://github.com/simonfuhrmann/invokeai-stereo/blob/main/docs/example_promo_03.jpg" width="500" />
</br><img src="https://raw.githubusercontent.com/simonfuhrmann/invokeai-stereo/refs/heads/main/docs/example_promo_03.jpg" width="600" />

--------------------------------
### Simple Skin Detection
6 changes: 3 additions & 3 deletions invokeai/frontend/web/public/locales/en.json
@@ -1185,6 +1185,7 @@
"modelAddedSimple": "Model Added to Queue",
"modelImportCanceled": "Model Import Canceled",
"outOfMemoryError": "Out of Memory Error",
"outOfMemoryErrorDescLocal": "Follow our <LinkComponent>Low VRAM guide</LinkComponent> to reduce OOMs.",
"outOfMemoryErrorDesc": "Your current generation settings exceed system capacity. Please adjust your settings and try again.",
"parameters": "Parameters",
"parameterSet": "Parameter Recalled",
@@ -2133,9 +2134,8 @@
"toGetStartedLocal": "To get started, make sure to download or import models needed to run Invoke. Then, enter a prompt in the box and click <StrongComponent>Invoke</StrongComponent> to generate your first image. Select a prompt template to improve results. You can choose to save your images directly to the <StrongComponent>Gallery</StrongComponent> or edit them to the <StrongComponent>Canvas</StrongComponent>.",
"toGetStarted": "To get started, enter a prompt in the box and click <StrongComponent>Invoke</StrongComponent> to generate your first image. Select a prompt template to improve results. You can choose to save your images directly to the <StrongComponent>Gallery</StrongComponent> or edit them to the <StrongComponent>Canvas</StrongComponent>.",
"gettingStartedSeries": "Want more guidance? Check out our <LinkComponent>Getting Started Series</LinkComponent> for tips on unlocking the full potential of the Invoke Studio.",
"downloadStarterModels": "Download Starter Models",
"importModels": "Import Models",
"noModelsInstalled": "It looks like you don't have any models installed"
"lowVRAMMode": "For best performance, follow our <LinkComponent>Low VRAM guide</LinkComponent>.",
"noModelsInstalled": "It looks like you don't have any models installed! You can <DownloadStarterModelsButton>download a starter model bundle</DownloadStarterModelsButton> or <ImportModelsButton>import models</ImportModelsButton>."
},
"whatsNew": {
"whatsNewInInvoke": "What's New in Invoke",
@@ -57,6 +57,7 @@ export const CanvasMainPanelContent = memo(() => {
gap={2}
alignItems="center"
justifyContent="center"
overflow="hidden"
>
<CanvasManagerProviderGate>
<CanvasToolbar />
@@ -70,6 +71,7 @@ export const CanvasMainPanelContent = memo(() => {
h="full"
bg={dynamicGrid ? 'base.850' : 'base.900'}
borderRadius="base"
overflow="hidden"
>
<InvokeCanvasComponent />
<CanvasManagerProviderGate>
@@ -46,6 +46,7 @@ export const ImageViewer = memo(({ closeButton }: Props) => {
left={0}
alignItems="center"
justifyContent="center"
overflow="hidden"
>
{hasImageToCompare && <CompareToolbar />}
{!hasImageToCompare && <ViewerToolbar closeButton={closeButton} />}
