Mirror of https://github.com/AUTOMATIC1111/stable-diffusion-webui.git (synced 2025-01-31 02:32:57 +08:00)

Commit b692e254e2 "Capitalization" (parent 60c6d26f96)
```diff
@@ -2,13 +2,13 @@ A number of optimizations can be enabled by [commandline arguments](Command-Line-
 | commandline argument            | explanation |
 |---------------------------------|-------------|
-| `--opt-sdp-attention` | May result in faster speeds than using xformers on some systems, but requires more VRAM. (non-deterministic) |
-| `--opt-sdp-no-mem-attention` | May result in faster speeds than using xformers on some systems, but requires more VRAM. (deterministic, slightly slower than `--opt-sdp-attention` and uses more VRAM) |
-| `--xformers` | Use [xformers](https://github.com/facebookresearch/xformers) library. Great improvement to memory consumption and speed. Nvidia GPUs only. (non-deterministic) |
-| `--force-enable-xformers` | Enables xformers regardless of whether the program thinks you can run it or not. Do not report bugs you get running this. |
+| `--opt-sdp-attention` | May result in faster speeds than using xFormers on some systems, but requires more VRAM. (non-deterministic) |
+| `--opt-sdp-no-mem-attention` | May result in faster speeds than using xFormers on some systems, but requires more VRAM. (deterministic, slightly slower than `--opt-sdp-attention` and uses more VRAM) |
+| `--xformers` | Use [xFormers](https://github.com/facebookresearch/xformers) library. Great improvement to memory consumption and speed. Nvidia GPUs only. (non-deterministic) |
+| `--force-enable-xformers` | Enables xFormers regardless of whether the program thinks you can run it or not. Do not report bugs you get running this. |
 | `--opt-split-attention` | Cross attention layer optimization significantly reducing memory use for almost no cost (some report improved performance with it). Black magic. <br/>On by default for `torch.cuda`, which includes both Nvidia and AMD cards. |
 | `--disable-opt-split-attention` | Disables the optimization above. |
-| `--opt-sub-quad-attention` | Sub-quadratic attention, a memory-efficient Cross Attention layer optimization that can significantly reduce required memory, sometimes at a slight performance cost. Recommended if getting poor performance or failed generations with a hardware/software configuration that xformers doesn't work for. On macOS, this will also allow for generation of larger images. |
+| `--opt-sub-quad-attention` | Sub-quadratic attention, a memory-efficient Cross Attention layer optimization that can significantly reduce required memory, sometimes at a slight performance cost. Recommended if getting poor performance or failed generations with a hardware/software configuration that xFormers doesn't work for. On macOS, this will also allow for generation of larger images. |
 | `--opt-split-attention-v1` | Uses an older version of the optimization above that is not as memory-hungry (it will use less VRAM, but will be more limiting in the maximum size of pictures you can make). |
 | `--medvram` | Makes the Stable Diffusion model consume less VRAM by splitting it into three parts - cond (for transforming text into numerical representation), first_stage (for converting a picture into latent space and back), and unet (for actual denoising of latent space) - and making it so that only one is in VRAM at all times, sending the others to CPU RAM. Lowers performance, but only by a bit - unless live previews are enabled. |
 | `--lowvram` | An even more thorough optimization of the above, splitting unet into many modules, with only one module kept in VRAM. Devastating for performance. |
```
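These flags are not read from a config file; the stock launch scripts forward them through the `COMMANDLINE_ARGS` environment variable (set in `webui-user.bat` / `webui-user.sh`). A minimal sketch of how such a flag string becomes argv-style tokens - the variable name comes from the webui launch scripts, but the parsing below is illustrative, not webui's actual launcher code:

```python
import os
import shlex

# e.g. what webui-user.sh would export; --xformers and --medvram are
# flags from the table above.
os.environ["COMMANDLINE_ARGS"] = "--xformers --medvram"

# shlex.split turns the space-separated string into argv-style tokens,
# handling quoted values such as --ckpt-dir "my models".
args = shlex.split(os.environ.get("COMMANDLINE_ARGS", ""))
print(args)  # ['--xformers', '--medvram']
```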
```diff
@@ -17,7 +17,7 @@ A number of optimizations can be enabled by [commandline arguments](Command-Line-
 | `--opt-channelslast` | Changes torch memory type for stable diffusion to channels last. Effects not closely studied. |
 | `--upcast-sampling` | For Nvidia and AMD cards normally forced to run with `--no-half`, [should improve generation speed](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/8782). |
 
-Since webui 1.3.0, `Cross attention optimization` can be selected under settings (xformers still needs to be enabled via COMMANDLINE_ARGS).
+As of [version 1.3.0](https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.3.0), `Cross attention optimization` can be selected under settings. xFormers still needs to be enabled via `COMMANDLINE_ARGS`.
 
 ![2023-06-21 22_53_54_877 chrome](https://github.com/AUTOMATIC1111/stable-diffusion-webui/assets/40751091/c72576e1-0f51-4643-ad91-e9aaec4fc125)
```
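Choices made in the settings UI are persisted to the webui's `config.json`. A hedged sketch for checking which optimization is currently selected - the key name `cross_attention_optimization` is an assumption taken from recent webui versions, so verify it against your own config file:

```python
import json

def selected_optimization(path="config.json"):
    """Return the cross-attention optimization recorded in webui settings.

    The key name "cross_attention_optimization" is an assumption; inspect
    your own config.json if this only ever returns the fallback.
    """
    with open(path) as f:
        settings = json.load(f)
    # "Automatic" lets webui pick an implementation itself; note that the
    # xFormers choice still requires --xformers in COMMANDLINE_ARGS.
    return settings.get("cross_attention_optimization", "Automatic")
```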
```diff
@@ -34,13 +34,13 @@ Extra tips (Windows):
 | Cross-attention | Peak Memory at Batch size 1/2/4/8/16 | Initial It/s | Peak It/s | Note |
 | --------------- | ------------------------------------ | ------------ | --------- | ---- |
 | None | 4.1 / 6.2 / OOM / OOM / OOM | 4.2 | 4.6 | slow and early out-of-memory |
-| v1 | 2.8 / 2.8 / 2.8 / 3.1 / 4.1 | 4.1 | 4.7 | slow, but lowest memory usage and does not require the sometimes-problematic xformers |
+| v1 | 2.8 / 2.8 / 2.8 / 3.1 / 4.1 | 4.1 | 4.7 | slow, but lowest memory usage and does not require the sometimes-problematic xFormers |
 | InvokeAI | 3.1 / 4.2 / 6.3 / 6.6 / 7.0 | 5.5 | 6.6 | almost identical to default optimizer |
 | Doggetx | 3.1 / 4.2 / 6.3 / 6.6 / 7.1 | 5.4 | 6.6 | default |
 | Doggetx | 2.2 / 2.7 / 3.8 / 5.9 / 6.2 | 4.1 | 6.3 | using the `medvram` preset results in decent memory savings without a huge performance hit |
 | Doggetx | 0.9 / 1.1 / 2.2 / 4.3 / 6.4 | 1.0 | 6.3 | using the `lowvram` preset is extremely slow due to constant swapping |
-| Xformers | 2.8 / 2.8 / 2.8 / 3.1 / 4.1 | 6.5 | 7.5 | fastest and low memory |
-| Xformers | 2.9 / 2.9 / 2.9 / 3.6 / 4.1 | 6.4 | 7.6 | with `cuda_alloc_conf` and `opt-channelslast` |
+| xFormers | 2.8 / 2.8 / 2.8 / 3.1 / 4.1 | 6.5 | 7.5 | fastest and low memory |
+| xFormers | 2.9 / 2.9 / 2.9 / 3.6 / 4.1 | 6.4 | 7.6 | with `cuda_alloc_conf` and `opt-channelslast` |
 
 Notes:
 
 - Performance at batch size 1 is around **~70%** of peak performance
```
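The table's memory and throughput columns can be compared mechanically. A small sketch using numbers copied from the benchmark above - each entry is (peak memory in GB at batch size 16, peak it/s); the OOM'd `None` row and the `medvram`/`lowvram` preset rows are left out:

```python
# (peak GB at batch size 16, peak it/s), copied from the benchmark table.
backends = {
    "v1":       (4.1, 4.7),
    "InvokeAI": (7.0, 6.6),
    "Doggetx":  (7.1, 6.6),
    "xFormers": (4.1, 7.5),
}

fastest = max(backends, key=lambda b: backends[b][1])  # highest peak it/s
leanest = min(backends, key=lambda b: backends[b][0])  # lowest peak memory

print(fastest, leanest)  # xFormers v1 - matching the table's notes
```

xFormers and v1 tie on memory here (4.1 GB); `min` keeps the first of the tied entries, which is consistent with the table crediting v1 with the lowest memory usage and xFormers with "fastest and low memory".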