Extra tips (Windows):

- https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/3889 Disable Hardware-accelerated GPU scheduling.
- Disable browser hardware acceleration
- Go into the NVIDIA Control Panel, 3D settings, and change the power profile to "maximum performance"

## Memory & Performance Impact of Optimizers and Flags

*This is an example test using specific hardware and configuration; your mileage may vary*

*Tested using an NVIDIA RTX 3060 and CUDA 11.7*
| Cross-attention | Peak Memory at Batch size 1/2/4/8/16 | Initial It/s | Peak It/s | Note |
| --------------- | ------------------------------------ | ------------ | --------- | ---- |
| None            | 4.1 / 6.2 / OOM / OOM / OOM          | 4.2          | 4.6       | slow and early out-of-memory |
| v1              | 2.8 / 2.8 / 2.8 / 3.1 / 4.1          | 4.1          | 4.7       | slow, but lowest memory usage and does not require the sometimes-problematic xformers |
| InvokeAI        | 3.1 / 4.2 / 6.3 / 6.6 / 7.0          | 5.5          | 6.6       | almost identical to default optimizer |
| Doggettx        | 3.1 / 4.2 / 6.3 / 6.6 / 7.1          | 5.4          | 6.6       | default |
| Doggettx        | 2.2 / 2.7 / 3.8 / 5.9 / 6.2          | 4.1          | 6.3       | using the `medvram` preset results in decent memory savings without a huge performance hit |
| Doggettx        | 0.9 / 1.1 / 2.2 / 4.3 / 6.4          | 1.0          | 6.3       | using the `lowvram` preset is extremely slow due to constant swapping |
| Xformers        | 2.8 / 2.8 / 2.8 / 3.1 / 4.1          | 6.5          | 7.5       | fastest and low memory |
| Xformers        | 2.9 / 2.9 / 2.9 / 3.6 / 4.1          | 6.4          | 7.6       | with `cuda_alloc_conf` and `opt-channelslast` |
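
The configurations compared above correspond roughly to webui launch flags documented on the [commandline arguments](Run-with-Custom-Parameters) page. A sketch of the Linux invocations (flag names assumed from that page; `webui.bat` on Windows accepts the same flags):

```shell
# Rough launch equivalents of the table rows above
./webui.sh --opt-split-attention           # Doggettx (default)
./webui.sh --opt-split-attention-v1        # v1, lowest memory
./webui.sh --opt-split-attention-invokeai  # InvokeAI
./webui.sh --xformers                      # Xformers, fastest
./webui.sh --xformers --medvram            # Xformers plus the medvram preset
```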

Notes:

- Performance at batch size 1 is around **~70%** of peak performance
- Peak performance is typically around batch size 8; after that it grows by a few percent if you have extra VRAM, before it starts to drop due to GC kicking in
- Performance with the `lowvram` preset is very low below batch size 8, and by then the memory savings are not that big

Other possible optimizations:

- `PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512`
  No performance impact; increases the initial memory footprint a bit, but reduces memory fragmentation on long runs
- `opt-channelslast`
  Hit-and-miss: seems to give a slight additional performance increase at higher batch sizes and to be slower at small sizes, but the differences are within margin of error
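
The allocator setting above has to be in the environment before PyTorch initializes CUDA, so one place to put it is `webui-user.sh` (sketch below; on Windows, the equivalent is a `set` line in `webui-user.bat`):

```shell
# Add to webui-user.sh so PyTorch's CUDA allocator picks this up at startup
# (Windows: `set PYTORCH_CUDA_ALLOC_CONF=...` in webui-user.bat instead)
export PYTORCH_CUDA_ALLOC_CONF="garbage_collection_threshold:0.9,max_split_size_mb:512"
echo "$PYTORCH_CUDA_ALLOC_CONF"
```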