From 3c0993098ccc8cfa819924098c708116edcea296 Mon Sep 17 00:00:00 2001
From: ClashSAN <98228077+ClashSAN@users.noreply.github.com>
Date: Tue, 19 Sep 2023 22:16:24 -0400
Subject: [PATCH] add AMD settings, link nvidia driver issue, and remove less useful tips

---
 Optimum-SDXL-Usage.md | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/Optimum-SDXL-Usage.md b/Optimum-SDXL-Usage.md
index cf446d2..4928b7e 100644
--- a/Optimum-SDXL-Usage.md
+++ b/Optimum-SDXL-Usage.md
@@ -5,16 +5,22 @@ Here's a quick listing of things to tune for your setup:
 - Nvidia (12gb+) `--xformers`
 - Nvidia (8gb) `--medvram-sdxl --xformers`
 - Nvidia (4gb) `--lowvram --xformers`
-
+- AMD (4gb) `--lowvram --opt-sub-quad-attention` + TAESD in settings
+Both rocm and directml will generate at least 1024x1024 pictures at fp16. However, at full precision, the model fails to load (into 4gb). If your card needs --no-half, try enabling --upcast-sampling instead.
 ## System:
-- Downgrade [Nvidia drivers](https://www.nvidia.com/en-us/geforce/drivers/) to 531 or lower prevent extreme slowdowns for largest pictures.
+- (Windows) Downgrade [Nvidia drivers](https://www.nvidia.com/en-us/geforce/drivers/) to 531 or lower. New drivers cause extreme slowdowns on Windows when generating large images towards your card's maximum vram. \
+This important issue is discussed [here](https://github.com/vladmandic/automatic/discussions/1285) and in [#11063](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11063). \
+Symptoms:
+
+ * You see Shared GPU memory usage filling up in Task Manager
+ * Your generations that usually take 1-2 min, take 7+ min
+ * low vram cards are generating very slowly
+
 - Add a pagefile to prevent failure loading weights due to low RAM.
 - (Linux) install `tcmalloc`, greatly reducing RAM usage: `sudo apt install --no-install-recommends google-perftools` ([#10117](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/10117)).
 - Use an SSD for faster load time, especially if a pagefile is required.
-- Convert `.safetensors` to `.ckpt` for reduced RAM usage ([#12086](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/12086#issuecomment-1691154698)).
 
 ## Model weights:
-- Use a VAE that will not need to run in fp32 for increased speed and less VRAM usage: [sdxl_vae.safetensors](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
-- Use [TAESD](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#taesd).
-- Use fp16 weights (~7gb) for less RAM usage.
\ No newline at end of file
+- Use [sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors), a VAE that will not need to run in fp32, for increased speed and less VRAM usage.
+- Use [TAESD](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#taesd); a VAE that uses drastically less vram at the cost of some quality.
\ No newline at end of file