92 Commits

Author SHA1 Message Date
Kohaku-Blueleaf
f9ba7e648a Revert "Try to reverse the dtype checking mechanism"
This reverts commit d243e24f53.
2024-01-29 22:54:12 +08:00
Kohaku-Blueleaf
d243e24f53 Try to reverse the dtype checking mechanism 2024-01-29 22:49:45 +08:00
Kohaku-Blueleaf
6e7f0860f7 linting 2024-01-29 22:46:43 +08:00
Kohaku-Blueleaf
750dd6014a Fix potential bugs 2024-01-29 22:27:53 +08:00
Kohaku-Blueleaf
4a66d2fb22 Avoid exceptions to be silenced 2024-01-20 16:33:59 +08:00
Kohaku-Blueleaf
81126027f5 Avoid early disable 2024-01-20 16:31:12 +08:00
Kohaku-Blueleaf
0181c1f76b Fix nested manual cast 2024-01-19 00:14:03 +08:00
Kohaku-Blueleaf
ca671e5d7b rearrange if-statements for cpu 2024-01-09 23:30:55 +08:00
Kohaku-Blueleaf
58d5b042cd Apply the correct behavior of precision='full' 2024-01-09 23:23:40 +08:00
Kohaku-Blueleaf
1fd69655fe Revert "Apply correct inference precision implementation"
This reverts commit e00365962b.
2024-01-09 23:15:05 +08:00
Kohaku-Blueleaf
e00365962b Apply correct inference precision implementation 2024-01-09 23:13:34 +08:00
Kohaku-Blueleaf
c2c05fcca8 linting and debugs 2024-01-09 22:53:58 +08:00
KohakuBlueleaf
42e6df723c Fix bugs when arg dtype doesn't match 2024-01-09 22:39:39 +08:00
Kohaku-Blueleaf
209c26a1cb improve efficiency and support more device 2024-01-09 22:11:44 +08:00
AUTOMATIC1111
a70dfb64a8 change import statements for #14478 2023-12-31 22:38:30 +03:00
Aarni Koskela
5768afc776 Add utility to inspect a model's parameters (to get dtype/device) 2023-12-31 13:22:43 +02:00
Kohaku-Blueleaf
9a15ae2a92 Merge branch 'dev' into test-fp8 2023-12-03 10:54:54 +08:00
AUTOMATIC1111
af5f0734c9
Merge pull request #14171 from Nuullll/ipex
Initial IPEX support for Intel Arc GPU
2023-12-02 19:22:32 +03:00
Kohaku-Blueleaf
110485d5bb Merge branch 'dev' into test-fp8 2023-12-02 17:00:09 +08:00
AUTOMATIC1111
88736b5557
Merge pull request #14131 from read-0nly/patch-1
Update devices.py - Make 'use-cpu all' actually apply to 'all'
2023-12-02 09:46:19 +03:00
Nuullll
7499148ad4 Disable ipex autocast due to its bad perf 2023-12-02 14:00:46 +08:00
Nuullll
8b40f475a3 Initial IPEX support 2023-11-30 20:22:46 +08:00
obsol
3cd6e1d0a0
Update devices.py
Fixes an issue where "--use-cpu all" correctly makes SD run on the CPU but leaves ControlNet (and, I presume, other extensions) pointed at the GPU, which crashes ControlNet because of a device mismatch between SD and CN.

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/14097
2023-11-27 19:21:43 -05:00
Kohaku-Blueleaf
043d2edcf6 Better naming 2023-11-19 15:56:31 +08:00
Kohaku-Blueleaf
598da5cd49 Use options instead of cmd_args 2023-11-19 15:50:06 +08:00
KohakuBlueleaf
ddc2a3499b Add MPS manual cast 2023-10-28 16:52:35 +08:00
Kohaku-Blueleaf
d4d3134f6d ManualCast for 10/16 series gpu 2023-10-28 15:24:26 +08:00
Kohaku-Blueleaf
eaa9f5162f Add CPU fp8 support
Since norm layers need fp32, I only convert the linear operation layers (conv2d/linear).

Also, the TE has some PyTorch functions that do not support bf16 AMP on CPU. I added a condition to indicate whether the autocast is for the UNet.
2023-10-24 01:49:05 +08:00
AUTOMATIC1111
46375f0592 fix for crash when running #12924 without --device-id 2023-09-09 09:39:37 +03:00
catboxanon
5681bf8016 More accurate check for enabling cuDNN benchmark on 16XX cards 2023-08-31 14:57:16 -04:00
AUTOMATIC1111
386245a264 split shared.py into multiple files; should resolve all circular reference import errors related to shared.py 2023-08-09 10:25:35 +03:00
AUTOMATIC1111
0d5dc9a6e7 rework RNG to use generators instead of generating noises beforehand 2023-08-09 08:43:31 +03:00
AUTOMATIC1111
fca42949a3 rework torchsde._brownian.brownian_interval replacement to use device.randn_local and respect the NV setting. 2023-08-03 07:18:55 +03:00
AUTOMATIC1111
84b6fcd02c add NV option for Random number generator source setting, which allows to generate same pictures on CPU/AMD/Mac as on NVidia videocards. 2023-08-03 00:00:23 +03:00
Aarni Koskela
b85fc7187d Fix MPS cache cleanup
Importing torch does not import torch.mps so the call failed.
2023-07-11 12:51:05 +03:00
AUTOMATIC1111
da8916f926 added torch.mps.empty_cache() to torch_gc()
changed a bunch of places that use torch.cuda.empty_cache() to use torch_gc() instead
2023-07-08 17:13:18 +03:00
Aarni Koskela
ba70a220e3 Remove a bunch of unused/vestigial code
As found by Vulture and some eyes
2023-06-05 22:43:57 +03:00
AUTOMATIC
8faac8b963 run basic torch calculation at startup in parallel to reduce the performance impact of first generation 2023-05-21 21:55:14 +03:00
AUTOMATIC
028d3f6425 ruff auto fixes 2023-05-10 11:05:02 +03:00
AUTOMATIC
5fe0dd79be rename CPU RNG to RNG source in settings, add infotext and parameters copypaste support to RNG source 2023-04-29 11:29:37 +03:00
Deciare
d40e44ade4 Option to use CPU for random number generation.
Makes a given manual seed generate the same images across different
platforms, independently of the GPU architecture in use.

Fixes #9613.
2023-04-18 23:27:46 -04:00
brkirch
1b8af15f13 Refactor Mac specific code to a separate file
Move most Mac related code to a separate file, don't even load it unless web UI is run under macOS.
2023-02-01 14:05:56 -05:00
brkirch
2217331cd1 Refactor MPS fixes to CondFunc 2023-02-01 06:36:22 -05:00
brkirch
7738c057ce MPS fix is still needed :(
Apparently I did not test with large enough images to trigger the bug with torch.narrow on MPS
2023-02-01 05:23:58 -05:00
AUTOMATIC1111
fecb990deb
Merge pull request #7309 from brkirch/fix-embeddings
Fix embeddings, upscalers, and refactor `--upcast-sampling`
2023-01-28 18:44:36 +03:00
brkirch
f9edd578e9 Remove MPS fix no longer needed for PyTorch
The torch.narrow fix was required for nightly PyTorch builds for a while to prevent a hard crash, but newer nightly builds don't have this issue.
2023-01-28 04:16:27 -05:00
brkirch
ada17dbd7c Refactor conditional casting, fix upscalers 2023-01-28 04:16:25 -05:00
AUTOMATIC
9beb794e0b clarify the option to disable NaN check. 2023-01-27 13:08:00 +03:00
AUTOMATIC
d2ac95fa7b remove the need to place configs near models 2023-01-27 11:28:12 +03:00
brkirch
e3b53fd295 Add UI setting for upcasting attention to float32
Adds "Upcast cross attention layer to float32" option in Stable Diffusion settings. This allows for generating images using SD 2.1 models without --no-half or xFormers.

In order to make upcasting cross attention layer optimizations possible it is necessary to indent several sections of code in sd_hijack_optimizations.py so that a context manager can be used to disable autocast. Also, even though Stable Diffusion (and Diffusers) only upcast q and k, unfortunately my findings were that most of the cross attention layer optimizations could not function unless v is upcast also.
2023-01-25 01:13:04 -05:00