Commit Graph

7779 Commits

Author SHA1 Message Date
Won-Kyu Park
310d0e6938
restore org_dtype != compute dtype case 2024-11-01 12:36:15 +09:00
Won-Kyu Park
b783a967c0
fix for lazy backup 2024-11-01 12:36:14 +09:00
Won-Kyu Park
412401becb
backup only for needed weights required by lora 2024-11-01 12:36:13 +09:00
Won-Kyu Park
2a1988fa67
call gc.collect() when wanted_names == () 2024-11-01 12:36:13 +09:00
Won-Kyu Park
04f9084253
extract backup/restore io-bound operations out of forward hooks to speed up 2024-11-01 12:36:12 +09:00
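The commit above moves the io-bound weight backup/restore out of per-forward hooks so the copy happens once per generation. A minimal sketch of that idea (hypothetical function names, assuming PyTorch), backing up only the weights a LoRA will touch:

```python
import torch
import torch.nn as nn

def backup_weights(model: nn.Module, needed: set) -> dict:
    """Copy only the weights a LoRA will modify, once, up front --
    instead of checking/copying inside every forward hook."""
    backups = {}
    for name, param in model.named_parameters():
        if name in needed:
            backups[name] = param.detach().clone()  # io-bound copy done once
    return backups

def restore_weights(model: nn.Module, backups: dict) -> None:
    """Undo the LoRA patch by copying the saved tensors back."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in backups:
                param.copy_(backups[name])

# usage sketch
model = nn.Linear(4, 4)
saved = backup_weights(model, {"weight"})     # bias is not backed up
with torch.no_grad():
    model.weight.add_(1.0)                    # pretend a LoRA patched the weight
restore_weights(model, saved)
```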
Won-Kyu Park
0ab4d7992c
reduce backup_weight size for float8 freeze model 2024-11-01 12:36:12 +09:00
Won-Kyu Park
1d3dae1471
task manager added
based on https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/main/modules_forge/main_thread.py

* classified
* this way, gc.collect() will work as intended.

2024-11-01 12:36:09 +09:00
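The linked main_thread.py pattern runs all submitted work on the single thread that owns the GPU context. A hypothetical sketch of such a task manager (illustrative names, not the actual Forge code):

```python
import queue
import threading

class MainThreadTaskManager:
    """Run submitted tasks on the thread that owns the loop (e.g. the main
    thread holding the GPU context), similar in spirit to Forge's
    modules_forge/main_thread.py. Sketch only, not the real implementation."""

    def __init__(self):
        self._tasks = queue.Queue()

    def submit(self, fn, *args, **kwargs):
        """Called from worker threads; returns an event plus a result box."""
        done = threading.Event()
        box = {}
        def task():
            try:
                box["result"] = fn(*args, **kwargs)
            except Exception as exc:      # hand the error back to the caller
                box["error"] = exc
            finally:
                done.set()
        self._tasks.put(task)
        return done, box

    def run_pending(self):
        """Called from the main thread; drains and executes queued tasks."""
        while not self._tasks.empty():
            self._tasks.get()()

# usage: a worker submits, the main loop executes
mgr = MainThreadTaskManager()
done, box = mgr.submit(lambda a, b: a + b, 2, 3)
mgr.run_pending()   # in the real app this runs inside the main loop
```

Keeping execution on one thread also means references die predictably there, which is why gc.collect() "works as intended" in the commit note.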
Won-Kyu Park
98cb284eb1
flux: clean up some dead code 2024-11-01 12:35:37 +09:00
Won-Kyu Park
4ad5f22c7b
do not use assign=True for nn.LayerNorm 2024-11-01 12:35:36 +09:00
Won-Kyu Park
5f3314ec43
do not use copy option for nn.Embedding 2024-11-01 12:35:36 +09:00
Won-Kyu Park
ba499f92ac
use shared.opts.lora_without_backup_weight option in the devices.autocast()
* add nn.Embedding in the devices.autocast()
* do not cast forward args for some cases
* add copy option in the devices.autocast()
2024-11-01 12:35:36 +09:00
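A minimal sketch of the devices.autocast()-style idea in this group of commits: a forward pre-hook that casts a module's storage-dtype weights (including nn.Embedding) to the compute dtype just before forward. The function name and shape are illustrative, not webui's actual API:

```python
import torch
import torch.nn as nn

def autocast_weights(module: nn.Module, compute_dtype: torch.dtype):
    """Cast this module's parameters to compute_dtype right before forward.
    Hypothetical sketch of the hook-based autocast described above."""
    def pre_hook(mod, args):
        for name, param in list(mod.named_parameters(recurse=False)):
            if param.dtype != compute_dtype:
                setattr(mod, name, nn.Parameter(
                    param.to(compute_dtype), requires_grad=False))
        return args
    module.register_forward_pre_hook(pre_hook)
    return module

# nn.Embedding with float16 storage, float32 compute
emb = autocast_weights(nn.Embedding(10, 4).to(torch.float16), torch.float32)
out = emb(torch.tensor([1, 2]))   # indices are long and are not cast
```

Note that the embedding indices stay integer-typed, matching the "do not cast forward args for some cases" bullet.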
Won-Kyu Park
03516f48f0
use isinstance() 2024-11-01 12:35:35 +09:00
Won-Kyu Park
2e6533519b
fix some nn.Embedding to set dtype=float32 for some float8 freeze model 2024-11-01 12:35:35 +09:00
Won-Kyu Park
8c9c139c65
support Flux schnell and cleanup 2024-11-01 12:35:34 +09:00
Won-Kyu Park
11c9bc719c
make Sd3T5 independent of shared.opts.sd3_enable_t5 2024-11-01 12:35:34 +09:00
Won-Kyu Park
30d0f950b7
fixed ai-toolkit flux lora support
* fixed some mistakes
* some ai-toolkit loras do not have proj_mlp
2024-11-01 12:35:33 +09:00
Won-Kyu Park
4bea93bc06
fixed typo in the flux lora map 2024-11-01 12:35:33 +09:00
Won-Kyu Park
28eca46959
fix flux to use float8 t5xxl 2024-11-01 12:35:32 +09:00
Won-Kyu Park
f569f6eb1e
use text_encoders.t5xxl.transformer.shared.weight token weights
* some T5XXL checkpoints do not have encoder.embed_tokens.weight; use shared.weight as embed_tokens instead.
* use the float8 text encoder t5xxl_fp8_e4m3fn.safetensors
2024-11-01 12:35:29 +09:00
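T5 ties its input embeddings to shared.weight, so the fallback above can be sketched as a simple state-dict key remap (hypothetical helper name; key names taken from the commit message):

```python
def fix_t5xxl_state_dict(sd: dict) -> dict:
    """If a T5-XXL checkpoint omits encoder.embed_tokens.weight, reuse the
    tied shared.weight as the embedding table. Sketch of the fallback."""
    key_shared = "shared.weight"
    key_embed = "encoder.embed_tokens.weight"
    if key_embed not in sd and key_shared in sd:
        sd[key_embed] = sd[key_shared]   # tied embeddings: same tensor
    return sd

# checkpoint that only ships the shared embedding table
sd = fix_t5xxl_state_dict({"shared.weight": "W"})
```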
Won-Kyu Park
71b430f703
call torch_gc() to fix VRAM usage spike when calling decode_first_stage() 2024-11-01 12:34:44 +09:00
Won-Kyu Park
380e9a84c3
call lowvram.send_everything_to_cpu() for interrupted case 2024-11-01 12:34:42 +09:00
Won-Kyu Park
1f779226f0
check lora_unet prefix to support Black Forest Labs's lora 2024-11-01 12:34:40 +09:00
Won-Kyu Park
6675d1f090
use assign=True for some cases 2024-11-01 12:34:38 +09:00
Won-Kyu Park
eee7294200
add fix_unet_prefix() to support unet only checkpoints 2024-11-01 12:34:36 +09:00
Won-Kyu Park
1318f6118e
fix load_vae() to check size mismatch 2024-11-01 12:34:34 +09:00
Won-Kyu Park
1e73a28707
fix for float8_e5m2 freeze model 2024-11-01 12:34:32 +09:00
Won-Kyu Park
2ffdf01e05
fix position_ids 2024-11-01 12:34:30 +09:00
Won-Kyu Park
3b18b6f482
revert to use without_autocast() 2024-11-01 12:34:28 +09:00
Won-Kyu Park
3cdc26af30
fix lora without backup 2024-11-01 12:34:26 +09:00
Won-Kyu Park
9617f15fd9
pytest with --precision full --no-half 2024-11-01 12:34:24 +09:00
Won-Kyu Park
44a8480f0c
minor update
* use dtype_inference
2024-11-01 12:34:22 +09:00
Won-Kyu Park
789bfc7db4
add cheap approximation for flux 2024-11-01 12:34:20 +09:00
Won-Kyu Park
9e57c722b2
fix to support float8_* 2024-11-01 12:34:18 +09:00
Won-Kyu Park
219a0e2429
support Flux1 2024-11-01 12:34:16 +09:00
Won-Kyu Park
9c0fd83b5e
vae fix for flux 2024-11-01 12:34:14 +09:00
Won-Kyu Park
7e2d51965f
fix for t5xxl 2024-11-01 12:34:12 +09:00
Won-Kyu Park
51c285265f
fix for Lora flux 2024-11-01 12:34:10 +09:00
Won-Kyu Park
d6a609a539
add diffusers weight mapping for flux lora
* add QkvLinear class for Flux lora
2024-11-01 12:34:08 +09:00
Won-Kyu Park
477ff35517
preserve detected dtype_inference 2024-11-01 12:34:06 +09:00
Won-Kyu Park
24f2c1b9e4
fix to support dtype_inference != dtype case 2024-11-01 12:34:04 +09:00
Won-Kyu Park
2f72fd89ff
support copy option to reduce ram usage 2024-11-01 12:34:02 +09:00
Won-Kyu Park
2060886450
add shared.opts.lora_without_backup_weight option to reduce ram usage 2024-11-01 12:34:00 +09:00
Won-Kyu Park
537d9dd71c
misc fixes to support float8 dtype_unet
* devices.dtype_unet and dtype_vae can be considered storage dtypes (current_dtype)
* use devices.dtype_inference as the computational dtype (target_dtype)
* misc fixes to support float8 unet storage
2024-11-01 12:33:58 +09:00
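The storage-dtype vs. compute-dtype split described above can be sketched as follows; float16 stands in for float8_e4m3fn so the sketch runs on any PyTorch build, and the function name is illustrative:

```python
import torch
import torch.nn as nn

storage_dtype = torch.float16   # the commit uses float8 here
compute_dtype = torch.float32   # devices.dtype_inference in the commit

# Frozen model weights live in the compact storage dtype.
layer = nn.Linear(8, 8).to(storage_dtype)

def forward_with_cast(layer: nn.Linear, x: torch.Tensor) -> torch.Tensor:
    """Cast weights to the compute dtype just-in-time for the matmul,
    leaving the stored weights in the compact dtype."""
    w = layer.weight.to(compute_dtype)
    b = layer.bias.to(compute_dtype)
    return torch.nn.functional.linear(x.to(compute_dtype), w, b)

out = forward_with_cast(layer, torch.randn(2, 8))
```

The stored weights stay compact; only the transient compute copies use the wider dtype.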
Won-Kyu Park
fcd609f4b4
simplified get_loadable_dtype 2024-11-01 12:33:56 +09:00
Won-Kyu Park
c972951cf6
check Unet/VAE and load as is
- check float8 unet dtype to save memory
- check vae/text_encoders dtype and use as intended
2024-11-01 12:33:54 +09:00
Won-Kyu Park
39328bd7db
fix misc
* check supported dtypes
* detect non_blocking
* update autocast() to use non_blocking, target_device and current_dtype
2024-11-01 12:33:52 +09:00
Won-Kyu Park
821e76a415
use empty_like for speed 2024-11-01 12:33:50 +09:00
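torch.empty_like allocates an uninitialized buffer with matching shape, dtype and device, skipping the zero-fill that zeros_like pays; that is cheaper when the buffer is about to be fully overwritten anyway, e.g. when backing up a weight before patching it:

```python
import torch

src = torch.randn(4, 4)
dst = torch.empty_like(src)   # allocation only, no zero-fill
dst.copy_(src)                # every element is overwritten by the backup
```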
Won-Kyu Park
2d1db1a2d0
fix for flux 2024-11-01 12:33:48 +09:00
Won-Kyu Park
d38732efae
add flux model wrapper 2024-11-01 12:33:46 +09:00
Won-Kyu Park
853551bd6e
import Flux from https://github.com/black-forest-labs/flux/
License: Apache 2.0
original author: Tim Dockhorn @timudk
2024-11-01 12:33:44 +09:00