diff --git a/Optimizations.md b/Optimizations.md
index 6c8b332..6594423 100644
--- a/Optimizations.md
+++ b/Optimizations.md
@@ -18,4 +18,63 @@ A number of optimization can be enabled by [commandline arguments](Run-with-Cust
 Extra tips (Windows):
 - https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/3889 Disable Hardware GPU scheduling.
 - disable browser hardware acceleration
-- Go in nvidia control panel, 3d parameters, and change power profile to "maximum performance"
\ No newline at end of file
+- Go to the Nvidia Control Panel, Manage 3D settings, and set the power management mode to "Prefer maximum performance"
+
+## Memory & Performance Impact of Optimizers and Flags
+
+*This is an example test using specific hardware and configuration; your mileage may vary*
+*Tested using an Nvidia RTX 3060 and CUDA 11.7*
+
+| Cross-attention | Peak memory (GB) at batch size 1/2/4/8/16 | Initial it/s | Peak it/s | Notes |
+| --------------- | ----------------------------------------- | ------------ | --------- | ----- |
+| None | 4.1 / 6.2 / OOM / OOM / OOM | 4.2 | 4.6 | slow, and goes out-of-memory early |
+| v1 | 2.8 / 2.8 / 2.8 / 3.1 / 4.1 | 4.1 | 4.7 | slow, but lowest memory usage; does not require the sometimes problematic xformers |
+| InvokeAI | 3.1 / 4.2 / 6.3 / 6.6 / 7.0 | 5.5 | 6.6 | almost identical to the default optimizer |
+| Doggettx | 3.1 / 4.2 / 6.3 / 6.6 / 7.1 | 5.4 | 6.6 | default |
+| Doggettx | 2.2 / 2.7 / 3.8 / 5.9 / 6.2 | 4.1 | 6.3 | the `--medvram` preset gives decent memory savings without a huge performance hit |
+| Doggettx | 0.9 / 1.1 / 2.2 / 4.3 / 6.4 | 1.0 | 6.3 | the `--lowvram` preset is extremely slow due to constant swapping |
+| Xformers | 2.8 / 2.8 / 2.8 / 3.1 / 4.1 | 6.5 | 7.5 | fastest, with low memory use |
+| Xformers | 2.9 / 2.9 / 2.9 / 3.6 / 4.1 | 6.4 | 7.6 | with `PYTORCH_CUDA_ALLOC_CONF` and `--opt-channelslast` |
+
+*(the launch flags used to select each optimizer and preset are sketched at the end of this page)*
+
+Notes:
+- Performance at batch size 1 is around **70%** of peak performance
+- Peak performance is typically reached around batch size 8
+  Beyond that it grows by a few percent if you have extra VRAM, before it starts to drop as garbage collection kicks in
+- Performance with the `--lowvram` preset is very low below batch size 8, and by that point the memory savings are not that big
+
+Other possible optimizations (a launcher sketch showing where to set these follows below):
+- `PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512`
+  No performance impact; it increases the initial memory footprint a bit, but reduces memory fragmentation in long runs
+- `--opt-channelslast`
+  Hit-and-miss: appears slightly faster at higher batch sizes and slower at small ones, but the differences are within the margin of error
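+
+The optimizers and memory presets compared in the table are all selected with launch flags. As a rough sketch of how that looks, assuming the stock `webui-user.bat` launcher on Windows (the flag names below are the ones documented on the webui's commandline arguments page, and they may change between versions):
+
+```bat
+rem Example webui-user.bat - a sketch only; pick flags that match your hardware.
+rem Pick at most one cross-attention optimizer, matching the table rows above:
+rem   None     -> --disable-opt-split-attention
+rem   v1       -> --opt-split-attention-v1
+rem   InvokeAI -> --opt-split-attention-invokeai
+rem   Doggettx -> --opt-split-attention
+rem   Xformers -> --xformers
+rem Add --medvram or --lowvram to reproduce the preset rows of the table.
+set COMMANDLINE_ARGS=--xformers --medvram
+
+call webui.bat
+```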
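+
+For the allocator tweak, `PYTORCH_CUDA_ALLOC_CONF` has to be in the environment before PyTorch initializes CUDA, so the natural place to set it is the same launcher script. A minimal sketch, again assuming `webui-user.bat` (on Linux an `export` in `webui-user.sh` works the same way):
+
+```bat
+rem Reduce memory fragmentation in long runs, at the cost of a slightly
+rem larger initial footprint (see the notes above).
+set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
+
+rem --opt-channelslast is the hit-and-miss layout flag discussed above.
+set COMMANDLINE_ARGS=--xformers --opt-channelslast
+
+call webui.bat
+```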