Out of memory with an image smaller than the one you said you upscaled without tiling #5

Open
jonathancolledge opened this issue Jan 21, 2024 · 1 comment

@jonathancolledge

Hi,
I have a 3090 with 24 GB of VRAM, and I tried a 1265 x 846 image and got the error below:
(Of note, installation was a bit tricky: I had to use the fixes for long file paths as per the other issues, and pip could not find a matching torch build, so I had to use: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121)

Is there something else I did wrong?

Loading pipeline components...: 100%|████████████████████████████████████████████████████| 6/6 [00:01<00:00, 3.71it/s]
Resizing image to a square...
Determining background color...
Background color is... (255, 255, 255, 255)
Exporting image tile: image_0.png
0%| | 0/75 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\gradio\routes.py", line 321, in run_predict
    output = await app.blocks.process_api(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(fn_index, inputs, iterator, request)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\gradio\blocks.py", line 856, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "C:\Users\jonat\sd-x4-wui\gradio_gui.py", line 8, in upscale_image
    output_image = upscaler.upscale_image(image, int(rows), int(cols),int(seed), prompt,negative_prompt,xformers_input,cpu_offload_input,attention_slicing_input,enable_custom_sliders,guidance,iterations)
  File "C:\Users\jonat\sd-x4-wui\upscaler.py", line 86, in upscale_image
    ups_tile = pipeline(prompt=prompt,negative_prompt=negative_prompt, image=x.convert("RGB"),generator=generator).images[0]
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_upscale.py", line 775, in __call__
    noise_pred = self.unet(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\unet_2d_condition.py", line 1177, in forward
    sample = upsample_block(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 2354, in forward
    hidden_states = attn(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\transformer_2d.py", line 392, in forward
    hidden_states = block(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\attention.py", line 393, in forward
    ff_output = self.ff(norm_hidden_states, scale=lora_scale)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\attention.py", line 665, in forward
    hidden_states = module(hidden_states, scale)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\activations.py", line 103, in forward
    return hidden_states * self.gelu(gate)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.05 GiB. GPU 0 has a total capacty of 24.00 GiB of which 2.11 GiB is free. Of the allocated memory 20.28 GiB is allocated by PyTorch, and 46.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
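
For reference, the failure is in the UNet feed-forward GELU, which tries to allocate another 3.05 GiB on top of the ~20 GiB already held by PyTorch, so the tile being denoised is too big for the VRAM that remains. Below is a minimal sketch of the memory-saving knobs the error message and the WUI checkboxes point at; it assumes the stock diffusers StableDiffusionUpscalePipeline with fp16 weights and may not match exactly how upscaler.py wires these options:

import os

# Suggested by the OOM message itself: cap allocator block size to reduce
# fragmentation. Must be set before torch initialises CUDA.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,  # fp16 roughly halves weight/activation memory
)
pipe = pipe.to("cuda")

# Compute attention in slices instead of one large matmul (slower, less VRAM).
pipe.enable_attention_slicing()

# Use xformers' memory-efficient attention if the wheel actually imports.
try:
    pipe.enable_xformers_memory_efficient_attention()
except Exception as exc:
    print(f"xformers unavailable, falling back to default attention: {exc}")

# Heaviest saver: keep sub-models on the CPU and move each to the GPU only
# while it runs. If you enable this, drop the pipe.to("cuda") call above.
# pipe.enable_model_cpu_offload()

# "image_0.png" is the tile exported in the log above; smaller tiles
# (more rows/cols in the WUI) shrink the activation that blew up here.
low_res = Image.open("image_0.png").convert("RGB")
upscaled = pipe(prompt="", image=low_res).images[0]
upscaled.save("image_0_x4.png")

With 24 GB, a 1265 x 846 input most likely needs more rows/cols than the default so that each tile stays within the remaining budget.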

@jonathancolledge (Author)

I can't get it to install according to the instructions, so I fiddled about. This is my latest install with conda, where I am hoping everything installed OK and I get all the optimisations. It is currently running on a 1024 x 684 image, but I think it will run out of memory at the last step, saving the image. VRAM usage is bouncing between 5 GB and 18 GB in use:

git clone https://github.com/Subarasheese/sd-x4-wui

cd sd-x4-wui

conda create -n sdup python=3.10

git config --system core.longpaths true

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl

pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121

pip3 install accelerate

I edited requirements.txt to only have the following:

Pillow == 9.4.0
diffusers
gradio == 3.15.0
split_image == 2.0.1
transformers

Then I ran with

python gradio_gui.py
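
As a hedged aside: the list of commands above never shows a conda activate sdup, so it is worth confirming which environment the pip installs and gradio_gui.py actually use. A quick sanity check (assuming the sdup env created above is the one that is active) would confirm the CUDA build of torch and xformers really landed there:

import torch

# Report the torch build and whether it can see the GPU at all.
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name,
          "| total VRAM (GiB):", round(props.total_memory / 1024**3, 1))

# Confirm xformers is importable; otherwise the xformers checkbox does nothing.
try:
    import xformers
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers is NOT importable in this environment")

If "CUDA available" prints False or xformers fails to import, the pip installs most likely went into a different environment than the one the WUI runs in, and the memory optimisations will not take effect.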
