
Out of memory? #19

ulatekh opened this issue Apr 28, 2020 · 0 comments

ulatekh commented Apr 28, 2020

I've been learning CUDA and pytorch just so that I could run this project. (Doing so has been something of a trial by fire.)

I built my own pytorch from the repo's v0.4.0 tag and have it running (partially) on two machines, both running Fedora 30:

- a Quadro P2000 machine: 4 GB of main memory, 5 GB of video memory, SM 6.0, CUDA 9.1, gcc 5.1.1
- an RTX 2060 machine: 32 GB of main memory, 6 GB of video memory, SM 6.0/7.0, CUDA 9.2 (10.1 had terrible build problems with pytorch 0.4.0), gcc 6.2.1

Both machines can run the data/bag.avi test, but when I try to run the data/Human6 test, once it gets to the inpainting part, the RTX 2060 machine gets this:

THCudaCheck FAIL file=$(PYTORCH)/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    inpaint(args)
  File "$(VOM)/inpaint.py", line 96, in inpaint
    inputs = to_var(inputs)
  File "$(VOM)/inpainting/utils.py", line 170, in to_var
    x = x.cuda()
RuntimeError: cuda runtime error (2) : out of memory at $(PYTORCH)/aten/src/THC/generic/THCStorage.cu:58
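
One thing I was planning to try on the RTX 2060 is to make sure the inpainting forward pass runs without autograd and to release cached allocations between sequences. This is only a sketch of what I mean; `model` and the surrounding loop are my guesses, not the actual code in inpaint.py:

    import torch

    with torch.no_grad():            # inference only, so skip the autograd buffers
        inputs = to_var(inputs)      # the x.cuda() call that currently runs out of memory
        output = model(inputs)
    torch.cuda.empty_cache()         # let PyTorch release its cached blocks afterwards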

The Quadro P2000 machine fails the inpainting part of the data/Human6 test with:

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    inpaint(args)
  File "$(VOM)/inpaint.py", line 74, in inpaint
    for seq, (inputs, masks, info) in enumerate(DTloader):
  File "$(PYTORCH)/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "$(PYTORCH)/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/usr/lib64/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "$(PYTORCH)/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid
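
From what I've read, the DataLoader workers pass batches through shared memory (/dev/shm), so my plan on the P2000 machine is to try loading in the main process instead. Roughly like this, assuming the loader is the DTloader built in inpaint.py (the `dataset` argument name is a guess):

    from torch.utils.data import DataLoader

    # num_workers=0 loads batches in the main process, so nothing has to go
    # through the shared-memory segment that the worker processes use.
    DTloader = DataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)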

Insufficient RAM, I guess? Any insight into what part is so memory-intensive, and what could be done about it?
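
In the meantime I was going to log the allocation counters between stages to see where the memory goes; something like this, if these counters exist in 0.4.0 (I haven't verified that):

    import torch

    # Report current and peak GPU allocations in MB.
    print(torch.cuda.memory_allocated() / 1024**2, "MB currently allocated")
    print(torch.cuda.max_memory_allocated() / 1024**2, "MB peak allocation")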

Assuming these problems are surmountable, do you know whether the algorithm is amenable to removing something that doesn't appear in the first frame and that fades in/out? My first intended project is to remove the credit text from this video.

Thank you for any insights into these issues!
