RuntimeError: CUDA error: device-side assert triggered on criterion #43

Open
junylee11 opened this issue Jan 12, 2024 · 0 comments

I saw an earlier issue about this error (#28), but I still don't know how to solve it.

I also don't know how to write code that skips the samples that trigger the error.
Can you tell me the solution?

The error occurs in this code:
```python
accelerator.print('Start training...')

running_loss = 0

for _, batch in enumerate(train_loader):
    curr_steps += 1

    words, labels, phonemes, input_lengths, masked_indices = batch
    text_mask = length_to_mask(torch.Tensor(input_lengths))  # .to(device)

    tokens_pred, words_pred = bert(phonemes, attention_mask=(~text_mask).int())

    loss_vocab = 0
    for _s2s_pred, _text_input, _text_length, _masked_indices in zip(words_pred, words, input_lengths, masked_indices):
        loss_vocab += criterion(_s2s_pred[:_text_length], _text_input[:_text_length])  # the error is raised here
    loss_vocab /= words.size(0)
```
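The assertion `t >= 0 && t < n_classes` in the log below fires when a target index passed to the cross-entropy loss is outside `[0, n_classes)`, where `n_classes` is the size of the last dimension of the prediction. A minimal sketch of a pre-check that skips such samples is shown here. The helper names `out_of_range_positions` and `valid_samples` are hypothetical, and the default `ignore_index=-100` is assumed to match the criterion's setting.

```python
def out_of_range_positions(targets, n_classes, ignore_index=-100):
    """Return (position, label) pairs whose label is outside [0, n_classes).

    `targets` may be a 1-D tensor (anything with .tolist()) or a plain list.
    Labels equal to `ignore_index` are not reported, since the loss skips them.
    """
    values = targets.tolist() if hasattr(targets, "tolist") else list(targets)
    return [
        (i, int(t))
        for i, t in enumerate(values)
        if t != ignore_index and not (0 <= t < n_classes)
    ]


def valid_samples(preds, targets, lengths, n_classes):
    """Yield (pred, target, length) only for samples whose labels are in range.

    Hypothetical helper: samples with bad labels are reported and skipped
    instead of being passed to the criterion.
    """
    for idx, (pred, target, length) in enumerate(zip(preds, targets, lengths)):
        bad = out_of_range_positions(target[:length], n_classes)
        if bad:
            print(f"skipping sample {idx}: labels out of range {bad[:5]}")
            continue
        yield pred, target, length
```

In the loop above this could be used as `for p, t, n in valid_samples(words_pred, words, input_lengths, words_pred.size(-1)): loss_vocab += criterion(p[:n], t[:n])`, dividing by the number of kept samples rather than `words.size(0)`. Skipping only hides the symptom: if many samples are reported, the more likely cause is that the label vocabulary is larger than the model's output dimension.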

C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [12,0,0] Assertion t >= 0 && t < n_classes failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Loss.cu:250: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "C:\Users\user_\Desktop\PL-BERT-KO\train_infer.py", line 198, in <module>
    notebook_launcher(train, args=(), num_processes=1)
  File "C:\Users\user_\anaconda3\envs\PL-BERT-KO\lib\site-packages\accelerate\launchers.py", line 207, in notebook_launcher
    function(*args)
  File "C:\Users\user_\Desktop\PL-BERT-KO\train_infer.py", line 147, in train
    loss_vocab += criterion(_s2s_pred[:_text_length], _text_input[:_text_length])
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\modules\loss.py", line 1179, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "C:\Users\user\anaconda3\envs\PL-BERT-KO\lib\site-packages\torch\nn\functional.py", line 3053, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
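
Because CUDA kernels run asynchronously, the Python line shown in a traceback like this one is not always the operation that failed. A rough diagnostic sketch follows: it sets `CUDA_LAUNCH_BLOCKING=1` so errors surface at the failing call, and scans the data once to find the smallest and largest word label. The helper `label_range` is hypothetical, and it assumes each batch is a tuple whose first element is `words`, as in the training loop above.

```python
import os

# Assumption: this runs before the first CUDA call (ideally before importing
# torch), so kernels execute synchronously and the traceback is accurate.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def _flatten(values):
    """Yield ints from an arbitrarily nested list or tuple."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from _flatten(v)
        else:
            yield int(v)


def label_range(batches, ignore_index=-100):
    """Return (min_label, max_label) over the first element of every batch.

    Labels equal to `ignore_index` are skipped. Returns (None, None) if no
    labels were seen.
    """
    lo, hi = None, None
    for batch in batches:
        words = batch[0]
        values = words.tolist() if hasattr(words, "tolist") else words
        for t in _flatten(values):
            if t == ignore_index:
                continue
            lo = t if lo is None else min(lo, t)
            hi = t if hi is None else max(hi, t)
    return lo, hi
```

Calling `label_range(train_loader)` and comparing the result with `words_pred.size(-1)` shows whether the data contains a label that is negative or at least as large as the number of output classes. Padding values in `words` are included in this scan, so a padding ID outside the class range will also show up even though the loop slices it away with `[:_text_length]`.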
