
Nbest online decoding bug #1276

Open
binhtranmcs opened this issue Apr 10, 2024 · 15 comments · Fixed by #1280

Comments

@binhtranmcs

binhtranmcs commented Apr 10, 2024

I tried online decoding using online_decode.cu, modified slightly to get n-best paths from the lattice, but I get errors. The failing assertion is at fsa_utils.cu line 2754.

Offline decoding seems fine, so I suspect something is wrong with the online implementation. Also, the error does not occur for every audio file I tested. Please help me with this.

Model: librispeech conformer ctc
Code: online_decode.txt
Log debug: error.log

Thanks!

@binhtranmcs binhtranmcs changed the title from "Nbest online decoding error" to "Nbest online decoding bug" Apr 10, 2024
@csukuangfj
Collaborator

@pkufool Could you have a look?

@trangtv57

I have the same issue. Please check this error soon :(

@cudothanh-Nhan

Same with me :(

@pkufool
Copy link
Collaborator

pkufool commented Apr 11, 2024

OK, I will have a look soon.

@trangtv57

Sorry, but is there any update, @pkufool?

@pkufool
Collaborator

pkufool commented Apr 22, 2024

@trangtv57 Sorry for the late response. I reproduced your issue; the error seems to happen when inverting the generated lattice. There is a quick fix in #1280, and it works fine with the conformer-ctc model above. Please help run more tests, thanks!

@trangtv57

Thanks @pkufool, I will check it and give you feedback soon.

@binhtranmcs
Author

Thanks @pkufool, I just tested again with the model above and it seems fine. But when decoding with my own model there is still an error: the assertion at fsa_utils.cu line 2756. Please take a further look.

@pkufool
Collaborator

pkufool commented Apr 22, 2024


If it is line 2756 in fsa_utils.cu, I think it is the same issue. Could you make sure (for example, by testing more cases) that it works fine with our model? It would be easier for me to debug if I had a model that reproduces it. Thank you!

@binhtranmcs
Author

binhtranmcs commented Apr 22, 2024

@pkufool, I tested again with the audio below (change the extension to .wav) and got a different error:

[F] /home/cpu13266/binhtt4/clone/k2/k2/csrc/array.h:385:T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int] Check failed: ret == cudaSuccess (700 vs. 0)  Error: an illegal memory access was encountered. 

You can try debugging with it, using the same model as above.

61-70970-0031.txt

@danpovey
Collaborator

I merged Kangwei's fix, as it seems to be a straightforward fix of an undefined value; if there are further bugs, we can fix them separately.

@pkufool pkufool reopened this Apr 24, 2024
@pkufool
Collaborator

pkufool commented Apr 24, 2024

@binhtranmcs Here is another fix, #1282; I think the previous fix, #1280, introduced the bug.
The code now runs normally in all my test cases. Please do more tests, thanks!

@binhtranmcs
Author

@pkufool, I tested with the LibriSpeech dataset and it ran smoothly. But there is still an error when testing with the audio below (change the extension to .wav):

[F] /home/cpu13266/binhtt4/clone/k2/k2/csrc/top_sort.cu:324:k2::FsaVec k2::TopSorter::TopSort(k2::Array1<int>*) Check failed: start_state_present[0] == 1 (0 vs. 1) Our current implementation requires that the start state in each Fsa must be present in the first batch

nbest-err.txt
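The precondition named in this log can be illustrated with a small, self-contained sketch. It is not taken from k2's top_sort.cu; it assumes the sorter groups states into batches in the style of Kahn's algorithm, where the first batch holds every state with no incoming arcs. Under that assumption, a lattice whose start state (taken to be state 0) has any incoming arc, for example one closing a cycle, never places state 0 in the first batch, which is the condition the check reports. `SketchArc`, `BatchTopSort` and `StartStateInFirstBatch` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical arc type for illustration only; not k2::Arc.
struct SketchArc {
  int32_t src;
  int32_t dest;
};

// Groups the states of one FSA into batches, Kahn-style: batch 0 holds every
// state with no incoming arcs; each later batch holds the states whose last
// remaining incoming arc came from the previous batch. States on a cycle, and
// any state downstream of one, never reach in-degree zero and appear in no batch.
std::vector<std::vector<int32_t>> BatchTopSort(
    int32_t num_states, const std::vector<SketchArc> &arcs) {
  std::vector<int32_t> in_degree(num_states, 0);
  std::vector<std::vector<int32_t>> successors(num_states);
  for (const SketchArc &arc : arcs) {
    assert(arc.src >= 0 && arc.src < num_states);
    assert(arc.dest >= 0 && arc.dest < num_states);
    ++in_degree[arc.dest];
    successors[arc.src].push_back(arc.dest);
  }

  std::vector<int32_t> current;
  for (int32_t s = 0; s < num_states; ++s)
    if (in_degree[s] == 0) current.push_back(s);

  std::vector<std::vector<int32_t>> batches;
  while (!current.empty()) {
    std::vector<int32_t> next;
    for (int32_t s : current)
      for (int32_t d : successors[s])
        if (--in_degree[d] == 0) next.push_back(d);
    batches.push_back(std::move(current));
    current = std::move(next);
  }
  return batches;
}

// Mirrors the precondition quoted in the log: the start state (assumed to be
// state 0) must appear in the first batch.
bool StartStateInFirstBatch(int32_t num_states,
                            const std::vector<SketchArc> &arcs) {
  if (num_states == 0) return true;  // an empty FSA has no start state
  std::vector<std::vector<int32_t>> batches = BatchTopSort(num_states, arcs);
  if (batches.empty()) return false;
  for (int32_t s : batches[0])
    if (s == 0) return true;
  return false;
}
```

For instance, with arcs 0->1 and 1->2 the start state is in the first batch; adding an arc 2->0 closes a cycle and the check fails, as does a single arc 1->0 where state 0 has an incoming arc from a state unreachable from it.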

@pkufool
Collaborator

pkufool commented Apr 26, 2024

@binhtranmcs Can you paste your stack backtrace here? I can't reproduce your error.

@binhtranmcs
Author

@pkufool fyi

__GI_raise 0x00007fff9700400b
k2::internal::Logger::~Logger log.h:203
k2::TopSorter::TopSort top_sort.cu:324
k2::TopSort top_sort.cu:371
k2::TopSort fsa_algo.cu:141
k2::Nbest::Intersect nbest.cu:77
main online_decode.cu:336

The log:

[F] /home/cpu13266/binhtt4/clone/k2/k2/csrc/top_sort.cu:324:k2::FsaVec k2::TopSorter::TopSort(k2::Array1<int>*) Check failed: start_state_present[0] == 1 (0 vs. 1) Our current implementation requires that the start state in each Fsa must be present in the first batch


[ Stack-Trace: ]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2_log.so(k2::internal::GetStackTrace()+0x5f) [0x7ffff641f34a]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/bin/online_decode(k2::internal::Logger::~Logger()+0x48) [0x5555555a829e]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2context.so(k2::TopSorter::TopSort(k2::Array1<int>*)+0x3cc) [0x7ffff6a3fad4]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2context.so(k2::TopSort(k2::Ragged<k2::Arc>&, k2::Ragged<k2::Arc>*, k2::Array1<int>*)+0x3a7) [0x7ffff6a3778a]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2_torch.so(k2::TopSort(k2::FsaClass*)+0x58) [0x7ffff7896a62]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2_torch.so(k2::Nbest::Intersect(k2::FsaClass*)+0x41d) [0x7ffff78b6cef]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/bin/online_decode(+0x4fba2) [0x5555555a3ba2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fff96fe5083]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/bin/online_decode(+0x4d79e) [0x5555555a179e]

terminate called after throwing an instance of 'std::runtime_error'
  what():  
    Some bad things happened. Please read the above error messages and stack
    trace. If you are using Python, the following command may be helpful:

      gdb --args python /path/to/your/code.py

    (You can use `gdb` to debug the code. Please consider compiling
    a debug version of k2.).

    If you are unable to fix it, please open an issue at:

      https://github.com/k2-fsa/k2/issues/new
    
Signal: SIGABRT (Aborted)

Also, here is the code I used: online_decode.cu.txt

Labels
None yet
Projects
None yet

6 participants