Preserve previous result as context for next segment #1335

Merged: 1 commit into k2-fsa:master on Sep 11, 2024

Conversation

vsd-vector
Contributor

In the online transducer recognizer, during reset() sherpa-onnx preserves the output of the decoder network (decoder_out) but resets the model context to a sequence of blanks.

decoder_->UpdateDecoderOut(&s->GetResult());
Ort::Value decoder_out = std::move(s->GetResult().decoder_out);

auto r = decoder_->GetEmptyResult(); // initialized with an empty hyp containing blanks
...
s->SetResult(r);
s->GetResult().decoder_out = std::move(decoder_out);

After the reset, at t0 of the new segment the beam-search decoder reuses the cached decoder_out.

if (t == 0) {
    UseCachedDecoderOut(hyps_row_splits, *result, &decoder_out);
}

Let's assume it outputs some token Z, because Z is the most probable token given the cached decoder output (which was computed for the pre-reset context "X Y").
Then at step t1, the beam-search decoder computes a new decoder_out from the current, reset context, which is now effectively "<blank> Z".

Under this new context, Z may no longer be the most probable hypothesis, so the beam search switches to another path. The user sees this as "Z" flickering and being deleted. Sometimes the switch happens only after 2-3 (or even more) tokens have been emitted. Besides the user discomfort, the deleted words frequently contain the correct transcript (at least in my subjective experience).

This PR fixes the issue by using the previous result's tokens as the "context" for the next segment, instead of caching the decoder output for one timestep.

@csukuangfj
Collaborator

Thank you for your contribution!

@csukuangfj csukuangfj merged commit fa20ae1 into k2-fsa:master Sep 11, 2024
190 of 203 checks passed