
Online determinization #1208

Open
francisr opened this issue Jun 9, 2023 · 7 comments


francisr commented Jun 9, 2023

Is there a way to get something in k2 similar to the lattice incremental decoder in Kaldi?
The k2 OnlineDenseIntersecter returns partial lattices, which is good, but I haven't found a nice way to avoid determinizing the whole lattice at every chunk.
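For concreteness, here is a minimal Python sketch of the per-chunk pattern described above. The OnlineDenseIntersecter constructor arguments, the decode() method, and the form of the carried stream state are assumptions rather than the verified k2 API; the only point is the shape of the loop, where k2.determinize is re-run over the entire partial lattice at every chunk, so its cost grows with the utterance length.

```python
import k2
import torch
from typing import Iterable, Optional


def chunked_decode(decoding_graph: k2.Fsa,
                   chunks: Iterable[torch.Tensor]) -> Optional[k2.Fsa]:
    """Online intersection per chunk, followed by full re-determinization.

    `chunks` yields (num_frames, num_symbols) log-prob tensors. Method and
    argument names on OnlineDenseIntersecter are assumptions; only the cost
    pattern matters here.
    """
    intersecter = k2.OnlineDenseIntersecter(
        decoding_graph=decoding_graph,
        num_streams=1,
        search_beam=20.0,
        output_beam=8.0,
        min_active_states=30,
        max_active_states=10000,
    )
    decode_states = None  # per-stream decoding state carried across chunks (assumed form)
    lattice = None
    for log_probs in chunks:
        num_frames = log_probs.shape[0]
        # One supervision segment covering the whole chunk.
        supervision = torch.tensor([[0, 0, num_frames]], dtype=torch.int32)
        dense = k2.DenseFsaVec(log_probs.unsqueeze(0), supervision)
        lattice, decode_states = intersecter.decode(dense, decode_states)

        # The expensive step being asked about: the *whole* partial lattice
        # is determinized again, not just the frames added by this chunk.
        lattice = k2.determinize(lattice)
    return lattice
```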

danpovey (Collaborator) commented Jun 9, 2023 via email

francisr (Author) commented Jun 9, 2023

I'm trying to spread the cost of lattice rescoring over time, so I don't need an exact lattice.
I was thinking of sampling paths from the lattice, but I'm not convinced it will work as well as I'd like once the segment gets longer.
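As a reference point for the sampling idea, here is a minimal sketch assuming the partial lattice is already available and that k2.random_paths has the signature shown (worth checking against the installed k2 version). It returns a ragged tensor of arc indices per sampled path; mapping those back to word sequences and rescoring them is left out, and whether the samples cover enough of the lattice on long segments is exactly the concern raised above.

```python
import k2


def sample_paths(lattice: k2.Fsa, num_paths: int = 100) -> k2.RaggedTensor:
    # Draw `num_paths` paths per FSA in the lattice; the result is a ragged
    # tensor of arc indices (axes: [fsa][path][arc]) that can be mapped back
    # to word sequences for rescoring.
    return k2.random_paths(lattice,
                           use_double_scores=True,
                           num_paths=num_paths)
```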

danpovey (Collaborator) commented Jun 9, 2023 via email

francisr (Author) commented Jun 9, 2023

Yes.

danpovey (Collaborator) commented

My feeling is that an n-best / "beam-search" (in the RNN-T sense) type of algorithm might make sense, as long as you don't have a very nontrivial graph to decode with. Or you could perhaps keep the graph on the CPU and advance its state, if it is deterministic like a phone bigram (as used for LODR), or can be treated as deterministic when epsilons are treated as "failure" transitions that are only taken if there is no explicit arc for that symbol. We have been focusing more on RNN-T than on CTC, which is why we haven't implemented this kind of thing, but given recent trends in ASR and user requirements, I think it might be worthwhile for us to implement something like this ourselves. In the past, because of the focus on k2, I think we wanted algorithms that are convenient for GPU, but I think
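A minimal sketch of the second suggestion, advancing a deterministic graph on the CPU with epsilon arcs treated as failure/backoff transitions. The dictionary-based graph representation here is hypothetical, not a k2 structure; it just illustrates "take the epsilon arc only when there is no explicit arc for the symbol".

```python
from typing import Dict, Optional, Tuple

# state -> {symbol: (next_state, score)}; symbol 0 (EPS) is the backoff arc.
Graph = Dict[int, Dict[int, Tuple[int, float]]]
EPS = 0


def advance(graph: Graph, state: int, symbol: int) -> Optional[Tuple[int, float]]:
    """Follow `symbol` out of `state`, taking epsilon arcs only as a fallback.

    Returns (next_state, accumulated_score), or None if the symbol cannot be
    matched even after exhausting the backoff chain.
    """
    score = 0.0
    while True:
        arcs = graph.get(state, {})
        if symbol in arcs:
            next_state, arc_score = arcs[symbol]
            return next_state, score + arc_score
        if EPS not in arcs:
            return None
        # No explicit arc for `symbol`: take the failure transition and retry.
        state, backoff_score = arcs[EPS]
        score += backoff_score


# Tiny phone-bigram-like example: state 0 is the unigram/backoff state.
graph = {
    1: {2: (2, -0.1), EPS: (0, -1.0)},   # from state 1: phone 2, else back off
    0: {2: (2, -0.5), 3: (3, -0.7)},     # unigram state
}
print(advance(graph, 1, 3))   # falls back through state 0: (3, -1.7)
```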

Someone mentioned to me at ICASSP that for long utterances, it's important to merge paths that differ "too long ago". One possibility might be to use a sorting criterion (for keeping n-best paths) that exaggerates the "delta" between a path and the others. E.g. let

modified_score[path] = min_{path2} ( score[path2] + f(common_suffix_length[path, path2]) * (score[path] - score[path2]) )

where f(n) is some function like f(n) = 1 + 0.05 n.
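A literal reading of this criterion as a sketch, assuming higher score = better and that each hypothesis is a token sequence plus a total score (all names below are illustrative): a path that is worse than another path it shares a long suffix with, i.e. one it differs from only "too long ago", has its deficit exaggerated and so tends to drop out of the n-best list first.

```python
from typing import List, Sequence, Tuple

Path = Tuple[Sequence[int], float]   # (token sequence, total score); higher = better


def common_suffix_length(a: Sequence[int], b: Sequence[int]) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def modified_score(path: Path, others: List[Path], alpha: float = 0.05) -> float:
    # modified_score[path] = min over path2 of
    #   score[path2] + f(common_suffix_length) * (score[path] - score[path2]),
    # with f(n) = 1 + alpha * n.
    tokens, score = path
    best = score   # the path2 == path term reduces to score[path]
    for tokens2, score2 in others:
        f = 1.0 + alpha * common_suffix_length(tokens, tokens2)
        best = min(best, score2 + f * (score - score2))
    return best


def prune_nbest(paths: List[Path], n: int) -> List[Path]:
    # Rank by the modified criterion; a worse path sharing a long suffix with
    # a better one is penalized more strongly, so it is dropped first.
    return sorted(paths, key=lambda p: modified_score(p, paths), reverse=True)[:n]
```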

francisr (Author) commented

Unfortunately my decoding graph is a traditional HMM + n-gram, which means the non-determinized lattice is quite big.

> Someone mentioned to me at ICASSP that for long utterances, it's important to merge paths that differ "too long ago".

Makes sense, although wouldn't they naturally merge due to the Markov property of the decoding graph?

danpovey (Collaborator) commented

I was thinking about situations where there is an RNNLM or a full-context (non-stateless) RNN-T, where hypotheses carry full histories.
