Question Clarification on streaming decoding for HLG #1242

Open
kbramhendra opened this issue Sep 14, 2023 · 7 comments

Comments

@kbramhendra

Hi,
I am using Conformer CTC with HLG decoding for streaming, based on the implementation in #1218 (online_decode.py). When decoding longer calls, latency increases. My question: does the current_state_info object for OnlineIntersect hold the history of all previous chunks, or only a few recent ones? It appears to keep the full history, and I run out of memory (OOM) on calls longer than about 20 minutes. Do I have to implement endpointing or something similar for the Online Intersector, or is that handled automatically? It doesn't seem to be. Could you please shed some light on this? Thank you.

@pkufool
Collaborator

pkufool commented Sep 15, 2023

Yes, both of our online decoding implementations on GPU (RNN-T and online CTC) keep the full history of previous chunks. As a result, latency grows and long utterances can run out of memory. I think you need an endpointer for long audio.
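
A minimal sketch of one simple endpointing rule, assuming the per-frame best token ids from the CTC output are available and that id 0 is the blank symbol (both are assumptions, not taken from online_decode.py). The class name `TrailingBlankEndpointer` and its threshold are hypothetical. It fires once some non-blank output has been seen and is followed by a long enough run of blank frames.

```python
from typing import Iterable


class TrailingBlankEndpointer:
    """Hypothetical helper: fires once speech is followed by enough blank frames."""

    def __init__(self, min_trailing_blanks: int = 50, blank_id: int = 0) -> None:
        self.min_trailing_blanks = min_trailing_blanks
        self.blank_id = blank_id
        self.reset()

    def reset(self) -> None:
        """Forget everything; call this after an endpoint has been handled."""
        self.seen_speech = False
        self.trailing_blanks = 0

    def accept_chunk(self, best_token_ids: Iterable[int]) -> bool:
        """Consume one chunk of per-frame argmax ids; return True at an endpoint."""
        for token in best_token_ids:
            if token == self.blank_id:
                self.trailing_blanks += 1
            else:
                self.seen_speech = True
                self.trailing_blanks = 0
        return self.seen_speech and self.trailing_blanks >= self.min_trailing_blanks
```

Because CTC also emits blanks between tokens inside speech, the threshold would need tuning on real data; the default of 50 frames is only a placeholder.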

@kbramhendra
Author

Thank you for the answer, really helpful.

@kbramhendra
Author

Hi, in this online decoding, how can I keep only the previous chunk's history? That is, how do I update the decoder states so that they retain only the most recent history? Can I use the pop function? Also, how can I find the length and size of the decoder states? The sizeof and len functions return a constant 48 and 1 every time.

@pkufool
Collaborator

pkufool commented Nov 22, 2023

@kbramhendra Which decoder states? Could you point me to the code? Sorry, I don't follow what you mean by "only keeps previous history". Could you explain further? An example would help, thanks!

@kbramhendra
Author

@pkufool Apologies for the lack of clarity. I am using Conformer CTC with HLG decoding for streaming, based on the implementation in #1218 (online_decode.py, lines 175 to 179). From the earlier explanation I understood that the current_state_infos object carries the history of all previous chunks. Because of this I was getting OOM and increasing latency on long calls. I have tried using endpointing, and it worked for me.

Now I am trying to explore the effect of the previous chunks' history. I want to keep only the previous chunk's history instead of the history of all previous chunks (or everything since the last endpoint). My question is how to dynamically update current_state_infos so that it holds only the previous chunk's history. How can I achieve this?

The current_state_infos object has methods including pop and delitem. I tried using pop, but I am getting errors. Is this the right way?

@pkufool
Collaborator

pkufool commented Nov 24, 2023

@kbramhendra Sorry for the late reply.

> I have tried using end pointing, it worked for me.

So when you reach an endpoint, you initialize a new state_info, right?

> How to dynamically update the current_state_infos to only previous chunk history

To my knowledge, it is hard to keep only the previous chunk's history. The state_info is actually a RaggedTensor indexed by [frame][state] and [frame][state][arc]; see:

// DecodeStateInfo contains the history decoding states for each sequence, this
// is normally constructed from `frames_` in MultiGraphDenseIntersectPruned
// by using `Stack` and `Unstack`.
struct DecodeStateInfo {
  // States that survived for the previously decoded frames. Indexed
  // [frame_idx][state_idx], state_idx just enumerates the active states
  // on this frame (as state_idx01's in a_fsas_).
  //
  // Note: frame_idx may be larger than the real number of frames decoded, it
  // may contain empty lists as this is normally the output of `Unstack`.
  Ragged<intersect_pruned_internal::StateInfo> states;  // 2 axes: frame, state

  // Indexed [frame_idx][state_idx][arc_idx].. the first 2 indexes are
  // the same as those into 'states' (the first 2 levels of the structure
  // are shared), and the last one enumerates the arcs leaving each of those
  // states.
  Ragged<intersect_pruned_internal::ArcInfo> arcs;  // 3 axes: frame, state, arc

  // current search beam for this sequence
  float beam;
};

To keep only the previous chunk, you would have to slice along the frame dimension. That's doable, but I am afraid FormatOutput (which generates the lattice) requires the first state of state_info to be the start state of the decoding graph, so such slicing might cause a failure when generating the lattice.
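
To illustrate the concern, here is a toy Python model of the [frame][state] layout using nested lists. It is not k2 code; the state ids and the `START_STATE` constant are made up for illustration. Dropping leading frames leaves a history whose first entry is no longer the graph's start state, which is the situation described above.

```python
START_STATE = 0  # hypothetical id of the decoding graph's start state

# history[frame_idx] lists the graph states still active on that frame.
history = [
    [START_STATE],   # frame 0: decoding begins at the start state
    [3, 7],          # frame 1
    [7, 12, 15],     # frame 2
    [12, 20],        # frame 3
]


def begins_at_start_state(frames):
    """True if the first frame holds exactly the start state."""
    return len(frames) > 0 and frames[0] == [START_STATE]


def keep_last_frames(frames, n):
    """Slice on the frame axis, keeping only the n most recent frames."""
    return frames[-n:] if n > 0 else []


sliced = keep_last_frames(history, 2)
print(begins_at_start_state(history))  # True
print(begins_at_start_state(sliced))   # False
```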

From my point of view, the history of previous chunks makes little difference to the final result, since the CTC system does not depend on previous frames. I think you can simply initialize a new state_info when you reach an endpoint.
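
A minimal sketch of that reset pattern for a single stream. The names `StreamDecoder`, `new_state`, and `decode_chunk` are hypothetical placeholders for whatever the decoding script uses to create an empty state and to advance decoding by one chunk; the toy state below is just a list that grows by one entry per chunk.

```python
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")


class StreamDecoder(Generic[S]):
    """Hypothetical wrapper that drops accumulated history at every endpoint."""

    def __init__(
        self,
        new_state: Callable[[], S],
        decode_chunk: Callable[[S, List[int]], S],
    ) -> None:
        self._new_state = new_state
        self._decode_chunk = decode_chunk
        self.state = new_state()

    def step(self, chunk: List[int], is_endpoint: bool) -> S:
        """Decode one chunk; at an endpoint, return the finished state and start fresh."""
        self.state = self._decode_chunk(self.state, chunk)
        finished = self.state
        if is_endpoint:
            self.state = self._new_state()
        return finished


# Toy stand-ins: the "state" is simply the list of chunks decoded so far.
decoder = StreamDecoder(new_state=list, decode_chunk=lambda s, c: s + [c])
decoder.step([1, 2], is_endpoint=False)
segment = decoder.step([3], is_endpoint=True)
print(len(segment))        # 2 chunks in the finished segment
print(len(decoder.state))  # 0: history was reset after the endpoint
```

With this shape, the history held per stream is bounded by the longest segment between endpoints rather than by the length of the whole call.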

One easy way to explore the effect of previous chunks is to initialize the state_info with the state_info of the previous segment (I mean the segments split by endpoints). This way you can keep the history of several previous chunks. I guess this won't raise any errors.

> The current_state_infos object had functions, in which pop and delitem were there. I tried using pop i am getting some errors. Is this the right way ?

The current_state_infos is a list of state_info objects, one per current decoding stream, so pop and del operate at the sequence level, not the frame level. If you want to pop and delete at the frame level, you have to add some C++ code to do that. But I don't think it makes sense to do so; see the comments above.
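
The distinction can be shown with a plain Python list standing in for current_state_infos. `ToyStateInfo` is a hypothetical stand-in, not the k2 class; the point is only that list operations add or remove whole streams and never touch the frames stored inside each entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyStateInfo:
    """Hypothetical stand-in for one stream's decoding history."""
    frames: List[List[int]] = field(default_factory=list)


# One entry per stream; each entry carries that stream's own frame history.
current_state_infos = [
    ToyStateInfo(frames=[[0], [3, 7]]),
    ToyStateInfo(frames=[[0], [4], [4, 9]]),
]

removed = current_state_infos.pop()         # drops the whole second stream
print(len(current_state_infos))             # 1
print(len(current_state_infos[0].frames))   # 2: the remaining stream's frames are untouched
```

In this toy model, len() on the list reports the number of streams and says nothing about how much history each stream holds.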

@kbramhendra
Author

kbramhendra commented Nov 24, 2023

@pkufool Thank you for the detailed explanation. I highly appreciate it.

> So when you meet endpoint, you initialize a new state_info, right?

Yes, I am initializing a new state.

My actual goal here is to increase the number of streams that I can process. As of now I am only able to process 100 to 150 streams. I am guessing that these current_state_infos could be what blocks a further increase.

> From my point of view, the previous chunks history makes little difference for final result, the CTC system does not depend on the previous frames, I think you can initialize a new state_info when meeting an endpoint directly.

As you mention, since CTC doesn't depend on previous history, I am trying to keep only the most recent chunk's history. I could update the state after every chunk, but that gives poor results.

Nevertheless, thanks for the reply. I will see what I can do to increase the number of streams.
