Question Clarification on streaming decoding for HLG #1242

kbramhendra · 2023-09-14T10:07:59Z

Hi,
I am using conformer CTC with HLG decoding for streaming, following the online_decode.py implementation from #1218. Latency increases when I decode longer calls. My question: does the current_state_info object for OnlineIntersect keep the history of all previous chunks, or only the last few? It seems to keep the entire history, and I get OOM on long calls (> 20 min). Do I have to implement endpointing or something similar for the online intersecter, or is that handled automatically? It seems it is not. Can you please shed some light on this? Thank you.

pkufool · 2023-09-15T02:29:13Z

Yes, both of our online decoding implementations on GPU (RNNT and online CTC) keep all the history of previous chunks, so they suffer from higher latency and OOM on long utterances. I think you need an endpointer for long audio.
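One possible endpointing rule, sketched under assumptions that do not come from this thread: count consecutive frames whose best CTC token is blank, and declare an endpoint once the run is long enough. `BLANK_ID`, `TrailingBlankEndpointer`, and the threshold are hypothetical names and values, not part of k2 or icefall.

```python
from dataclasses import dataclass
from typing import Sequence

BLANK_ID = 0  # assumption: the CTC blank symbol has id 0


@dataclass
class TrailingBlankEndpointer:
    """Hypothetical helper: fires after enough blank frames in a row."""

    min_trailing_blank_frames: int
    trailing_blanks: int = 0
    seen_non_blank: bool = False

    def accept_chunk(self, best_token_per_frame: Sequence[int]) -> bool:
        """Update counters with one chunk; return True if an endpoint is reached."""
        for token in best_token_per_frame:
            if token == BLANK_ID:
                self.trailing_blanks += 1
            else:
                self.trailing_blanks = 0
                self.seen_non_blank = True
        # Only fire after some speech, so leading silence is not an endpoint.
        return (
            self.seen_non_blank
            and self.trailing_blanks >= self.min_trailing_blank_frames
        )

    def reset(self) -> None:
        """Call after an endpoint, together with resetting the decoder state."""
        self.trailing_blanks = 0
        self.seen_non_blank = False
```

The check runs once per chunk, so an endpoint is only reported at a chunk boundary. After it fires, the caller would reset both this helper and the decoding state for that stream.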

kbramhendra · 2023-09-15T04:02:08Z

Thank you for the answer, really helpful.

kbramhendra · 2023-11-21T12:02:33Z

Hi, in this online decoding, how can I keep only the previous chunk's history? I mean, how do I update the decoder states so that they keep only the previous chunk's history? Can I use the pop function? Also, how can I find the length and size of the decoder states? The sizeof and len functions always return the constants 48 and 1.

pkufool · 2023-11-22T02:55:40Z

@kbramhendra Which decoder states? Could you point me to the code? Sorry, I don't get your idea of "only keeps previous history", could you explain further, an example would be better, thanks!

kbramhendra · 2023-11-22T04:46:02Z

@pkufool Apologies for the lack of clarity. I am using conformer CTC with HLG decoding for streaming, following the online_decode.py implementation from #1218 (lines 175 to 179). From the earlier explanation I understood that the current_state_infos object carries the history of all previous chunks. Because of this I was getting OOM and increasing latency on long calls. I have tried using end pointing, it worked for me.

Now I am trying to explore the effect of the previous chunks' history. I want to keep only the previous chunk's history, instead of the history of all previous chunks or of everything since the last endpoint. My question is how to dynamically update the current_state_infos to only previous chunk history. How can I achieve this?

The current_state_infos object has functions, among them pop and delitem. I tried using pop, but I am getting some errors. Is this the right way?

pkufool · 2023-11-24T02:38:27Z

@kbramhendra Sorry for the late reply.

> I have tried using end pointing, it worked for me.

So when you meet endpoint, you initialize a new state_info, right?

> How to dynamically update the current_state_infos to only previous chunk history

To my knowledge, it is hard to keep only the previous chunk's history. The state_info actually holds two ragged tensors, indexed by [frame][state] and [frame][state][arc]; see

k2/k2/csrc/intersect_dense_pruned.h

Lines 118 to 138 in 45450bf

    // DecodeStateInfo contains the history decoding states for each sequence, this
    // is normally constructed from `frames_` in MultiGraphDenseIntersectPruned
    // by using `Stack` and `Unstack`.
    struct DecodeStateInfo {
      // States that survived for the previously decoded frames. Indexed
      // [frame_idx][state_idx], state_idx just enumerates the active states
      // on this frame (as state_idx01's in a_fsas_).
      //
      // Note: frame_idx may be larger than the real number of frames decoded, it
      // may contain empty lists as this is normally the output of `Unstack`.
      Ragged<intersect_pruned_internal::StateInfo> states;  // 2 axes: frame, state

      // Indexed [frame_idx][state_idx][arc_idx].. the first 2 indexes are
      // the same as those into 'states' (the first 2 levels of the structure
      // are shared), and the last one enumerates the arcs leaving each of those
      // states.
      Ragged<intersect_pruned_internal::ArcInfo> arcs;  // 3 axes: frame, state, arc

      // current search beam for this sequence
      float beam;
    };
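To make the indexing concrete, here is a toy pure-Python model of the two ragged arrays, with nested lists standing in for `Ragged<StateInfo>` and `Ragged<ArcInfo>`. The class `ToyDecodeStateInfo`, its methods, and the beam value are hypothetical illustrations, not k2 code. The point is only that every decoded frame appends another row and nothing is dropped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDecodeStateInfo:
    """Toy stand-in for DecodeStateInfo (hypothetical, not the k2 class)."""

    # states[frame][state]: ids of states that survived pruning on that frame.
    states: List[List[int]] = field(default_factory=list)
    # arcs[frame][state][arc]: destination state of each arc leaving that state.
    arcs: List[List[List[int]]] = field(default_factory=list)
    beam: float = 10.0  # placeholder value for the per-sequence search beam

    def append_frame(
        self, frame_states: List[int], frame_arcs: List[List[int]]
    ) -> None:
        # The first two levels are shared: one arc list per surviving state.
        assert len(frame_states) == len(frame_arcs)
        self.states.append(frame_states)
        self.arcs.append(frame_arcs)

    def num_frames(self) -> int:
        return len(self.states)

    def num_arcs(self) -> int:
        return sum(len(a) for frame in self.arcs for a in frame)


info = ToyDecodeStateInfo()
for _ in range(1000):  # 1000 decoded frames, 3 active states, 2 arcs each
    info.append_frame([0, 1, 2], [[0, 1], [1, 2], [2, 0]])
```

In this toy model the stored history grows linearly with the number of decoded frames, which matches the latency and memory growth reported for long calls.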

To keep only the previous chunk, you have to slice on the frame dimension. That's doable, but I am afraid that FormatOutput (which generates the lattice) requires the first state of state_info to be the start state of the decoding graph, so such slicing might cause a failure when generating the lattice.
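A toy illustration of that concern, assuming lattice generation expects the first stored frame to hold only the start state. `START_STATE`, `can_format_output`, `keep_last_frames`, and the nested-list layout are hypothetical; the real check lives in k2's C++ code and may differ.

```python
from typing import List

START_STATE = 0  # assumption: the decoding graph's start state has id 0


def can_format_output(states: List[List[int]]) -> bool:
    """Toy check: the first stored frame must contain exactly the start state."""
    return len(states) > 0 and states[0] == [START_STATE]


def keep_last_frames(states: List[List[int]], num_frames: int) -> List[List[int]]:
    """Slice the frame axis, keeping only the most recent `num_frames` frames."""
    # Guard num_frames == 0, because states[-0:] would return the whole list.
    return states[-num_frames:] if num_frames > 0 else []


full_history = [[0], [3, 7], [7, 9], [9, 12, 15]]
sliced = keep_last_frames(full_history, 2)

ok_before = can_format_output(full_history)
ok_after = can_format_output(sliced)
```

Here `ok_before` is true and `ok_after` is false: the sliced history starts in the middle of the graph, so a procedure that assumes it begins at the start state no longer has a valid starting point.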

From my point of view, the previous chunks' history makes little difference to the final result, since the CTC system does not depend on the previous frames. I think you can initialize a new state_info directly when meeting an endpoint.
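A minimal sketch of that reset, with a stand-in decoder so it runs without k2. `fake_decode`, `run_stream`, and representing a state as a plain list are hypothetical; in the real script the state would be whatever the intersecter returns, and a fresh one would be created the same way as at stream start.

```python
from typing import List, Optional, Tuple

State = List[int]  # stand-in: one entry per frame of retained history


def fake_decode(chunk_frames: int, state: Optional[State]) -> Tuple[State, int]:
    """Stand-in for one decoding call: extends the history by `chunk_frames`."""
    history = list(state) if state is not None else []
    history.extend(range(len(history), len(history) + chunk_frames))
    return history, len(history)


def run_stream(chunk_sizes: List[int], endpoints: List[bool]) -> List[int]:
    """Decode chunk by chunk; drop the history whenever an endpoint is flagged."""
    state: Optional[State] = None
    history_lengths = []
    for frames, is_endpoint in zip(chunk_sizes, endpoints):
        state, retained = fake_decode(frames, state)
        history_lengths.append(retained)
        if is_endpoint:
            # Finalize the segment's result here, then start from a fresh state.
            state = None
    return history_lengths


lengths = run_stream([16, 16, 16, 16, 16], [False, False, True, False, False])
```

With an endpoint on the third chunk, the retained history in this toy run goes 16, 32, 48 and then restarts at 16, so the per-stream history is bounded by the longest segment rather than by the length of the call.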

One easy way to explore the effect of previous chunks is to initialize the state_info with the state_info of the previous segment (I mean the segments split by endpoints). This way you can keep the history of several previous chunks. I guess this won't raise any errors.

> The current_state_infos object has functions, among them pop and delitem. I tried using pop, but I am getting some errors. Is this the right way?

The current_state_infos is a List of state_info objects, one per current decoding stream, so pop and del operate at the sequence level, not the frame level. If you want to pop and delete at the frame level, you have to add some C++ code to do that. But I don't think it makes sense to do so; see the comments above.
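The sequence-level behaviour can be shown with an ordinary Python list of per-stream handles. The sketch below uses a hypothetical `FakeStateInfo` class in place of the k2 object: `len` counts streams, `pop` removes a whole stream, and `sys.getsizeof` reports only the size of the Python object itself, not the memory it refers to. That last point may be why the size reported earlier in this thread looked constant, though that is a guess.

```python
import sys
from typing import List


class FakeStateInfo:
    """Hypothetical stand-in for a per-stream state handle."""

    __slots__ = ("frames",)

    def __init__(self, num_frames: int) -> None:
        # The payload lives in a separate object that the handle only references.
        self.frames: List[int] = list(range(num_frames))


short = FakeStateInfo(num_frames=10)
long = FakeStateInfo(num_frames=100_000)

current_state_infos = [short, long]  # one entry per active stream
num_streams = len(current_state_infos)  # counts streams, not frames

# The handle size does not change with the amount of history it points to.
same_handle_size = sys.getsizeof(short) == sys.getsizeof(long)

removed = current_state_infos.pop(0)  # removes the whole first stream
```

After the `pop`, the list holds one stream and the remaining stream's history is untouched, so list operations cannot trim frames inside a stream.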

kbramhendra · 2023-11-24T04:42:10Z

@pkufool Thank you for the detailed explanation. I highly appreciate it.

> So when you meet endpoint, you initialize a new state_info, right?

Yes, I am initializing a new state.

My actual goal here is to increase the number of streams that I can process. As of now I am only able to process 100 to 150 streams. I am guessing these current_state_infos could be a blocker to increasing that further.

> From my point of view, the previous chunks' history makes little difference to the final result, since the CTC system does not depend on the previous frames. I think you can initialize a new state_info directly when meeting an endpoint.

As you mention, since CTC doesn't depend on previous history, I am trying to keep only the previous chunk's history. I could update after every chunk, but that gives poor results.

Nevertheless, thanks for the reply. I will see what I can do to increase the number of streams.

kbramhendra closed this as completed Sep 15, 2023

kbramhendra reopened this Nov 21, 2023
