
Adding ILM beam search and decoding #1291

Open · wants to merge 2 commits into master

Conversation

AmirHussein96 (Contributor) commented Oct 5, 2023

This is a LibriSpeech zipformer recipe using the HAT loss from k2-fsa/k2#1244. The recipe includes HAT training, greedy decoding, modified beam search decoding, and ILM subtraction with RNN-LM shallow fusion.

So far, @desh2608 and I have tested this on LibriSpeech, and the results are similar to regular RNN-LM shallow fusion. However, the intended use is adaptation to a new domain with an external RNN-LM trained on that domain.
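
For context, HAT factors the transducer output into a Bernoulli blank distribution and a label distribution, which makes the internal LM explicit and therefore subtractable. A rough sketch of the fused score used in the ILME decoding (the notation here is illustrative, not taken from the recipe's code):

$$
P(\text{blank}\mid \mathbf{x}, y_{<u}) = \sigma(z_b), \qquad
P(y_u = k\mid \mathbf{x}, y_{<u}) = \bigl(1-\sigma(z_b)\bigr)\,\mathrm{softmax}(z_\ell)_k
$$

$$
\text{score}(k) \;=\; \log P_{\text{HAT}}(y_u = k \mid \mathbf{x}, y_{<u})
\;+\; \lambda_{\text{LM}} \log P_{\text{LM}}(k \mid y_{<u})
\;-\; \lambda_{\text{ILM}} \log P_{\text{ILM}}(k \mid y_{<u})
$$

where $z_b$ and $z_\ell$ are the joiner's blank and label logits, $P_{\text{ILM}}$ is estimated by running the joiner on the prediction-network output alone (no encoder contribution), and $\lambda_{\text{LM}}$, $\lambda_{\text{ILM}}$ correspond to the LM and ILM scales reported in the tables below.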

| Model | Train | Decode | LM scale | ILM scale | test-clean | test-other |
|---|---|---|---|---|---|---|
| Zipformer-HAT | train-960 | greedy_search | - | - | 2.22 | 5.01 |
| | | modified_beam_search | 0 | 0 | 2.18 | 4.96 |
| | | + RNNLM shallow fusion | 0.29 | 0 | 1.96 | 4.55 |
| | | - ILME | 0.29 | 0.1 | 1.95 | 4.55 |
| | | - ILME | 0.29 | 0.3 | 1.97 | 4.5 |

desh2608 (Collaborator) commented Oct 5, 2023

@AmirHussein96 if you have some time, you can try out the experiment suggested by @marcoyang1998: #1271 (comment).

@marcoyang1998 do you have an RNNLM trained on GigaSpeech?

marcoyang1998 (Collaborator) commented:

I believe @yfyeung has an RNNLM trained on GigaSpeech. @yfyeung, would you mind sharing one? Maybe you could upload it to Hugging Face.

desh2608 linked an issue on Oct 5, 2023 that may be closed by this pull request: Hybrid autoregressive transducer.
yfyeung (Collaborator) commented Oct 8, 2023

Yeah, I have an RNNLM trained on GigaSpeech, but not in icefall style.

https://huggingface.co/yfyeung/icefall-asr-gigaspeech-rnn_lm-2023-10-08

yfyeung (Collaborator) commented Oct 9, 2023

@AmirHussein96 I noticed that you modified k2.rnnt_loss_pruned in k2. Would you mind sharing your branch?

desh2608 (Collaborator) commented Oct 9, 2023

> @AmirHussein96 I noticed that you modified k2.rnnt_loss_pruned in k2. Would you mind sharing your branch?

check this: k2-fsa/k2#1244

AmirHussein96 (Contributor, Author) commented Oct 10, 2023

I conducted benchmarking on the following scenario: a Zipformer initially trained on LibriSpeech and then adapted to GigaSpeech using text only. For the adaptation, I used the GigaSpeech transcripts corresponding to the 1000h M subset to train the RNN-LM. Below is a comparison of several methods: RNN-LM shallow fusion (SF), RNN-LM LODR with a bigram, and RNN-LM shallow fusion combined with our ILME implementation.

| Method | LM scale | ILM / LODR scale | giga dev | giga test |
|---|---|---|---|---|
| modified_beam_search (baseline) | 0 | 0 | 20.81 | 19.95 |
| + RNNLM SF | 0.1 | 0 | 20.3 | 19.55 |
| + RNNLM SF | 0.29 | 0 | 19.88 | 19.21 |
| + RNNLM SF | 0.45 | 0 | 20.1 | 19.46 |
| + RNNLM SF LODR (bigram) | 0.45 | 0.16 | 20.42 | 19.6 |
| + RNNLM SF - ILME | 0.29 | 0.1 | 19.7 | 18.96 |
| + RNNLM SF - ILME | 0.45 | 0.1 | 19.54 | 18.89 |
| + RNNLM SF - ILME | 0.29 | 0.2 | 19.84 | 18.99 |

Choice of ILM/LODR and RNNLM weights:
- ILM: [0.05, 0.2] with a step of 0.05
- LODR: [0.02, 0.45] with a step of 0.05
- RNNLM: [0.05, 0.45] with a step of 0.05
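
A minimal sketch of how such a grid could be enumerated during tuning (ranges and step taken from the list above; the actual decode.py invocation is elided):

import numpy as np

# Tuning grids from the ranges listed above (step 0.05).
ilm_scales = np.round(np.arange(0.05, 0.20 + 1e-6, 0.05), 2)
lodr_scales = np.round(np.arange(0.02, 0.45 + 1e-6, 0.05), 2)
rnnlm_scales = np.round(np.arange(0.05, 0.45 + 1e-6, 0.05), 2)

for lm_scale in rnnlm_scales:
    for ilm_scale in ilm_scales:
        # one decoding run per (lm_scale, ilm_scale) pair
        print(f"lm_scale={lm_scale:.2f} ilm_scale={ilm_scale:.2f}")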

The RNN-LM configuration and training command are as follows:

./rnn_lm/train.py \
    --world-size 4 \
    --exp-dir ./rnn_lm/exp \
    --num-epochs 30 \
    --start-epoch 19 \
    --use-fp16 0 \
    --tie-weights 1 \
    --embedding-dim 512 \
    --hidden-dim 512 \
    --num-layers 2 \
    --batch-size 300 \
    --lr 0.0001 \
    --lm-data data/lm_training_bpe_500/sorted_lm_data.pt \
    --lm-data-valid data/lm_training_bpe_500/sorted_lm_data-valid.pt

RNNLM results on dev: total nll: 776663.5668945312, num tokens: 261759, num sentences: 5715, ppl: 19.435
RNNLM results on test: total nll: 2401851.5998535156, num tokens: 805072, num sentences: 19930, ppl: 19.755
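
As a sanity check, the reported perplexities follow directly from the total NLL and token counts, ppl = exp(total_nll / num_tokens); a quick verification:

import math

def ppl(total_nll: float, num_tokens: int) -> float:
    # perplexity is the exponentiated average per-token negative log-likelihood
    return math.exp(total_nll / num_tokens)

print(ppl(776663.5668945312, 261759))   # ~19.435 (giga dev)
print(ppl(2401851.5998535156, 805072))  # ~19.755 (giga test)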

marcoyang1998 (Collaborator) commented:

@AmirHussein96 I noticed that you are using a positive scale for LODR; it should be negative. You can check the code here:

hyp_log_prob += (
    lm_score[new_token] * lm_scale
    + LODR_lm_scale * current_ngram_score
    + context_score
)  # add the lm score

Would you mind re-running the decoding experiments with LODR? Thanks!

AmirHussein96 (Contributor, Author) commented Oct 10, 2023

icefall/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py

@marcoyang1998 I used the implementation of modified_beam_search_lm_rescore_LODR() below, which uses a negative weight for LODR:

am_scores.values / lm_scale + lm_scores - LODR_scores * lodr_scale

AmirHussein96 (Contributor, Author) commented Oct 10, 2023

@marcoyang1998 I tried modified_beam_search_LODR with LODR_scale=-0.24 from https://k2-fsa.github.io/icefall/decoding-with-langugage-models/LODR.html and also LODR_scale=-0.16 from my best modified_beam_search_lm_rescore_LODR() results.

| Method | beam | LM scale | ILM / LODR scale | giga dev | giga test |
|---|---|---|---|---|---|
| modified_beam_search (baseline) | 4 | 0 | 0 | 20.81 | 19.95 |
| + RNNLM SF | 4 | 0.1 | 0 | 20.3 | 19.55 |
| + RNNLM SF | 4 | 0.29 | 0 | 19.88 | 19.21 |
| + RNNLM SF | 4 | 0.45 | 0 | 20.1 | 19.46 |
| + RNNLM SF | 12 | 0.29 | 0 | 19.77 | 19.01 |
| + RNNLM lm_rescore_LODR (bigram) | 4 | 0.45 | 0.16 | 20.42 | 19.6 |
| + RNNLM LODR (bigram) | 4 | 0.45 | -0.24 | 19.38 | 18.71 |
| + RNNLM LODR (bigram) | 4 | 0.45 | -0.16 | 19.47 | 18.85 |
| + RNNLM LODR (bigram) | 12 | 0.45 | -0.24 | 19.1 | 18.44 |
| + RNNLM SF - ILME | 4 | 0.29 | 0.1 | 19.7 | 18.96 |
| + RNNLM SF - ILME | 4 | 0.45 | 0.1 | 19.54 | 18.89 |
| + RNNLM SF - ILME | 4 | 0.29 | 0.2 | 19.84 | 18.99 |
| + RNNLM SF - ILME | 12 | 0.45 | 0.1 | 19.21 | 18.57 |

The LODR results are now much better, so I think modified_beam_search_lm_rescore_LODR() should be removed from beam_search.py.

The decoding command is below:

for method in modified_beam_search_LODR; do
  ./zipformer_hat/decode.py \
  --epoch 40 --avg 16 --use-averaged-model True \
  --beam-size 4 \
  --exp-dir ./zipformer_hat/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --max-contexts 4 \
  --max-states 8 \
  --max-duration 800 \
  --decoding-method $method \
  --use-shallow-fusion 1 \
  --lm-type rnn \
  --lm-exp-dir rnn_lm/exp \
  --lm-epoch 25 \
  --lm-scale 0.45 \
  --lm-avg 5 \
  --lm-vocab-size 500 \
  --rnn-lm-embedding-dim 512 \
  --rnn-lm-hidden-dim 512 \
  --rnn-lm-num-layers 2 \
  --tokens-ngram 2 \
  --ngram-lm-scale $LODR_scale
done

marcoyang1998 (Collaborator) commented:

> The LODR results are now much better, so I think modified_beam_search_lm_rescore_LODR() should be removed from beam_search.py.

Please have a look at #1017 and https://icefall.readthedocs.io/en/latest/decoding-with-langugage-models/index.html for a comparison between different decoding methods with language models.

> Another important comment is that the current ILME implementation is Shallow Fusion so it can be used in streaming but LODR is a language model rescoring.

LODR works in both shallow fusion and rescoring: modified_beam_search_LODR is the shallow-fusion type of LODR and modified_beam_search_lm_rescore_LODR is the rescoring type. You usually need to set a large --beam-size to achieve good results with rescoring-type methods (see https://icefall.readthedocs.io/en/latest/decoding-with-langugage-models/rescoring.html#id3).
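
A schematic contrast between the two (hypothetical helper functions, not the actual beam_search.py implementations):

# Shallow-fusion LODR (modified_beam_search_LODR): the RNN-LM score and the
# bigram correction are applied token by token during the search, so it also
# works in a streaming setup.  The LODR scale is passed as a negative value,
# so the bigram estimate is effectively subtracted.
def shallow_fusion_step(hyp_log_prob, lm_score, bigram_score, lm_scale, lodr_scale):
    return hyp_log_prob + lm_scale * lm_score + lodr_scale * bigram_score

# Rescoring LODR (modified_beam_search_lm_rescore_LODR): a plain modified beam
# search produces an n-best list first, and the finished hypotheses are then
# re-ranked; a large --beam-size helps because the n-best list must already
# contain good candidates.
def rescore_nbest(am_scores, lm_scores, bigram_scores, lm_scale, lodr_scale):
    return [am + lm_scale * lm - lodr_scale * bg
            for am, lm, bg in zip(am_scores, lm_scores, bigram_scores)]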

JuanPZuluaga commented:
Hi, sorry to step into this conversation. I have a question regarding the LM: is there any reason why an RNNLM is preferred over a Transformer-based LM for these experiments?

Thanks.

AmirHussein96 (Contributor, Author) commented:

> Hi, sorry to step into this conversation. I have a question regarding the LM: is there any reason why an RNNLM is preferred over a Transformer-based LM for these experiments?
>
> Thanks.

The primary reason for choosing an RNN-LM is its computational efficiency and suitability for streaming applications. Additionally, the improvement from using a Transformer-LM rather than an RNN-LM for rescoring is minimal.

AmirHussein96 (Contributor, Author) commented:

> @marcoyang1998 I tried modified_beam_search_LODR with LODR_scale=-0.24 from https://k2-fsa.github.io/icefall/decoding-with-langugage-models/LODR.html and also LODR_scale=-0.16 from my best modified_beam_search_lm_rescore_LODR() results. [results table and decoding command quoted from the comment above]

@marcoyang1998, you can check the updated table with beam 12. The results in the updated table show very close performance, with slight improvements of LODR over ILME. These results align with the findings presented in the LODR paper: https://arxiv.org/pdf/2203.16776.pdf. Additionally, I conducted an MPSSWE statistical test, which indicates that there is no statistically significant difference between LODR and ILME.

| | baseline | RNNLM SF | LODR | ILME |
|---|---|---|---|---|
| RNNLM SF | <0.001 | - | <0.001 | <0.001 |
| LODR | <0.001 | <0.001 | - | 1 |
| ILME | <0.001 | <0.001 | 1 | - |

danpovey (Collaborator) commented Oct 12, 2023 via email

AmirHussein96 (Contributor, Author) commented Oct 12, 2023

> Did you see any difference between zipformer with normal RNN-T and zipformer-HAT?

Yes, we compared the zipformer with the zipformer-HAT using greedy and modified beam search, and the performance is almost the same.

AmirHussein96 (Contributor, Author) commented:

Please let me know if any modifications are needed to finalize the merging of the pull request.

desh2608 (Collaborator) commented:

> Please let me know if any modifications are needed to finalize the merging of the pull request.

@AmirHussein96 this needs the k2 PR (k2-fsa/k2#1244) to be merged first.

@csukuangfj besides ILM, I am also using HAT for joint speaker diarization (with my SURT model), and Amir is using it for joint language ID in code-switched ASR. We will make PRs for those recipes in the coming months, but it would be great to have these ones checked in first.

csukuangfj (Collaborator) commented:

@marcoyang1998 Could you have a look at this PR?

export CUDA_VISIBLE_DEVICES="0,1,2,3"

# For non-streaming model training:
./zipformer/train.py \
A collaborator left a review comment on this snippet:
Please update the recipe name.

marcoyang1998 (Collaborator) commented:

Could you please add a section about HAT (WERs, training command, decoding command etc.) in RESULTS.md?

marcoyang1998 (Collaborator) commented:

I had a glance and left a few comments. The rest looked fine, thanks for the work!

Would you mind uploading your HAT model to huggingface so that other people can try it?

desh2608 (Collaborator) commented:

@AmirHussein96 if you have some time, can we make a final push to get this checked in?
