
Hybrid autoregressive transducer (HAT) #1244

Merged
merged 17 commits into from
Dec 19, 2023

Conversation

desh2608
Contributor

This is an implementation of the HAT loss proposed in https://arxiv.org/abs/2003.07705.

The test produces reasonable-looking losses. I am working on a LibriSpeech zipformer recipe using this loss. In general, HAT is not expected to improve upon the RNN-T loss by itself, but it may be useful for things like integrating external LMs. I am also planning to use it in multi-talker ASR for speaker attribution (e.g., https://arxiv.org/abs/2309.08489).
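For context, the core idea of the HAT loss from the linked paper is to factor the joiner's output distribution: the blank posterior is modeled with a sigmoid on a dedicated blank logit, and the non-blank labels share the remaining probability mass through a softmax. A minimal NumPy sketch of that factorization (not the actual code in this PR; the blank-at-index-0 convention is an assumption for illustration):

```python
import numpy as np

def hat_log_probs(logits):
    """Convert joiner logits of shape (..., V) to HAT log-probabilities.

    Assumes index 0 is the blank symbol (an assumption for this sketch,
    not necessarily the convention used in the PR).
    """
    blank_logit = logits[..., 0]
    label_logits = logits[..., 1:]
    # log P(blank) = log sigmoid(z_blank), computed stably via logaddexp
    log_b = -np.logaddexp(0.0, -blank_logit)
    # log (1 - P(blank)) = log sigmoid(-z_blank)
    log_not_b = -np.logaddexp(0.0, blank_logit)
    # log-softmax over the non-blank labels only
    log_softmax = label_logits - np.logaddexp.reduce(
        label_logits, axis=-1, keepdims=True
    )
    # non-blank posteriors share the 1 - P(blank) mass
    log_labels = log_not_b[..., None] + log_softmax
    return np.concatenate([log_b[..., None], log_labels], axis=-1)
```

This separation is what makes the internal language model easy to estimate and subtract when combining with an external LM, which is the main use case mentioned above.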

@danpovey
Collaborator

Great!

@desh2608
Contributor Author

desh2608 commented Dec 2, 2023

@csukuangfj could you also check this when you have some time? Thanks!

@csukuangfj
Collaborator

> @csukuangfj could you also check this when you have some time? Thanks!

Thanks! Left a minor comment. Otherwise, it looks good to me.

@desh2608
Contributor Author

> > @csukuangfj could you also check this when you have some time? Thanks!
>
> Thanks! Left a minor comment. Otherwise, it looks good to me.

Sorry it took a while, since I was on vacation for the last two weeks. I have made the change.

@csukuangfj
Collaborator

Thanks!

@csukuangfj csukuangfj merged commit 7711d16 into k2-fsa:master Dec 19, 2023
1 check passed