Reduce model initialization time for online speech recognition #215
Implemented the `model_type` parameter for online transducer models, as suggested by @csukuangfj in this issue and based largely on this PR. This adds a new argument, `--model-type`, so that the model only needs to be loaded once. Otherwise, the model is loaded twice, where the first load serves only to determine the model type.
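The double-load problem can be illustrated with a toy sketch (plain Python, not actual sherpa-onnx code; `parse_model` and `create_recognizer` are hypothetical stand-ins): when the caller does not supply the model type, the model file must be parsed once just to read its metadata, then parsed again to build the recognizer. Supplying the type up front skips the first parse.

```python
# Toy illustration of why passing model_type up front roughly halves
# initialization time. None of these names are sherpa-onnx APIs.

LOAD_COUNT = {"n": 0}  # counts expensive model loads


def parse_model(path):
    """Stand-in for an expensive ONNX model load."""
    LOAD_COUNT["n"] += 1
    # Pretend the model's metadata records its architecture.
    return {"meta": {"model_type": "zipformer2"}, "path": path}


def create_recognizer(path, model_type=""):
    if not model_type:
        # First load: only to inspect metadata and detect the type.
        model_type = parse_model(path)["meta"]["model_type"]
    # Second (or only) load: build the recognizer for the known type.
    model = parse_model(path)
    return {"type": model_type, "model": model}


create_recognizer("encoder.onnx")  # type unknown: loads twice
slow_loads = LOAD_COUNT["n"]

LOAD_COUNT["n"] = 0
create_recognizer("encoder.onnx", model_type="zipformer2")  # loads once
fast_loads = LOAD_COUNT["n"]

print(slow_loads, fast_loads)  # prints "2 1"
```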
I have tested csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26 via the Python API on Linux, specifying `model_type="zipformer2"` during initialization. Model loading time is reduced from ~6 seconds to ~3 seconds for the fp32 model, and from 4.4 seconds to 2.1 seconds for the int8 model.