Text output in 20 second chunks #240

Open
alechirsch opened this issue Oct 13, 2023 · 4 comments

Comments

@alechirsch

I am using the websocket server Docker image for the English model. I am feeding it a live stream of audio (converted to WAV) for telephony purposes. I have noticed that the websocket returns finalized text in chunks of no more than 20 seconds of speech. This causes the transcription to get cut off in the middle of a word around the 20-second mark of each chunk. Is this a known limitation? Is there any way to increase the length of each finalized text chunk?
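
For context, the client side looks roughly like the sketch below (a minimal example assuming the standard vosk-server WebSocket protocol on ws://localhost:2700; the file name and 8 kHz sample rate are placeholders, and reading from a file stands in for the live telephony stream):

import asyncio
import json
import websockets

async def stream(uri):
    async with websockets.connect(uri) as ws:
        # Declare the sample rate of the raw PCM audio that follows
        await ws.send(json.dumps({"config": {"sample_rate": 8000}}))
        with open("call_audio.wav", "rb") as f:
            f.read(44)  # skip the WAV header; the server expects raw PCM
            while chunk := f.read(4000):
                await ws.send(chunk)
                # Replies are JSON: {"partial": ...} while an utterance is open,
                # {"text": ...} once the server finalizes a chunk
                print(await ws.recv())
        await ws.send('{"eof" : 1}')
        print(await ws.recv())

asyncio.run(stream("ws://localhost:2700"))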

@nshmyrev
Contributor

In model.conf add

--endpoint.rule5.min-utterance-length=100

and the utterance length limit will be 100 seconds instead of 20.

In general you are not really interested in very long utterances; recognition should stop earlier anyway at a pause.
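
For reference, once the flag has been appended, the last line of the file (path as used in the alphacep/kaldi-en image) should show it; the rest of model.conf is whatever shipped with the model:

$ tail -n 1 /opt/vosk-model-en/model/conf/model.conf
--endpoint.rule5.min-utterance-length=100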

@alechirsch
Author

alechirsch commented Oct 14, 2023 via email

@alechirsch
Author

If there is a cleaner way to do this without using a volume, please let me know:

docker run -it alphacep/kaldi-en /bin/bash -c "echo '--endpoint.rule5.min-utterance-length=100' >> /opt/vosk-model-en/model/conf/model.conf && python3 ./asr_server.py /opt/vosk-model-en/model"

@GuillaumeV-cemea

Using a custom Dockerfile seems cleaner to me, something like this:

FROM alphacep/kaldi-en
RUN echo '--endpoint.rule5.min-utterance-length=100' >> /opt/vosk-model-en/model/conf/model.conf
CMD [ "python3", "./asr_server.py", "/opt/vosk-model-en/model" ]
