
Vosk-server and vtt_client.py sample mismatch #235

Open
echoTab opened this issue Aug 15, 2023 · 4 comments

Comments

@echoTab

echoTab commented Aug 15, 2023

I have vosk-server running on a VPS under Ubuntu 22.04, cloned from https://github.com/alphacep/vosk-server. I also have vtt_client.py running on Windows 10 via WSL2/Ubuntu, cloned from https://github.com/MaxVRAM/vosk_vtt_client.git. I had lots of problems getting pyaudio to work, but finally got it running after installing it via conda (although vtt_client still throws many ALSA lib errors).

However, when I start vosk-server and then vtt_client, I get "sampling frequency mismatch, expected 16000, got 8000". I have tried hard-coding vosk-server to 8 kHz, and also tried hard-coding vtt_client to 16 kHz. Neither of these changed the error message. I also tried running the server with --allow_{upsample,downsample}, but this did not help either.

I've run out of ideas; are you able to help?

@nshmyrev
Contributor

What model are you running on the server?

To change everything to 16 kHz, you need to change both the server:

https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py#L95

and the client:

https://github.com/MaxVRAM/Vosk-VTT-Client/blob/main/vtt_client.py#L61
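A minimal sketch of keeping both sides on one rate, assuming a 16 kHz model such as vosk-model-small-en-us-0.15, and assuming the server honours a first message of the form `{"config": {"sample_rate": ...}}` sent before the audio (the shape the vosk-server websocket examples send, as far as we know; check it against your server version). `MODEL_SAMPLE_RATE`, `config_message`, `check_wav_rate` and `make_wav` are hypothetical names, not part of vosk-server or vtt_client.py. It uses only the standard library, so it can also check a recorded WAV file before you stream it.

```python
import io
import json
import wave

# Assumption: the model expects 16 kHz audio (as for vosk-model-small-en-us-0.15).
# Server and client should both be configured with this same value.
MODEL_SAMPLE_RATE = 16000


def config_message(sample_rate: int = MODEL_SAMPLE_RATE) -> str:
    """Return a JSON config message announcing the audio sample rate.

    Assumption: the server reads {"config": {"sample_rate": ...}} sent before
    the audio, as the vosk-server websocket examples do.
    """
    return json.dumps({"config": {"sample_rate": sample_rate}})


def check_wav_rate(wav_bytes: bytes, expected: int = MODEL_SAMPLE_RATE) -> None:
    """Raise ValueError unless the WAV data is 16-bit mono PCM at `expected` Hz."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        rate = wf.getframerate()
        if rate != expected:
            # Same wording as the error quoted in this issue.
            raise ValueError(
                f"sampling frequency mismatch, expected {expected}, got {rate}"
            )
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")


def make_wav(rate: int, seconds: float = 0.1) -> bytes:
    """Hypothetical helper: build a short silent 16-bit mono WAV in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    print(config_message())  # {"config": {"sample_rate": 16000}}
    check_wav_rate(make_wav(16000))  # rates match, no error
    try:
        check_wav_rate(make_wav(8000))
    except ValueError as err:
        print(err)  # sampling frequency mismatch, expected 16000, got 8000
```

Deriving both the client's announced rate and the server's setting from one value is the point of the sketch; the server line linked above still has to be set to the same rate.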

In general, we recommend sounddevice for microphone recording; we do not recommend pyaudio. We also recommend using our examples instead of external projects.

@echoTab
Author

echoTab commented Aug 17, 2023

Thanks for your reply. I have been running vosk-model-small-en-us-0.15, which I understand requires a 16 kHz sample rate. This may be a dumb question, but looking at the code of asr_server.py I realise that I may have been confusing 'model' and 'spk_model'. Could you please explain how they differ?

@nshmyrev
Contributor

spk_model is for speaker recognition (speaker identity), not speech recognition.

@echoTab
Author

echoTab commented Aug 17, 2023

Thank you. It is working now.
