Browser models are too small, so they are unlikely to recognize speech accurately. They are better suited to simple predefined phrases.
You could try vosk-api on a desktop-grade machine instead. You need the big models from https://alphacephei.com/vosk/models. They require about 8 GB to run, but they are much more accurate.
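Here is a minimal sketch of the desktop route with the vosk Python package (pip install vosk). It assumes the audio is already a mono 16-bit PCM WAV file. `transcribe_wav` and `load_vosk_recognizer` are hypothetical helper names, and the model path is a placeholder for whichever large model you download. The recognizer calls (`Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result`, `FinalResult`) follow the vosk Python examples.

```python
import json
import wave


def load_vosk_recognizer(model_path, sample_rate):
    """Build a Vosk recognizer from an unpacked model directory.

    vosk is imported lazily so the chunking code below runs without it.
    model_path is a placeholder for a large model unpacked from
    https://alphacephei.com/vosk/models.
    """
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)
    return KaldiRecognizer(model, sample_rate)


def transcribe_wav(path, recognizer_factory, chunk_frames=4000):
    """Feed a WAV file to a recognizer chunk by chunk and join the recognized text."""
    pieces = []
    with wave.open(path, "rb") as wf:
        # Vosk expects mono, 16-bit, uncompressed PCM audio.
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("expected mono 16-bit PCM WAV")
        rec = recognizer_factory(wf.getframerate())
        while True:
            data = wf.readframes(chunk_frames)
            if len(data) == 0:
                break
            # AcceptWaveform returns True once it has a complete utterance.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result()).get("text", ""))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

Usage would look like `transcribe_wav("audio.wav", lambda rate: load_vosk_recognizer("path/to/big-model", rate))`. The recognizer is passed in as a factory because its sample rate has to match the file, and this also lets the chunking loop be exercised without vosk or a model installed.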
I'm using vosk-browser (https://github.com/ccoreilly/vosk-browser) to do speech-to-text locally, and it works very well for English.