Browser models are too small, so they are unlikely to recognize speech accurately. They are meant more for simple predefined phrases.
You could try vosk-api on a desktop-grade machine. You need the big models; they require about 8 GB to run, but they are much more accurate.
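For reference, here is a rough sketch of what running vosk-api on the desktop could look like with its Python bindings. It assumes the `vosk` package is installed, a big model has already been downloaded and unpacked, and the audio is a 16-bit mono PCM WAV file. The function names `join_results` and `transcribe` and any paths are placeholders of mine, not part of the library.

```python
import json
import wave


def join_results(json_chunks):
    """Concatenate the "text" field of each Vosk JSON result string."""
    texts = (json.loads(chunk).get("text", "") for chunk in json_chunks)
    return " ".join(t for t in texts if t)


def transcribe(wav_path, model_dir):
    """Transcribe a 16-bit mono PCM WAV file with a locally stored model."""
    # Imported here so join_results can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    # model_dir is a placeholder: the folder of an unpacked model.
    model = Model(model_dir)
    chunks = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM WAV")
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                chunks.append(rec.Result())
        # Flush whatever is still buffered in the recognizer.
        chunks.append(rec.FinalResult())
    return join_results(chunks)
```

Calling something like `transcribe("speech.wav", "path/to/big-model")` with your own paths should return the recognized text as one string. Loading a big model can take a while, so for repeated use it is probably better to create the `Model` once and reuse it.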
I'm using vosk-browser to do speech-to-text locally, and it works very well for English.