Nice! How's the speech recognition accuracy and response latency?

kkielhofner · on May 15, 2023

Thanks!

Faster than Alexa (and only going to get faster)[0].

Between the far-field speech optimizations provided by the ESP BOX and Espressif frameworks, our Whisper-based inference server (open sourcing next week), and our unique streaming format, we've found it to be comparable in quality to Alexa/Echo, even with background noise and at distances of up to 30 feet.

[0] - https://www.youtube.com/watch?v=8ETQaLfoImc

tikkun · on May 15, 2023

That's really nice - and thanks for including the demo link too, impressive!

kkielhofner · on May 15, 2023

Thanks again!

Not only are we working on improving performance with the inference server, but local on-device command recognition is also extremely fast. Like "did that really just happen?" fast.

In my local setup with locally controlled Wemo switches, I swear the latency is around 300 ms or so.

I should make another demo video with that...