If I'm being perfectly honest, I'm surprised we got it this far already. If I wanted to be really critical:
- Far-field speech is actually kind of hard. There are dozens of "knobs" we can tweak across the various component libraries, etc. to improve speech quality and reliability for more users in more environments. We've tested as much as we can considering there are only two of us, but we need more testing from more speakers in more environments.
- On the wire/protocol stuff. We're doing a pretty rudimentary "open new connection, stream voice, POST somewhere". This adds extra latency and CPU usage because of repeated TLS handshakes, etc. We have plans to use WebSockets and what-not to cut down on this.
- We don't really support audio playback yet. For a real "Amazon Echo" type experience you need to be able to ask it random things like "Hey what's the weather outside?" and it needs to "tell" you.
- Ecosystem support. Using the example above, something like Home Assistant needs to know where you are, get the weather, do text-to-speech, etc. for Willow to be able to play it back.
- Other integrations. Alexa has "skills" and stuff and we need to be able to talk to more things.
- UI/UX work. We support the touch display, but we did just enough to show colors, print status, add a button, and make a touch cursor that follows your finger around. We also only give audio feedback with a kind of annoying tone that beeps once for success and twice for failure.
- Speaking of failure, we don't do a great job of telling you what went wrong and where.
- Configuration and flashing. It's very static and has multiple steps. There are all kinds of things that need to get done to make Willow easy enough for less-technical users to deploy and actually use daily without any hassle.
- Local command recognition. It's very early, but as noted in the README, wiki, etc., the ESP BOX itself can recognize up to 400 commands directly on the device. In testing it works surprisingly well, but we have a lot of work to do to make it actually practical for most people.
- Open sourcing our inference server. We plan to do this next week!
> the ESP BOX itself can recognize up to 400 commands directly on the device.
That's really cool! Does this mean 400 specific commands, e.g. "turn on the living room lights", or 400 commands that can be applied to different targets, e.g. "turn on the X lights" where X is some light? (400 actually feels like it would be enough to speed up the vast majority of interactions either way, but I'm curious :)
400 commands where "turn on X" is one and "turn off X" is two.
With Home Assistant this means turning on and off two hundred entities. We currently pull light and switch entities from Home Assistant and build the local Multinet speech grammar.
We have goals for better dynamic and adaptive configuration of Willow. Part of that is a Willow Home Assistant component with user configuration in the HA dashboard, etc. to easily select entities, define commands, and dynamically update all associated Willow devices.
We feel that, with this, 400 commands is enough to be practical and useful. Additionally, because the Multinet model returns a probability for each command match, "fuzzy matching" actually works quite well: "light", "lights", and slightly mis-worded commands still match correctly.
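To make the entity-to-grammar step concrete, here's a rough sketch of the idea (not Willow's actual build code): pull light and switch entities from the standard Home Assistant REST API and emit a "turn on"/"turn off" pair per entity, capped at the 400-command Multinet limit. The URL and token below are placeholders.

```python
# Sketch: build a "turn on X"/"turn off X" command list from Home Assistant
# entities. Illustration only; the HA REST API calls are standard, but the
# Multinet output handling here is a stand-in.
import requests

HA_URL = "http://homeassistant.local:8123"   # assumption: your HA instance
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"    # assumption: a long-lived token

def fetch_states():
    """Return all entity states from Home Assistant."""
    resp = requests.get(
        f"{HA_URL}/api/states",
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def build_commands(states, domains=("light", "switch")):
    """Generate two commands per light/switch entity, capped at 400 total."""
    commands = []
    for state in states:
        entity_id = state["entity_id"]
        if entity_id.split(".")[0] not in domains:
            continue
        name = state["attributes"].get("friendly_name", entity_id)
        commands.append(f"turn on {name}")
        commands.append(f"turn off {name}")
    return commands[:400]  # Multinet recognizes up to 400 commands on device

if __name__ == "__main__":
    for i, cmd in enumerate(build_commands(fetch_states())):
        print(i, cmd)
```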
> - On the wire/protocol stuff. We're doing a pretty rudimentary "open new connection, stream voice, POST somewhere". This adds extra latency and CPU usage because of repeated TLS handshakes, etc. We have plans to use WebSockets and what-not to cut down on this.
I've recently used the Noise protocol[1] to do some encrypted communication between two services I control that are separated by the internet.
Thanks for mentioning Noise! I've certainly looked at it before, but our challenge is the sheer scope of what we're doing. Not to mention (similar to WebRTC, which people have asked about) I don't completely understand the fit and benefit for our use case and application.
I talk about WebSockets because they achieve our mission and goal (in this case shaving milliseconds off command -> action -> confirmation) with robust, battle-tested client implementations already available in the ESP framework libraries. Same thing for MQTT. Both are supported by Home Assistant (and almost everything else in the space) today.
Because of this existing framework support, we'll have WebSockets done today-ish. Then we can (for now) move on to all of the other things people have asked for :). Hah, priorities!
Not saying Noise won't/can't ever happen - just that this is a very ambitious project as it stands and we have plenty of work to do all over the place :)!
Want to write a Noise implementation for ESP-IDF :)?
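For the curious, the shape of the change is roughly this: instead of a fresh HTTPS POST (and TLS handshake) per utterance, the device keeps one WebSocket open and streams audio frames over it. Below is a minimal server-side sketch in Python with the `websockets` package; it's an illustration of the approach, not our actual inference server or protocol, and the "end" control message and placeholder STT call are made up for the example.

```python
# Sketch: accept a persistent WebSocket from the device and treat binary
# frames as streamed audio and text frames as control messages.
import asyncio
import websockets  # assumption: `pip install websockets` (v10+)

async def handle_device(websocket):
    audio = bytearray()
    async for frame in websocket:
        if isinstance(frame, (bytes, bytearray)):
            # One TLS/WebSocket handshake per session; subsequent frames are cheap.
            audio.extend(frame)
        elif frame == "end":
            # End of utterance: hand the audio to STT (placeholder) and reply.
            text = f"<transcript of {len(audio)} bytes>"  # placeholder STT call
            await websocket.send(text)
            audio.clear()

async def main():
    async with websockets.serve(handle_device, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```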
Faster than Alexa (and only going to get faster)[0].
Between the far-field speech optimizations provided by the ESP BOX and the Espressif frameworks, our inference server (open sourcing next week) using Whisper, and our unique streaming format, we've found it to be comparable in quality to Alexa/Echo, even with background noise and at distances of up to 30 feet.
We're not only working on improving performance with the inference server; local on-device command recognition is extremely fast. Like "did that really just happen?" fast.
In my local setup, using locally controlled Wemo switches, I swear the latency with local devices is around 300 ms.
I'm curious whether this is lightweight enough that it might be possible to run as a Home Assistant add-on on relatively low-powered hardware such as an RPi.
I talk about this a bit on the wiki[0], but our goal is to have a Willow Home Assistant component do the Willow-specific stuff and enable users to use any of the STT/TTS modules provided by Home Assistant.
We'll also (likely) be creating our own TTS/STT HA component for our inference server that does some special/unique things to support Willow.