Nice. Siri is completely unusable with an accuracy of less than 10%. I'm guessin...

kkielhofner · on May 15, 2023

GPU (currently CUDA only) is the primary target for our inference server implementation. It "runs" on CPU, but our goal is to enable an ecosystem that is competitive with Alexa in every possible way, and even with the amazing work of whisper.cpp and other efforts, that's just not happening on CPU (yet).

We're aware that's controversial and not really applicable to many home users - that's why we want to support any TTS/STT engine on any hardware supported by Home Assistant (or elsewhere), in addition to on-device local command recognition on the ESP BOX.

But for people like yourself, and other commercial/power/whatever users, the inference server we're releasing next week works with Willow and provides impressive results on anything from a GTX 1060 to an H100 (we've tested and optimized for everything in between).

We use ctranslate2 (like faster-whisper) and some other optimizations for performance improvements and conservative VRAM usage. We can simultaneously load large-v2, medium, and base on a GTX 1060 3GB and handle requests without issue.

Again, it's controversial, but the fact remains that a $100 Tesla P4 from eBay, which idles at 5 watts and has a max TDP of 60 watts, does the following with our inference server implementation:

large-v2, beam 5 - 3.8s of speech, inference time 1.1s

medium, beam 1 (suitable for Willow tasks) - 3.8s of speech, inference time 588ms

medium, beam 1 (suitable for Willow tasks) - 29.2s of speech, inference time 1.6s

An RTX 4090 does 3.8s of speech with large-v2, beam 5 in 140ms, and 29.2s of speech with medium, beam 1 (greedy) in 84ms.
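To put those numbers on one scale, here is a rough sketch that converts each quoted result into a "times faster than real time" figure (seconds of speech divided by inference time). The timings are copied from the comment above; the `speedup` and `report` helpers and the table layout are just illustrative names, not anything from Willow or the inference server.

```python
def speedup(audio_seconds, inference_seconds):
    """Return how many times faster than real time the transcription ran."""
    return audio_seconds / inference_seconds


# (hardware, model, beam size, seconds of speech, inference time in seconds)
# Values are the ones quoted in the comment above, converted to seconds.
RESULTS = [
    ("Tesla P4", "large-v2", 5, 3.8, 1.1),
    ("Tesla P4", "medium", 1, 3.8, 0.588),
    ("Tesla P4", "medium", 1, 29.2, 1.6),
    ("RTX 4090", "large-v2", 5, 3.8, 0.140),
    ("RTX 4090", "medium", 1, 29.2, 0.084),
]


def report(results=RESULTS):
    """Format one line per benchmark with its real-time speed-up."""
    lines = []
    for gpu, model, beam, audio_s, infer_s in results:
        factor = speedup(audio_s, infer_s)
        lines.append(f"{gpu:<9} {model:<9} beam {beam}: {factor:6.1f}x real time")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

On these figures the P4 comes out at roughly 3.5x real time for large-v2 with beam 5 and about 18x for the 29.2s clip with medium and beam 1. Longer clips look better because fixed per-request overhead is spread over more audio.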

macrolime · on May 15, 2023

You've convinced me. Just ordered an ESP-BOX :p

Got a Home Assistant Yellow not long ago, so would be nice to get some decent voice control for it.

CharlesW · on May 15, 2023

> Siri is completely unusable with an accuracy of less than 10%.

That seems unusual. I've been using both for the last few weeks while replacing my Homebridge setup, and Siri has been as accurate as Alexa — good enough that I've decided that I can now leave the Alexa ecosystem. To be more specific, both are (conservatively) 95%+ accurate for my home control scenarios.

macrolime · on May 15, 2023

I've never tried any voice recognition system that works well. Maybe my accent is too different from typical training data or something. I had a voice recognition program on my computer in 1994 that had about the same accuracy for me as any modern voice recognition system that I have tried.