A long while ago, I wrote a little tutorial[0] on quantizing a speech commands network to the Raspberry. I used that to control lights directly and also for wake word detection.
More recently, I found that I can just use more classic VAD because my uses typically don't suffer if I turn on/off the microphone. My main goal is to not get out the mobile phone for information. That reduces the processing when I turn on the radio...
Not high-end as your solution, but nice enough for my purposes.
A long while ago, I wrote a little tutorial[0] on quantizing a speech commands network to the Raspberry. I used that to control lights directly and also for wake word detection.
More recently, I found that I can just use more classic VAD because my uses typically don't suffer if I turn on/off the microphone. My main goal is to not get out the mobile phone for information. That reduces the processing when I turn on the radio...
Not high-end as your solution, but nice enough for my purposes.
[0]. https://devblog.pytorchlightning.ai/applying-quantization-to...