All local models:
- VAD: webrtcvad (fast first check), followed by Silero VAD (more compute-heavy verification)
- Transcription: Whisper base.en (CTranslate2)
- Turn Detection: KoljaB/SentenceFinishedClassification (self-trained BERT model)
- LLM: hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M (easily switchable)
- TTS: Coqui XTTSv2, switchable to Kokoro or Orpheus (Orpheus is slower)
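
The two-stage VAD in the first bullet is a cascade: a cheap detector runs on every audio frame, and the expensive model only runs on frames the cheap one flags as possible speech. A minimal sketch of that gating logic follows, assuming 16-bit little-endian PCM frames. The class name `CascadeVAD`, the helper `frame_energy`, and the energy-based lambdas are hypothetical stand-ins so the example runs without extra dependencies. In a real setup, `fast_check` would wrap webrtcvad's `Vad.is_speech(frame, sample_rate)` and `slow_score` would return Silero VAD's speech probability for the chunk.

```python
import struct
from typing import Callable


def frame_energy(frame: bytes) -> float:
    """Mean absolute amplitude of a frame of little-endian 16-bit PCM samples."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return sum(abs(s) for s in samples) / n


class CascadeVAD:
    """Hypothetical two-stage VAD: a fast gate in front of a slower verifier."""

    def __init__(
        self,
        fast_check: Callable[[bytes], bool],   # e.g. a wrapper around webrtcvad (assumption)
        slow_score: Callable[[bytes], float],  # e.g. a Silero speech probability (assumption)
        threshold: float = 0.5,
    ) -> None:
        self.fast_check = fast_check
        self.slow_score = slow_score
        self.threshold = threshold
        self.slow_calls = 0  # counts how often the expensive stage actually ran

    def is_speech(self, frame: bytes) -> bool:
        # Stage 1: cheap rejection; most silent frames stop here.
        if not self.fast_check(frame):
            return False
        # Stage 2: only candidate frames pay for the expensive verification.
        self.slow_calls += 1
        return self.slow_score(frame) >= self.threshold


# Toy usage: 160 samples is 10 ms of audio at 16 kHz.
silence = struct.pack("<160h", *([0] * 160))
loud = struct.pack("<160h", *([3000] * 160))

vad = CascadeVAD(
    fast_check=lambda f: frame_energy(f) > 500,               # stand-in for webrtcvad
    slow_score=lambda f: min(frame_energy(f) / 5000.0, 1.0),  # stand-in for Silero VAD
    threshold=0.5,
)
results = [vad.is_speech(f) for f in (silence, loud, silence)]
print(results, vad.slow_calls)
```

The point of the design is cost: the verifier runs only once here (on the loud frame), while both silent frames are rejected by the cheap check alone.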

That would be absolutely awesome, but I doubt it, since they released a shitty version of the amazing thing they put online. I don't think they're planning to give us their top model anytime soon.