I’m the author of the post and will also be releasing an open source speech-to-text app in the coming weeks. It’s what I’ve been using for months, but packaging it for others takes a bit more work.
It’s going to be extremely simple and hopefully easy to use. MIT licensed and free.
Right now the main priority is just getting the data out, but we may have some interest in this in the future. Or perhaps we can open an API so others can build this as well.
Llamafile could certainly be released without the GPU binaries included by default, which would slim down the size tremendously.
The extra 70 MiB comes from the CUDA binaries for LocalScore being built with cuBLAS and for more generations of NVIDIA architectures (sm60 through sm120), whereas Llamafile is built with TinyBLAS and for only a few architectures in particular.
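To illustrate why targeting more architecture generations inflates the binary: each `-gencode` pair embeds another copy of the compiled kernels into a fat binary. The flags below are an illustrative sketch, not the project's actual build invocation:

```
# Hypothetical nvcc flags covering sm60 through sm120 (exact flags in the
# LocalScore build may differ). Every -gencode adds another SASS copy of
# each kernel, so binary size grows roughly linearly with the list:
nvcc -O3 kernels.cu -o kernels.o \
  -gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_80,code=sm_80 \
  -gencode arch=compute_90,code=sm_90 \
  -gencode arch=compute_120,code=sm_120
```

Dropping all but one or two of these pairs, as Llamafile does, is what keeps its GPU binaries small.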
I think it's possible to randomize weights over a standard set of layers; that may be a possibility for the future.
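A minimal sketch of that idea: fix a layer layout up front and fill it with random tensors, so a benchmark model can be synthesized without distributing real trained weights. The function name, layer names, and shapes here are my own illustrative assumptions, not anything from LocalScore:

```python
import numpy as np

def random_model(n_layers=4, d_model=64, d_ff=256, seed=0):
    """Generate random weights for a fixed transformer-style layer layout.

    Hypothetical sketch: the tensor names and shapes are assumptions chosen
    for illustration; a real tool would match its target model format.
    """
    rng = np.random.default_rng(seed)  # seeded so runs are reproducible
    layers = []
    for _ in range(n_layers):
        layers.append({
            "attn_qkv": rng.standard_normal((d_model, 3 * d_model), dtype=np.float32),
            "attn_out": rng.standard_normal((d_model, d_model), dtype=np.float32),
            "ffn_up":   rng.standard_normal((d_model, d_ff), dtype=np.float32),
            "ffn_down": rng.standard_normal((d_ff, d_model), dtype=np.float32),
        })
    return layers

model = random_model()
print(len(model), model[0]["ffn_up"].shape)
```

Since compute cost depends on shapes rather than on the values of the weights, a model like this would exercise the same kernels as a trained one.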