Wait until you hear about frankenmodels. You rip parts of one model (usually whole transformer layers) and transplant them into another, and somehow that produces coherent results! Witchcraft
Agreed, but it's still bonkers that it works so well
>Also is there a practical motivation for creating them?
You could get in-between model sizes (like 20b instead of 13b or 34b). Before better quantization came along, that was useful for inference (if you were unlucky with your VRAM size), but now I see it being useful mainly for training, because you can't train on quants
https://huggingface.co/chargoddard
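To make the layer-stacking idea concrete, here is a minimal sketch of a passthrough-style frankenmerge in plain transformers: two Llama-architecture donors, overlapping layer slices stacked into one deeper model. The model names, slice ranges, and output path are placeholders, not a recipe for any particular published merge, and real merges are usually done with dedicated tooling rather than by hand like this.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder donor checkpoints; any two same-architecture models work in principle.
donor_a = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16
)
donor_b = AutoModelForCausalLM.from_pretrained(
    "example/llama-2-13b-finetune", torch_dtype=torch.float16
)

# Pick which decoder layers to keep from each donor. The overlap is what pushes
# the stacked model to an in-between size (e.g. ~20b out of two 13b donors).
slices = [
    (donor_a, range(0, 24)),   # layers 0-23 from donor A
    (donor_b, range(16, 40)),  # layers 16-39 from donor B
]

stacked = torch.nn.ModuleList(
    donor.model.layers[i] for donor, layer_range in slices for i in layer_range
)

# Build a config for the deeper model, splice the stacked layers in, and reuse
# donor A's embeddings, final norm, and LM head.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-13b-hf")
config.num_hidden_layers = len(stacked)
merged = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)
merged.model.embed_tokens = donor_a.model.embed_tokens
merged.model.layers = stacked
merged.model.norm = donor_a.model.norm
merged.lm_head = donor_a.lm_head

# Newer transformers versions track a per-layer index for the KV cache, so each
# spliced layer should be told its new position before inference.
for new_idx, layer in enumerate(merged.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = new_idx

merged.save_pretrained("llama2-frankenmerge-20b")
```

The surprising part is that nothing here retrains anything: the spliced layers were never optimized to sit next to each other, yet the stacked model usually still produces coherent text (and a bit of follow-up finetuning on the full-precision merge is where the "can't train on quants" point comes in).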