Wait until you hear about frankenmodels. You rip parts out of one model (often attention heads or whole transformer layers) and transplant them into another, and somehow that produces coherent results! Witchcraft

https://huggingface.co/chargoddard
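If you're curious what that looks like mechanically, here's a minimal sketch of layer-level transplanting in plain PyTorch/transformers. The checkpoint name, layer ranges, and output path are just illustrative, and this isn't a recipe from that repo; dedicated merge tooling handles the bookkeeping (layer indices, configs) much more carefully.

  import torch
  from transformers import AutoModelForCausalLM

  # Two donor checkpoints with the same architecture. Loading the same base
  # model twice keeps the example self-contained; in practice the donors are
  # usually different finetunes of it.
  donor_a = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16)
  donor_b = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-13b-hf", torch_dtype=torch.float16)

  # "Transplant": take decoder blocks 0-29 from donor A and 8-39 from donor B
  # and stack them into one deeper model. The ranges here are arbitrary.
  stacked = list(donor_a.model.layers[:30]) + list(donor_b.model.layers[8:])
  donor_a.model.layers = torch.nn.ModuleList(stacked)
  donor_a.config.num_hidden_layers = len(stacked)

  # 62 blocks instead of the stock 40, i.e. a "20B-class" frankenstack.
  donor_a.save_pretrained("my-frankenmerge")

As mentioned downthread, a finetuning pass on top of the stacked weights is what makes the result behave.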




> somehow that produces coherent results

with or without finetuning? Also, is there a practical motivation for creating them?


> with or without finetuning?

With, but it's still bonkers that it works so well

> Also, is there a practical motivation for creating them?

You can get in-between model sizes (like 20B instead of 13B or 34B). Before better quantization it was useful for inference (if you're unlucky with your VRAM size), but now I see this being useful only for training, because you can't train on quants
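To put a rough number on the "in-between sizes" point, here's back-of-the-envelope arithmetic using Llama-2-13B's shape (hidden 5120, FFN 13824, vocab 32000, 40 blocks); the 62-block stack is just an example depth and small terms like norms are ignored:

  # Rough parameter count for a Llama-2-13B-style frankenstack.
  hidden, ffn, vocab = 5120, 13824, 32000

  per_block = 4 * hidden * hidden + 3 * hidden * ffn  # attention + MLP weight matrices
  embeddings = 2 * vocab * hidden                      # input embeddings + LM head

  print(f"per block: {per_block / 1e6:.0f}M")                                    # ~317M
  print(f"40 blocks (stock 13B):    {(40 * per_block + embeddings) / 1e9:.1f}B")  # ~13.0B
  print(f"62 blocks (frankenstack): {(62 * per_block + embeddings) / 1e9:.1f}B")  # ~20.0B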


> With, but it's still bonkers that it works so well

Ehhhh…



