
Mixture of experts is different from ensembles because the expert mixing happens at every layer, as opposed to joining the models' outputs once at the end.
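
A minimal sketch of the contrast (PyTorch, illustrative names only; `MoELayer`, `MoEModel`, and `ensemble_predict` are made up for the example, and the router here is a dense soft mixture rather than the sparse top-k routing real MoE models typically use):

  import torch
  import torch.nn as nn

  class MoELayer(nn.Module):
      """One layer whose feed-forward block is a mixture of experts:
      a router weights the experts per input, inside this layer."""
      def __init__(self, dim, num_experts=4):
          super().__init__()
          self.router = nn.Linear(dim, num_experts)  # gating network
          self.experts = nn.ModuleList(
              nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
              for _ in range(num_experts)
          )

      def forward(self, x):                                # x: (batch, dim)
          gates = torch.softmax(self.router(x), dim=-1)    # (batch, num_experts)
          outs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, dim)
          return (gates.unsqueeze(-1) * outs).sum(dim=1)   # weighted mix, per layer

  class MoEModel(nn.Module):
      """Mixing happens inside every layer, so different experts can handle
      the same input at different depths."""
      def __init__(self, dim, depth=3):
          super().__init__()
          self.layers = nn.ModuleList(MoELayer(dim) for _ in range(depth))

      def forward(self, x):
          for layer in self.layers:
              x = x + layer(x)
          return x

  def ensemble_predict(models, x):
      """Ensemble: each model runs end-to-end independently; outputs are
      combined only once, at the very end."""
      return torch.stack([m(x) for m in models]).mean(dim=0)

  if __name__ == "__main__":
      x = torch.randn(8, 16)
      print(MoEModel(dim=16)(x).shape)   # mixing happened at every layer
      dense = lambda: nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
      print(ensemble_predict([dense() for _ in range(3)], x).shape)  # mixed once, at the end

So in the MoE model the combination of experts is part of the forward pass at each depth, while the ensemble only averages finished predictions.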



Thanks, that makes sense - and isn't obvious from the explanations I see people give.



