
Mixture of experts is different from ensembles because the expert mixing happens at every layer, as opposed to joining the models' outputs once at the end.
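
A minimal sketch of the contrast (PyTorch, illustrative names only; `MoELayer`, `MoEModel`, and `ensemble_predict` are made up for the example, and the router here is a dense soft mixture rather than the sparse top-k routing real MoE models typically use):

  import torch
  import torch.nn as nn

  class MoELayer(nn.Module):
      """One layer whose feed-forward block is a mixture of experts:
      a router weights the experts per input, inside this layer."""
      def __init__(self, dim, num_experts=4):
          super().__init__()
          self.router = nn.Linear(dim, num_experts)  # gating network
          self.experts = nn.ModuleList(
              nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
              for _ in range(num_experts)
          )

      def forward(self, x):                                # x: (batch, dim)
          gates = torch.softmax(self.router(x), dim=-1)    # (batch, num_experts)
          outs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, dim)
          return (gates.unsqueeze(-1) * outs).sum(dim=1)   # weighted mix, per layer

  class MoEModel(nn.Module):
      """Mixing happens inside every layer, so different experts can handle
      the same input at different depths."""
      def __init__(self, dim, depth=3):
          super().__init__()
          self.layers = nn.ModuleList(MoELayer(dim) for _ in range(depth))

      def forward(self, x):
          for layer in self.layers:
              x = x + layer(x)
          return x

  def ensemble_predict(models, x):
      """Ensemble: each model runs end-to-end independently; outputs are
      combined only once, at the very end."""
      return torch.stack([m(x) for m in models]).mean(dim=0)

  if __name__ == "__main__":
      x = torch.randn(8, 16)
      print(MoEModel(dim=16)(x).shape)   # mixing happened at every layer
      dense = lambda: nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
      print(ensemble_predict([dense() for _ in range(3)], x).shape)  # mixed once, at the end

So in the MoE model the combination of experts is part of the forward pass at each depth, while the ensemble only averages finished predictions.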



Thanks, that makes sense - and isn't obvious from the explanations I see people give.



