They use:
- 16 experts, of which one is activated per token
- 1 shared expert that is always active

In total, that comes to around 52B active parameters per token, compared with the 405B of Llama 3.1.
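For intuition, here is a minimal PyTorch sketch of this kind of layer: a router picks exactly one of the 16 routed experts for each token, while the shared expert processes every token. The hidden sizes and the SiLU MLP expert shape are placeholders, not the model's actual configuration.

```python
import torch
import torch.nn as nn


class Top1MoEWithSharedExpert(nn.Module):
    """Sketch of an MoE layer: top-1 routing over 16 experts plus 1 shared expert."""

    def __init__(self, d_model=1024, d_ff=4096, n_experts=16):
        super().__init__()
        # Router scores each token against the routed experts.
        self.router = nn.Linear(d_model, n_experts)
        # 16 routed experts; only one is applied per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Shared expert, applied to every token regardless of routing.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        # x: (n_tokens, d_model)
        gate = self.router(x).softmax(dim=-1)       # (n_tokens, n_experts)
        top_w, top_idx = gate.max(dim=-1)            # top-1: one routed expert per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Only the tokens routed to expert e ever pass through it,
                # which is why the active parameter count stays low.
                routed[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return self.shared_expert(x) + routed
```

Because each token only touches the shared expert, the router, and a single routed expert, the parameters actually exercised per token are a small fraction of the total parameter count, which is the point of the 52B-active-versus-405B comparison above.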