RAM for 4-bit weights is roughly 1GB per 2 billion parameters (0.5 bytes per parameter). So you will want 256GB RAM and at least one GPU. If you only have one server and one user, you need RAM for the full parameter count. (If you have multiple GPUs/servers and many users in parallel, you can shard the experts and route requests so each GPU/server only needs roughly the active parameter count. So it's cheaper at scale.)
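A quick back-of-the-envelope in Python, counting only the weights at 4-bit (KV cache and activations add more on top); the 480B-total / 35B-active split below is illustrative, not any specific model:

    def weight_ram_gb(params_billion, bits_per_param=4):
        # Weights only: params * (bits / 8) bytes, expressed in GB.
        return params_billion * bits_per_param / 8

    total_b, active_b = 480, 35  # hypothetical MoE: total vs. active params

    # One server, one user: the whole model must be resident.
    print(weight_ram_gb(total_b))   # 240.0 GB -> hence ~256GB of RAM
    # Sharded across many servers: closer to the active count each.
    print(weight_ram_gb(active_b))  # 17.5 GB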

Do the inactive parameters need to be loaded into RAM to run an MoE model at a reasonable speed?


