Yeah, the Mixture of Experts might not have been called out by name, but it was pretty obvious you were getting different models depending on the question.
It goes to show how LLMs are nothing like AGI. I think combining one with a calculator is just a band-aid. A useful band-aid, but it's never going to be able to do science.
Sparse architectures are a way to, in theory, use only a small portion of a general model's parameters at any given time. All the "experts" are trained on the exact same data. They're not experts in the way you seem to think they are, and they're certainly not wholly different models. The "experts" work at the token level: the expert chosen for one token can be different from the expert chosen for the very next one.
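To make that concrete, here's a rough sketch of token-level routing, assuming a top-1 router in the style of Switch Transformer. All names and dimensions are made up for illustration, and real systems add gate scaling, load balancing, and capacity limits on top of this:

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Minimal sketch of token-level top-1 expert routing (illustrative only)."""

    def __init__(self, d_model=64, n_experts=8):
        super().__init__()
        # Each "expert" is just a feed-forward block with its own weights,
        # trained jointly on the same data as every other expert.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                  # x: (n_tokens, d_model)
        scores = self.router(x)            # (n_tokens, n_experts)
        choice = scores.argmax(dim=-1)     # one expert picked per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = choice == e             # tokens routed to expert e
            if mask.any():
                out[mask] = expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(5, 64)
# Adjacent tokens can easily land on different experts:
print(layer.router(tokens).argmax(dim=-1))  # e.g. tensor([3, 0, 3, 7, 1])
```

The point of the sketch: the routing decision happens per token inside a single model, not per question across separate models.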
GPT-4 isn't "nothing like AGI" any more than its dense equivalent would be.
I don't see how an LLM using many experts makes it very different from AGI. Why would anyone assume that human general intelligence isn't built on multiple models running in a similar architecture? At minimum, humans operate with a left and a right hemisphere, which process information very differently.