The real question will be how much you can quantize it while still retaining sanity. 400B at 2-bit would be possible to run on a Mac Studio - probably at multiple seconds per token, but sometimes that's "fast enough".
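For a back-of-the-envelope sense of why 2-bit makes that plausible, here's a rough sketch. It assumes weights dominate memory and ignores the KV cache and quantization overhead (group scales, zero-points); the 400B parameter count is the one from this thread and the 192 GB figure is just the top Mac Studio unified-memory config.

```python
# Rough weight-memory estimate for a quantized model (illustrative only).
# Assumptions: weights dominate, no KV cache, no scale/zero-point overhead.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given bit width."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 400e9  # hypothetical 400B-parameter model

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")

# 16-bit: ~800 GB  (doesn't fit anywhere consumer-grade)
#  8-bit: ~400 GB
#  4-bit: ~200 GB  (just over a 192 GB Mac Studio)
#  2-bit: ~100 GB  (fits in 192 GB of unified memory, with room for the KV cache)
```

Whether the 2-bit model is still coherent is the open question, but the memory math at least works out.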
Yes. I expect an explosion of research and experimentation in model compression. The good news is I think there are tons of avenues that have barely been explored at all. We are at the very beginning of understanding this stuff, and my bet is that in a few years we'll be able to compress these models 10x or more.