Probably because the parent comment didn't contain much of substance. "Oh, I'd love to see this with [insert my favorite model here]" doesn't really add a lot to the discussion.
For example, the parent commenter could have talked about the specific attributes of that model that make it superior. I personally am aware that Mixtral is one of the best performing models right now, but is everyone else? Also, does Mixtral need to be uncensored? I've used vanilla Mistral for some...interesting...prompts and had no issues with it moralizing at me.
Yeah, so they demo a bigger model on an RTX 4090 with 24 GB VRAM. Granted, implementing sparse activations with a Mixture of Experts could be non-trivial, but I think it's a brilliant move that could potentially allow for, e.g., CPU-only processing and/or much cheaper GPU processing… Mixtral technically already has neural-network-controlled sparse activations, but like the Inception meme says: we must go deeper…
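For anyone who hasn't looked at how MoE sparse activation works: the gist is that a small gating network scores all experts per token, but only the top-k (Mixtral uses top-2 of 8) actually run, so most of the FFN weights stay cold. A rough toy sketch in plain Python (the `expert` function here is just a stand-in for a full FFN, and all the shapes/weights are made up for illustration):

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def expert(idx, x):
    # Stand-in for a real expert FFN; just scales the input.
    return [(idx + 1) * v for v in x]

def moe_layer(x, gate_w, k=2):
    # Gating network: one score per expert from a linear projection of x.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_w]
    probs = softmax(scores)
    # Sparse activation: only the top-k experts actually compute.
    topk = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    renorm = sum(probs[i] for i in topk)
    out = [0.0] * len(x)
    for i in topk:
        y = expert(i, x)
        out = [o + (probs[i] / renorm) * yi for o, yi in zip(out, y)]
    return out, topk

n_experts, dim = 8, 4
gate_w = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [0.5, -0.2, 0.1, 0.9]
out, used = moe_layer(x, gate_w, k=2)
print(f"experts used: {used} (2 of {n_experts})")
```

The win is that per-token compute and memory traffic scale with k, not with the total number of experts, which is exactly why people hope this makes CPU or cheap-GPU inference viable.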
It's more about what's possible to build. A dual 4090 or 3090 is possible to set up without hassle. Beyond that, not really, because you'd exceed a home power socket's rating, and it wouldn't fit on the board or in the case, etc.
It's true you can also build a dual A6000 rig with 48+48 = 96 GB VRAM, but that's a $10k+ setup just for the GPUs, on a legacy generation.