Probably because the parent comment didn't contain much of substance. "Oh, I'd love to see this with [insert my favorite model here]" doesn't really add a lot to the discussion.
For example, the parent commenter could have talked about the specific attributes of that model that make it superior. I personally am aware that Mixtral is one of the best performing models right now, but is everyone else? Also, does Mixtral need to be uncensored? I've used vanilla Mistral for some...interesting...prompts and had no issues with it moralizing at me.
Yeah, so they demo a bigger model on an RTX 4090 with 24 GB VRAM. Granted, implementing sparse activations with a Mixture of Experts could be non-trivial, but I think it's a brilliant move that could potentially allow for, e.g., CPU-only processing and/or much cheaper GPU processing… Mixtral technically already has neural-network-controlled sparse activations, but like the Inception meme says: we must go deeper…
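For anyone who hasn't looked at how MoE sparse activation works: the gist is that a small gating network scores all experts per token, but only the top-k (Mixtral uses top-2 of 8) actually run, so most of the FFN weights stay cold. A rough toy sketch in plain Python (the `expert` function here is just a stand-in for a full FFN, and all the shapes/weights are made up for illustration):

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def expert(idx, x):
    # Stand-in for a real expert FFN; just scales the input.
    return [(idx + 1) * v for v in x]

def moe_layer(x, gate_w, k=2):
    # Gating network: one score per expert from a linear projection of x.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_w]
    probs = softmax(scores)
    # Sparse activation: only the top-k experts actually compute.
    topk = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    renorm = sum(probs[i] for i in topk)
    out = [0.0] * len(x)
    for i in topk:
        y = expert(i, x)
        out = [o + (probs[i] / renorm) * yi for o, yi in zip(out, y)]
    return out, topk

n_experts, dim = 8, 4
gate_w = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [0.5, -0.2, 0.1, 0.9]
out, used = moe_layer(x, gate_w, k=2)
print(f"experts used: {used} (2 of {n_experts})")
```

The win is that per-token compute and memory traffic scale with k, not with the total number of experts, which is exactly why people hope this makes CPU or cheap-GPU inference viable.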
It's more about what's possible to build. A dual 4090 or 3090 is possible to set up without hassle. Beyond that, not really, because you'd exceed a home power socket's rating, and it wouldn't fit on the board or in the case, etc.
It's true you can also build a dual A6000 rig with 48+48 = 96 GB VRAM, but that's a $10k+ setup just for the GPUs, on a legacy generation.