Was I the only one who got to the end and was like, “and then…?”

You installed it and customised your prompts and then… it worked? It didn’t work? You added the Hugging Face voice model?

I appreciate the prompt, but broadly speaking it feels like there’s a fair bit of vague hand-waving here: did it actually work? Is Mixtral good enough to consistently respond in an intelligent manner?

My experience with this stuff has been mixed; broadly speaking, Whisper is good and Mixtral isn’t.

It’s basically quite shit compared to GPT-4. No matter how careful your prompt engineering is, you simply can’t use tiny models to do big complicated tasks. Better than Mistral, sure… but on average, generating structured, correct (no hallucination craziness) output is a sort of 1-in-10 kind of deal (for me).

…so, some unfiltered examples of the actual output would be really interesting to see here…




It actually works really well when I use it, but it's slow because of the 4060 Tis (~8 seconds), and there is slight overfitting to the examples provided. None of it seemed to affect the actions taken, just the commentary.

I don't have prompts or a video demo on hand, but I might put some together and post them to the blog when I get a chance.

I didn't intend to make a tech demo; this is meant to help anyone else who might be trying to build something like this (and apparently Home Assistant itself is planning such a thing!).


> no matter how careful your prompt engineering is, you simply can’t use tiny models to do big complicated tasks.

I can and do! The progress in ≈7B models has been nothing short of astonishing.

> My experience with this stuff has been mixed

That's a more accurate way to describe it. I haven't figured out a way to use ≈7B models for many specific tasks.

I've followed a rapidly growing number of domains where people have figured out how to make them work.


> I can and do!

I’m openly skeptical.

Most examples I’ve seen of this have been frankly rubbish, which has matched my experience closely.

The larger models, like 70B, are capable of generating reasonably good structured output, and some of the smaller ones, like CodeLlama, are also quite good.

The 7B models are unreliable.

Some trivial tasks (e.g. a chatbot) can be done, but most complex tasks (e.g. generating code) require larger models and multiple iterations.
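To make “multiple iterations” concrete, this is a rough sketch of the kind of validate-and-retry loop I mean, using the llama-cpp-python bindings; the model path, prompt, and “schema” are just placeholders, not anything from the original post:

    # Rough sketch: ask a local 7B model for JSON, validate it, retry if it's garbage.
    # Model path and schema are placeholders; assumes the llama-cpp-python bindings.
    import json
    from llama_cpp import Llama

    llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096, verbose=False)

    PROMPT = (
        "Return ONLY a JSON object with keys 'device' (string) and 'action' "
        "('on' or 'off'). Request: turn off the kitchen lights."
    )

    def generate_structured(prompt, max_retries=3):
        for _ in range(max_retries):
            out = llm.create_chat_completion(
                messages=[{"role": "user", "content": prompt}],
                temperature=0.2,
                max_tokens=128,
            )
            text = out["choices"][0]["message"]["content"]
            try:
                obj = json.loads(text)
                if isinstance(obj, dict) and obj.get("action") in ("on", "off"):
                    return obj  # passed the (trivial) schema check
            except json.JSONDecodeError:
                pass  # malformed JSON: go around again
        return None  # gave up; this is the "1-in-10" failure mode I'm describing

    print(generate_structured(PROMPT))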

Still, happy to be shown how wrong I am. Post some examples of good stuff you’ve done on /r/localllama

…but so far, beyond porn, the 7B models haven’t impressed me.

Examples that actually do useful things are almost always either a) claimed with no way of verifying or reproducing them yourself, or b) actually using the OpenAI API.

That’s been my experience anyway.

I stand by what I said: prompt engineering can only take you so far. There’s a quantitative hard limit on what you can do with just a prompt.

Proof: if it were false, you could do what GPT-4 does with a 10-parameter model and a good prompt.

You can’t.


> Proof: if it were false, you could do what GPT-4 does with a 10-parameter model and a good prompt.

This is oh so very much a straw man. There is rapid progress in AI. For my domains, the first useful model (without finetuning or additional training) was GPT-3, which was released in 2020 and had 175B parameters.

We've had three years of optimization on the models, as well as a lot of progress on how to use them. That means we need fewer parameters today than we did in 2020. That doesn't imply there isn't a hard lower bound somewhere. We just don't know where or what it is.

My expectation is we'll continue to do better and better, such that e.g. a 2030 1B-parameter model will be competitive with a 2020 200B-parameter model, and a 2030 200B-parameter model will be much better than either. After some amount of progress, we'll hit that lower bound (or, more accurately, asymptotically converge to it).

I don't use local LLMs for coding, but for things related to text (it is a large LANGUAGE model, after all). For that, 7B parameter models became adequate sometime in 2023. For reference, in 2020, they were complete nonsense. You'd get cycles of repeating text, or just lose coherence after a sentence or two.

With my setup, local models aren't anywhere close to fast enough for real-time use. For coding, I need real-time use. It wouldn't surprise me if that domain needed more parameters, just based on what I've seen, but I could be proven wrong. If you buy me an H100, I can experiment with it too. As a footnote, many LARGE models work horribly for coding too; OpenAI did a very good job with GPT there (and I haven't used it enough to know, but I've heard Google did too from people who've used Bard).


> The progress in ≈7B models has been nothing short of astonishing.

I'd even still rank Mistral 7B above Mixtral personally, because inference support for the latter is such a buggy mess that I have yet to get it working consistently, and none of what I've seen people claim it can do has ever materialized for me on my local setup. MoE is a real fiddly trainwreck of an architecture. Plus, 7B models can run on 8GB LPDDR4X ARM devices at about 2.5 tok/s, which might be usable for some integrated applications.

It is rather awesome how far small models have come, though. I still remember trying out Vicuna on WASM back in January or February and being impressed enough to be completely pulled into this whole LLM thing. The current 7B models are about as good as the 30B models were at the time, if not slightly better.


Which domains?


Mostly ones related to text transformation (e.g. changing text style) and feedback (e.g. giving suggestions for how to improve text). A year ago, the ones I tried were useless and dumb. Right now, they work quite well.


I was expecting a video showing it in action...


I was expecting to see funny interactions between the user and their GLaDOS prompt. And watching people respond to this post in serious LinkedIn tones is as hilarious as his project, which seems to be tailored for a Portal nerd.


Mixtral 8x7B does indeed have this characteristic. It tends to disregard the requirement for structured output and often outputs unnecessary things in a very casual manner. However, I have found that models like Qwen 72B and others have better controllability in this respect, at least reaching the level of GPT-3.5.



