The Qwen family of models is REALLY impressive. I would encourage anyone who hasn't paid them any attention to at least add them to their mental list of LLMs worth knowing about.
QwQ is the Qwen team's exploration of the o1-style of model that has built in chain-of-thought. It's absolutely fascinating, partly because if you ask it a question in English it will often think in Chinese before spitting out an answer in English. My notes on that one here: https://simonwillison.net/2024/Nov/27/qwq/
Most of the Qwen models are Apache 2 licensed, which makes them more open than many of the other open weights models (Llama etc).
(Unsurprisingly they all get quite stubborn if you ask them about topics like Tiananmen Square)
Thanks for the summary. I have been testing QwQ on my M1 (via ollama). I tried a couple of double-slit quantum thought experiments, and also found the reasoning mode absolutely fascinating. Occasionally a few logographs appear, but so far they haven't gotten in the way.
The funniest was asking for an ASCII-graphics depiction of a Minecraft watch recipe. I was actually feeling quite sorry for it: 'wait, that can't be right', 'let me try', 'still not right'. Round and round it went for at least a few pages, at which point it decided to try the second recipe I'd asked about to see if that helped with the first.
I didn't know about the other models; 'coder' is downloading now, and fingers crossed it fits in 32GB and knows a bit about Zig.
It sounds like you got the vision one running locally on your M2, nice. I'm running Asahi Linux and haven't tried anything AI/SD/graphics-oriented yet. But nice that you got some SVG out of coder; I never thought of using a coding model in that way.
QwQ often spits out Chinese characters smack dab in the middle of a sentence. Weirdly, it doesn't break up the coherence or logic; there are just symbols added in.
I haven't seen the architecture of QwQ, but I'd just assumed it learns languages only insofar as it picks up relationships between words. This must mean it picks up logic across languages. Huh.
Browser use is very easy, and you can even do it headless, which also lets you do bulk processing. For a client, I processed some 16k websites with a simple LLM agent. With "computer use", how long would that take, and what would it cost? For me, it was ~$20 (I used Gemini for this task).
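A minimal sketch of that bulk-processing pattern: strip each page down to its visible text, then hand that to an LLM. `ask_llm` is a hypothetical stand-in for whatever model API you use (Gemini in my case), and fetching/headless rendering is omitted here; only the stdlib text-extraction step is shown.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def page_text(html: str) -> str:
    p = TextExtractor()
    p.feed(html)
    return " ".join(p.parts)

html = "<html><head><style>p{}</style></head><body><p>Contact: hi@example.com</p></body></html>"
print(page_text(html))  # Contact: hi@example.com
# Then, per page: answer = ask_llm(f"Extract the contact email from: {page_text(html)}")
```

Pre-extracting the text this way keeps the token count (and the bill) down compared to sending raw HTML or screenshots, which is most of why the agent route is so much cheaper than "computer use".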
Agreed. It is amazing that you can run an o1-style model on a Mac. I was able to run QwQ on my 24GB M3 MacBook Air, though it did not do well on complex reasoning for domain-specific tasks, and I saw the Chinese 'thinking' too (those tasks don't work well in o1 either). It opens up experimentation, which is great, and reasoning traces for domain-specific tasks, used for RL, are where the next improvements are going to come from.
Why does this model think in Chinese and o1 think in English? Is this because chain-of-thought is achieved by training these models on examples of what “thinking” looks like, which have been constructed by their respective model developers, as opposed to being a more generic feature?
I noticed that too, but I haven't seen it think in numbers in Chinese, as most bilingual Chinese speakers prefer. Or at least I haven't been able to trigger it.
Recently I was scrolling through HF to try a very small model. I fired up Qwen 0.5B and, for my purposes, it did better than even Llama 2 7B. That was very surprising to me.
That seems to be just for LLMs, not vision. I'm wanting to go from images of maths notation (photos, scans, digital handwriting) to formulas in LaTeX or MathML or something. Qwen2-VL can do it, but it's pretty heavyweight for just that.
IMHO Qwen are shipping the best OSS models you can run locally on consumer GPUs right now. I'm getting great results from both qwen2.5-coder:32b and qwq:32b running at ~18 tok/s on older NVIDIA A4000 GPUs. Definitely my first choices for local workloads.
It's also great to see qwq's open chain of thought baked into an OSS LLM, so you can see it reason with itself in real time. It's the kind of secret sauce that proprietary LLMs like o1 would prefer to keep hidden to try to build a moat.
We've got a lot to thank Meta and Qwen for in continually releasing improved high-quality OSS models, which also encourages others to follow. High-quality OSS models are the best thing keeping the cost of LLMs down. You can get unbelievable value on OpenRouter, with qwen2.5-coder:32b at $0.08/$0.18 per M tokens (input/output) and qwq:32b at $0.15/$0.60, which is up to ~22x cheaper on output tokens than Anthropic's latest budget Haiku 3.5 model at $0.80/$4 (a 4x price hike over Haiku 3.0).
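To make those comparisons concrete, here's a quick cost calculator using the per-million-token prices quoted above (input/output, USD); prices change often, so treat them as a snapshot, and the 10M-in/2M-out workload is just an illustrative example.

```python
# Per-million-token prices (input, output) in USD, as quoted in the thread.
PRICES = {
    "qwen2.5-coder:32b": (0.08, 0.18),
    "qwq:32b": (0.15, 0.60),
    "claude-3.5-haiku": (0.80, 4.00),
}

def job_cost(model, input_tokens, output_tokens):
    """Total cost in USD for a job, given raw token counts."""
    inp, out = PRICES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

# Example workload: 10M input tokens, 2M output tokens
for model in PRICES:
    print(f"{model}: ${job_cost(model, 10e6, 2e6):.2f}")
# qwen2.5-coder:32b: $1.16, qwq:32b: $2.70, claude-3.5-haiku: $16.00
```

Your effective multiplier depends heavily on your input/output mix, which is why single "Nx cheaper" claims are always a bit fuzzy.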
Previously I've always been very skeptical of rosy pictures of a possible future where "everyone has an ai that's there to accomplish tasks for them" - given that I imagined such ai (if it ever came to exist) being run by the usual big tech who have their own incentives not so cleanly aligned with our own.
Right now, with the availability of open weights for cutting-edge models, this wave of technological advance feels pleasantly decentralised, however. I can download and run a model and tinker with things that at least feel like the seeds of such a future, where I _might_ be able to build things with my own interests at heart.
But what happens if these models stop being shared, and how likely is that? Reading about the vast quantities of compute deployed to train them, replicating the successes of the main players with a community of volunteers just seems an order of magnitude less achievable than traditional OSS efforts like Linux. This wave feels so tied to massive scale for its success, what do we do if big-tech stop handing out models?
I think we're all fortunate in that the companies behind the best OSS models, i.e. Meta/Llama and Alibaba/Qwen, are funding their compute and R&D from secondary business models, rather than from venture capital or as AI companies whose primary business model is direct revenue from their models and who will be seeking ROI. That's why I don't expect we can rely on Mistral AI to open-source their best models in the long run, since selling models is their primary business. This is reflected in their hosting costs, which charge a healthy premium that's always more expensive than OpenRouter providers hosting their OSS models.
But I don't see why Meta and Alibaba would stop releasing their best models as OSS, since they benefit from the tooling, optimizations, and software ecosystems being developed around their OSS models, and don't benefit from a future where the best AI models are centralized behind the big-tech corps. As long as their core businesses remain profitable, I don't expect them to stop improving and sharing their OSS models.
I read an article about humanoid robots yesterday, and it scared me that the expectation still seems to be that the robot will be online 24/7 and "thinking" using some cloud brain. The current models described in more detail all used OpenAI as a brain.
Having a personal robot would be great, but they have to invent a fully offline real positronic brain before I will consider allowing one in my house.
Fully open source might be too much to hope for, but that would obviously be the ideal. If it is closed source it definitely should be offline. I can have another, carefully sandboxed, AI in my computer that can help out with tasks that require online access. No need for the two types to be built into the same device.
Prediction 1: The value isn't in the foundation model, it's in fine tuning and in tightly integrated products.
Prediction 2: The ecosystem around open source models will grow to be much larger, richer, and deeper than closed source models.
If these are true, then OpenAI and Anthropic are in a precarious place. They basically burned a lot of capital to show the open source second movers what to build.
Nova Micro is $0.035/$0.14 per M tokens and Google's Gemini 1.5 Flash 8B is $0.0375/$0.15, just beating those OpenRouter prices, but it may well be that the Qwen models provide better results.
Yeah, I'm currently using the free quota for Gemini 2 Flash (exp) as a premium hosted model; it's a surprisingly great model, and IMO Google has caught up with the leaders with their latest experimental models. I've also tested Nova's models, which are pretty high quality and exceptional value (lite/micro) for their performance.
Also worth shouting out that you can get Meta's latest llama-3.3:70b (comparable to llama3.1:405b but much faster and cheaper) within GroqCloud's free quotas, running at an impressive 276 tok/s.
You know that you can make any model call and use tools simply by giving it few-shot examples and writing your own parsing logic. I've done it many times for clients, both at the prompt and at the fine-tune level.
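A minimal sketch of that pattern, with a hypothetical output convention (your few-shot examples define whatever format you like): the model is prompted to emit lines like `TOOL: name({...json args...})`, and a small parser extracts and dispatches them.

```python
import json
import re

# Hypothetical convention taught via few-shot examples: the model emits
# a line like `TOOL: get_weather({"city": "Paris"})` when it wants a tool.
TOOL_RE = re.compile(r'^TOOL:\s*(\w+)\((\{.*\})\)\s*$', re.MULTILINE)

def parse_tool_calls(model_output: str):
    """Extract (tool_name, args_dict) pairs from raw model text."""
    calls = []
    for name, raw_args in TOOL_RE.findall(model_output):
        try:
            calls.append((name, json.loads(raw_args)))
        except json.JSONDecodeError:
            pass  # malformed args: skip here, or re-prompt the model
    return calls

# Example raw completion containing one tool call
output = 'Let me check.\nTOOL: get_weather({"city": "Paris"})\n'
print(parse_tool_calls(output))  # [('get_weather', {'city': 'Paris'})]
```

You then run the named function yourself and feed the result back into the next prompt turn, which is all that "native" tool calling really does under the hood anyway.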
How would a bunch of weights make a backdoor? The worst it could do is detect it's accessing an actual console and run a logged, visible command that tries to mess with your config or phone home, which is more of a front door with flashing lights saying "here I am!", so why would they bother?
Letting an LLM run arbitrary commands in your main user account seems risky even without worrying about conspiracies.
Just to wear my tin foil hat for fun: it's not that the model would attempt to phone home itself (what would it have to say, anyway?), but that, given the opportunity, it would go around kicking doors open for later infiltration by an outside party. Subtle bugs introduced into your Django app, invisible characters that break your ssh configs, that sort of thing.
Yes, deliberately introducing vulnerabilities when generating code is a good one, and it could be quite subtle. For running console commands, though, anything touching configuration for ssh, gpg, bash aliases, ~/bin, cron, etc. should be immediately obvious.
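You could even automate a crude screen along those lines before executing anything the model proposes. A sketch, with an illustrative (definitely not exhaustive) list of sensitive locations:

```python
import re

# Illustrative, non-exhaustive patterns for locations a generated shell
# command probably shouldn't touch without human review.
SENSITIVE_PATTERNS = [
    r'~/\.ssh', r'\.gnupg', r'\.bash(rc|_aliases|_profile)',
    r'~/bin', r'\bcrontab\b', r'/etc/cron',
]

def flag_command(cmd: str) -> bool:
    """Return True if the command touches a sensitive path and needs review."""
    return any(re.search(p, cmd) for p in SENSITIVE_PATTERNS)

print(flag_command('cat ~/.ssh/id_rsa'))  # True
print(flag_command('ls -la ./src'))       # False
```

A pattern list like this is trivially bypassable (variables, encodings, symlinks), so it's a tripwire for the obvious cases, not a sandbox.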
I was thinking "here's an IP address and ssh key" would be what to phone home with, and that could be encrypted/hidden pretty well, but any network access should be pretty suspicious right away.
There is no backdoor, but the model is heavily censored and biased towards China. It refuses to discuss Chinese or North Korean politicians, Tiananmen Square, Uyghurs, or anything sensitive to China. It's quite positive about Putin, though it doesn't mind trashing Western leaders. It may write clever code, and I understand that Chinese researchers have to abide by local laws, but it certainly has opinions that are incompatible with mine.
Given how Western models have their own biases, it occurs to me that we might be better off with a panel of models playing mock UN to cover everything.
Qwen2-VL is a decent vision model. You can try it out online here: https://huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B - I got great results from it for OCR against handwritten text: https://simonwillison.net/2024/Sep/4/qwen2-vl/
Qwen2.5-Coder-32B is an excellent (I'd say even GPT-4 class) model for generating code, which I can run on a 64GB M2 MacBook Pro: https://simonwillison.net/2024/Nov/12/qwen25-coder/