Coool. I remember when the OG pebble launched but I couldn't get one for myself (it wasn't available in my region and my pocket money didn't allow for it either ;) ). Looking forward to this #bitesNailsFuriously
> Unfortunately if you naively quantize all layers to 1.58bit, you will get infinite repetitions in seed 3407: “Colours with dark Colours with dark Colours with dark Colours with dark Colours with dark” or in seed 3408: “Set up the Pygame's Pygame display with a Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's”.
This is a really interesting insight (although other works cover this as well). I am particularly amused by the process by which the authors of this blog post arrived at these particular seeds. Good work nonetheless!
Would be great to have dynamic quants of the V3 (non-R1) version, as for some tasks it is good enough. It would also be very interesting to see how much dynamic quants degrade on small/medium-size MoEs, such as older DeepSeek models, Mixtrals, or IBM's tiny Granite MoE. It would be fun if the Granite 1B MoE still functioned at 1.58bit.
Oh yes, one could apply a repetition penalty for example - the issue is that repetition isn't the only problem. I find the model rather forgets what it already saw, and hence it repeats stuff - it's probably best to backtrack, then delete the last few rows in the KV cache.
Another option is to employ min_p = 0.05 to stop the model from generating low-probability tokens - it helps especially when the 1.58bit model generates an "incorrect" token roughly once every 8000 tokens or so (e.g. `score := 0`).
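For reference, this is roughly what min_p filtering does over a logits vector - a minimal sketch, assuming numpy; the function name and the 0.05 default are just illustrative, not any particular inference engine's implementation:

```python
# Minimal sketch of min_p sampling: drop tokens whose probability is below
# min_p * (probability of the most likely token), then sample from the rest.
import numpy as np

def min_p_sample(logits: np.ndarray, min_p: float = 0.05, rng=None) -> int:
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    threshold = min_p * probs.max()
    probs = np.where(probs >= threshold, probs, 0.0)  # discard low-probability tokens
    probs /= probs.sum()                              # renormalize the survivors
    return int(rng.choice(len(probs), p=probs))
```

The idea is that the occasional 1/8000 "bad" token tends to sit far below the top token's probability, so a relative cutoff filters it out without otherwise changing the distribution much.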
You likely mean sampler, not decoder. And no, the stronger the quantization, the more the output token probabilities diverge from the non-quantized model. With a sampler you can't recover any meaningful accuracy. If you force the sampler to select tokens that won't repeat, you're just trading repetitive gibberish for non-repetitive gibberish.
> And no, the stronger the quantization, the more the output token probabilities diverge from the non-quantized model. With a sampler you can't recover any meaningful accuracy.
Of course you can't recover any accuracy, but LLMs are in fact prone to this kind of repetition no matter what; it's a known failure mode, which is why samplers aimed at avoiding it have been designed over the past few years.
> If you force the sampler to select tokens that won't repeat, you're just trading repetitive gibberish for non-repetitive gibberish.
But it won't necessarily be gibberish! Even a highly quantized R1 still has much more embedded information than a 14B or even 32B model, so I don't see why it should output more gibberish than smaller models.
Maybe I missed something, but this is a roundabout way of doing things where an embedding + ML classifier would have done the job. We don't have to use an LLM just because it can be used, IMO.
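For the record, the embedding + classifier route I have in mind looks roughly like this - a rough sketch only, assuming sentence-transformers and scikit-learn; the model name, texts, and labels are placeholders, not anything from the original post:

```python
# Rough sketch of an embedding + classifier pipeline (all data here is made up).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

texts = ["example document one", "example document two"]  # hypothetical labelled texts
labels = [0, 1]                                            # hypothetical class labels

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any off-the-shelf embedding model
X = encoder.encode(texts)

clf = LogisticRegression().fit(X, labels)
pred = clf.predict(encoder.encode(["a new document to classify"]))
```

Cheap to train, fast at inference, and easy to evaluate, which is the point of the comparison.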
Nicely summarised. Another important thing that clearly stands out (not to undermine the effort and work that went into this) is that we are now seeing larger and more complex building blocks emerge (first it was embedding models, then encoder-decoder layers, and now whole models are being duct-taped together into even more powerful pipelines). The AI/DL ecosystem is growing along a nice trajectory.
Though I wonder if 10 years down the line folks won't even care about underlying model details (no more than a current-day web developer needs to know about network packets).
PS: Not great examples, but I hope you get the idea ;)
Why not fix the calculator in a way that avoids/mitigates scenarios where users end up with wrong quotes, and then do an A/B test? This setup seemingly tilts towards some sort of dark pattern, IMO.
Because the results were probably wrong because the inputs were wrong (exaggerated by over-cautious users). There is no automated way to avoid that in a calculator; only a conversation with a real person (sales, tech support) will reveal the bad inputs.
I wonder if some of that could have been automated. Have a field to indicate if you are an individual, small business, or large business, and then at least flag fields that seem unusually high (or low, don’t want to provide too-rosy estimates) for that part of the market.
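One way such a check could look, purely as an illustration - the segments, field names, and ranges below are made up, not taken from any real pricing calculator:

```python
# Illustrative sketch: flag calculator inputs that look unusual for the
# selected market segment (all segments, fields, and ranges are hypothetical).
TYPICAL_RANGES = {
    "individual":     {"monthly_requests": (0, 10_000)},
    "small_business": {"monthly_requests": (1_000, 500_000)},
    "large_business": {"monthly_requests": (100_000, 50_000_000)},
}

def flag_unusual_inputs(segment: str, inputs: dict) -> list[str]:
    warnings = []
    for field, value in inputs.items():
        lo, hi = TYPICAL_RANGES[segment].get(field, (None, None))
        if lo is not None and not (lo <= value <= hi):
            warnings.append(f"{field}={value} looks unusual for a {segment}")
    return warnings

# e.g. flag_unusual_inputs("individual", {"monthly_requests": 2_000_000})
# would prompt the user to double-check the estimate before quoting.
```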
Thank you for the nohello.net thing. I am usually pretty awkward when it comes to starting conversations, but I guess I never paid attention to why that was the case. The discussion on this thread clears up the impression I had that it is rude to jump directly to the question/task. I got my cue! :)
+1, and also Google Pixel's "Now Playing" feature is such an amazing application of the same idea. Though I wonder how different their implementations are.
Google's Now Playing feature is entirely offline (to relieve privacy concerns) and is somehow still incredible at recognizing even obscure songs. Really impressive.
I also love that it just shows up on my lock screen.
Supposedly, while building the backend, they realised the actual summary data for a reasonable breadth of tracks (say, anything you'd likely hear on the radio or on a jukebox) was tiny, and so: why build a service at all when you can just ship the data to phones?
Recently, for whatever reason, I was listening to the twist/cover "Stacy's Dad", and Now Playing recognised it as the rather more famous original, Fountains of Wayne's "Stacy's Mom". So yeah, it doesn't know everything. It also doesn't recognise lots of obscure stuff I own, like B-sides or special editions that never saw radio play, or bands that my friends were in (but everybody I know has read both Steve Albini's "Some of your friends are probably already this fucked" and the KLF's "The Manual", so none of them signed a recording contract and thus you've never heard of them). But I've never had a situation where I heard something I liked at, say, a bar and Now Playing didn't know what it is.
Yeah, packaging the data and updating it async does make a lot more sense. Also, I guess it's fine that it doesn't know it all but covers a good percentage of requests.