Coool. I remember when the OG Pebble launched, but I couldn't get one for myself (it wasn't available in my region and my pocket money didn't allow for it either ;) ). Looking forward to this #bitesNailsFuriously


> Unfortunately if you naively quantize all layers to 1.58bit, you will get infinite repetitions in seed 3407: “Colours with dark Colours with dark Colours with dark Colours with dark Colours with dark” or in seed 3408: “Set up the Pygame's Pygame display with a Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's”.

This is a really interesting insight (although other works cover this as well). I am particularly amused by the process by which the authors of this blog post arrived at these particular seeds. Good work nonetheless!


Hey! :) Coincidentally the seeds I always use are 3407, 3408, and 3409 :) - 3407 because of https://arxiv.org/abs/2109.08203

I also tried not setting the seeds, but the results are still the same - quantizing all layers seems to make the model forget and repeat everything - I put all examples here: https://docs.unsloth.ai/basics/deepseek-r1-dynamic-1.58-bit#...


Would be great to have dynamic quants of the non-R1 V3 version, as for some tasks it is good enough. Also, it would be very interesting to see the degradation with dynamic quants on small/medium-size MoEs, such as older DeepSeek models, Mixtrals, or IBM's tiny Granite MoE. Would be fun if the 1B Granite MoE still functioned at 1.58-bit.


Oh yes multiple people have asked me about this - I'll see what I can do :)


Can't this kind of repetition be dealt with at the ~~decoder~~ (edit: sampler) level, like for any model? (see the DRY ~~decoder~~ sampler for instance: https://github.com/oobabooga/text-generation-webui/pull/5677)


Oh yes, one could apply a repetition penalty for example - but it's not just the repetition that's the issue. I find the model rather forgets what it already saw, and hence repeats stuff - it's probably best to backtrack, then delete the last few rows in the KV cache.

Another option is to employ min_p = 0.05 to stop the model from generating low-probability tokens - it can help especially since the 1.58-bit model generates an "incorrect" token (e.g. `score := 0`) roughly once every 8,000 tokens on average.
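Roughly, here's what those two knobs do at the sampler level - a toy sketch with made-up logits; real implementations (e.g. in llama.cpp) are more involved:

    import numpy as np

    rng = np.random.default_rng(3407)  # a nod to this thread's favourite seed

    def sample(logits, recent_tokens, rep_penalty=1.1, min_p=0.05):
        logits = logits.astype(float).copy()
        # CTRL-style repetition penalty: dampen recently seen tokens.
        for t in set(recent_tokens):
            logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # min_p: drop any token whose probability is under min_p * p(top token).
        probs[probs < min_p * probs.max()] = 0.0
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # Token 0 gets penalized for repeating; tokens 2 and 3 fall below the min_p cutoff.
    print(sample(np.array([8.0, 6.5, 1.0, 0.5]), recent_tokens=[0, 0, 0]))

The nice thing about min_p is that the cutoff scales with the top token's confidence, so it prunes the rare junk tokens without flattening the rest of the distribution.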


You likely mean sampler, not decoder. And no, the stronger the quantization, the more the output token probabilities diverge from the non-quantized model. With a sampler you can't recover any meaningful accuracy. If you force the sampler to select tokens that won't repeat, you're just trading repetitive gibberish for non-repetitive gibberish.


> You likely mean sampler, not decoder.

Indeed - that's what I get for posting before being fully awake.

> And no, the stronger the quantization, the more the output token probabilities diverge from the non-quantized model. With a sampler you can't recover any meaningful accuracy.

Of course you can't recover any accuracy, but LLMs are in fact prone to this kind of repetition no matter what; it's a known failure mode, which is why samplers aimed at avoiding it have been designed over the past few years.

> If you force the sampler to select tokens that won't repeat, you're just trading repetitive gibberish for non-repetitive gibberish.

But it won't necessarily be gibberish! Even a highly quantized R1 still has much more embedded information than a 14B or even 32B model, so I don't see why it should output more gibberish than smaller models.


You can deal with this through various sampling methods, but it doesn't actually fix the fried model.


Maybe I missed something, but this is a roundabout way of doing things where an embedding + ML classifier would have done the job. We don't have to use an LLM just because it can be used, IMO.
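Something along these lines would likely have done the job - a rough sketch only, where the model choice and labels are placeholders and I'm guessing at the task:

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    # Placeholder labelled examples - in practice you'd label a few hundred real ones.
    texts = ["refund my order", "reset my password", "cancel my subscription", "app crashes on login"]
    labels = ["billing", "account", "billing", "bug"]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any small embedding model works
    clf = LogisticRegression(max_iter=1000).fit(encoder.encode(texts), labels)

    print(clf.predict(encoder.encode(["I was charged twice this month"])))

Cheap to train, cheap to serve, and you get calibrated probabilities to threshold on instead of free-form LLM output.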


Nicely summarised. Another important thing that clearly stands out (not to undermine the effort and work that has gone into this) is that we are now seeing larger and more complex building blocks emerging: first it was embedding models, then encoder-decoder layers, and now whole models are being duct-taped together into even more powerful pipelines. The AI/DL ecosystem is growing on a nice trajectory.

Though I wonder if, 10 years down the line, folks won't even care about the underlying model details (any more than a current-day web developer needs to know about network packets).

PS: Not great examples, but I hope you get the idea ;)


Why not fix the calculator in a way that avoids/mitigates the scenarios where users arrive at wrong quotes, and then do an A/B test? This setup seemingly tilts towards some sort of dark pattern, IMO.


Because the results were probably wrong because the inputs were wrong (exaggerated by over-cautious users). There is no automated way to avoid that in a calculator; only a conversation with a real person (sales, tech support) will reveal the bad inputs.


I wonder if some of that could have been automated. Have a field to indicate whether you are an individual, small business, or large business, and then at least flag fields that seem unusually high (or low - you don't want to provide too-rosy estimates either) for that part of the market.
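Even a crude per-segment range check would catch a lot of it - a sketch below, where the segments and thresholds are entirely made up:

    # Made-up "typical" ranges per segment; real bounds would come from historical quote data.
    TYPICAL_MONTHLY_REQUESTS = {
        "individual": (100, 50_000),
        "small business": (10_000, 2_000_000),
        "large business": (500_000, 500_000_000),
    }

    def flag_unusual(segment, monthly_requests):
        lo, hi = TYPICAL_MONTHLY_REQUESTS[segment]
        if monthly_requests > hi:
            return f"{monthly_requests:,}/month looks unusually high for a {segment} - double-check this estimate."
        if monthly_requests < lo:
            return f"{monthly_requests:,}/month looks unusually low for a {segment} - the quote may be too rosy."
        return None

    print(flag_unusual("individual", 3_000_000))  # flags the input as unusually high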


They tried to mitigate:

   But any attempt to address one source of confusion inevitably added another.


Yet to go through it in detail, but this is really powerful. Initiatives such as this are what we need to further democratize DL. Kudos, team!


Thank you! We definitely stand by broader adoption of DL :)


the browser wars are heating up again! nice


The browser window chrome[1] wars*

1: (As in the pre-Chrome meaning of the word)


This is such an amazing way to teach so many engineering topics. Time to dust off that Mechanix kit!


Thank you for the nohello.net thing. I am usually pretty awkward when it comes to starting conversations, but I guess I never paid attention to why that was the case. The discussion on this thread clears up the impression I had that it is usually rude to jump directly to the question/task. I got my cue! :)


+1, and Google Pixel's "Now Playing" feature is such an amazing application of the same idea. Though I wonder how different their implementations are.


Google's Now Playing feature runs entirely offline (to relieve privacy concerns) and is somehow still incredible at recognizing even obscure songs. Really impressive.

I also love that it just shows up on my lock screen.


Supposedly, while building the backend, they realised the actual summary data for a reasonable breadth of tracks (say, anything you'd likely hear on the radio or on a jukebox) was tiny - so why build a service at all when you can just ship the data to phones?

Recently, for whatever reason, I was listening to the twist/cover "Stacy's Dad", and Now Playing recognised it as the rather more famous original, Fountains of Wayne's "Stacy's Mom". So yeah, it doesn't know everything. It also doesn't recognise lots of obscure stuff I own, like B-sides or special editions that never saw radio play, or bands that my friends were in (but everybody I know has read both Steve Albini's "Some of your friends are probably already this fucked" and the KLF's "The Manual", so none of them signed a recording contract and thus you've never heard of them). But I've never had a situation where I heard something I liked at, say, a bar and Now Playing didn't know what it was.


Yeah, packaging the data and updating it async does make a lot more sense. Also, I guess it's fine that it doesn't know it all but covers a good percentage of requests.

