Be careful what you wish for: you are giving up your freedom of movement in the name of security. You might argue that you can hail a cab, but that's more expensive than owning your own car, and with self-driving cabs you lose your privacy when you use them. Any movement between two points will always be recorded with at least video, and while you are moving, someone other than you can pinpoint your exact location. With your own vehicle, you could unplug your phone and your car's GPS/tracking device and have some privacy.
> you are giving up your freedom of movement in the name of security
Driving in American cities is the opposite of freedom. The necessity of regulating apes piloting heavy machinery in close proximity to each other and society is a major source of our modern police state.
We do not have a freedom of movement _by motor vehicle_ in the US.
It is a privilege licensed by the State and regularly revoked through due process or expiry.
While your concerns about mobility and privacy are valid, I would contend that public safety is what they have to be weighed against. Some people really are better riders than drivers.
If you are running a 2-bit quant, you are not giving up performance but gaining 100% performance, since the alternative is usually 0%. Smaller quants are for folks who otherwise wouldn't be able to run anything at all, so you run the largest quant your hardware allows. I, for instance, often ran Q3_K_L; I don't think about how much performance I'm giving up, but rather how, without Q3, I wouldn't be able to run the model at all. That said, for R1 I ran some tests against two public interfaces and my local Q3 crushed them. The problem with a lot of model providers is that we can never be sure what they are serving, and they could take shortcuts to maximize profit.
That's true only in a vacuum. For example, should I run gpt-oss-20b unquantized or gpt-oss-120b quantized? Some model families have a 70b/30b spread, and that's only within a single base model; many different models at different quants could be compared for different tasks.
Definitely. As a hobbyist, I have yet to put together a good heuristic for higher-quant-fewer-params vs. lower-quant-more-params. I've mentally been drawing the line at around q4, but now with IQ quants and other improvements in the space I'm not so sure anymore.
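For what it's worth, the back-of-the-envelope math I fall back on is bits-per-weight times parameter count. A sketch only; the bpw figures below are my rough approximations for common GGUF quants, not exact numbers:

    # Rough memory-footprint comparison across quant/param tradeoffs.
    QUANT_BPW = {  # approximate bits per weight, not exact GGUF figures
        "Q2_K": 2.6, "Q3_K_L": 3.4, "Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0,
    }

    def est_size_gb(params_billion: float, quant: str) -> float:
        """Estimate footprint in GB: params * bits-per-weight / 8."""
        return params_billion * 1e9 * QUANT_BPW[quant] / 8 / 1e9

    # The gpt-oss-20b-unquantized vs gpt-oss-120b-quantized question:
    print(f"20B  @ F16    ~{est_size_gb(20, 'F16'):.0f} GB")     # ~40 GB
    print(f"120B @ Q3_K_L ~{est_size_gb(120, 'Q3_K_L'):.0f} GB")  # ~51 GB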
Yeah, I've pretty quickly thrown in the towel on trying to figure out what's 'best' for smaller-memory systems. Things are moving so quickly that whatever time I invest in that is likely to be wasted.
For GPT OSS in particular, OpenAI only released the MoE layers in MXFP4 (4-bit), so the "unquantized" version is 4-bit MoE + 16-bit attention. I uploaded "16-bit" versions to https://huggingface.co/unsloth/gpt-oss-120b-GGUF; they use 65.6GB whilst MXFP4 uses 63GB, so it's not that much of a difference. Same with GPT OSS 20B.
llama.cpp also unfortunately cannot quantize matrices whose dimensions are not a multiple of 256 (gpt-oss's are 2880).
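To make the constraint concrete: the k-quants in llama.cpp pack weights into blocks of 256 values (QK_K), so every tensor row length has to divide evenly by 256, and gpt-oss's 2880-wide rows don't:

    QK_K = 256       # llama.cpp k-quant block size
    row_size = 2880  # gpt-oss tensor dimension

    # k-quants require row_size % QK_K == 0; 2880 % 256 leaves a remainder:
    print(row_size % QK_K)  # 64, so these tensors can't use k-quants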
Don't listen to this crowd; these are "technical folks". Most of your audience will fail to figure it out. You could fill the gap that llama.cpp leaves and give users a choice: auto-install it for them, or let them install it themselves and do manual configuration. I personally won't, though.
Who do you think the audience is here, if not technical? We are in a discussion about a model that requires over 250GB of RAM to run. I don't know a non-technical person with more than 32GB.
I think most of the people like this in the ML world are extreme specialists (e.g.: bioinformaticians, statisticians, linguists, data scientists) who are "technical" in some ways but aren't really "computer people". They're power users in a sense but they're also prone to strange bouts of computing insanity and/or helplessness.
Garbage benchmark: an inconsistent mix of "agent tools" and models. If you wanted to present a meaningful benchmark, the agent tool would stay the same, and then we could really compare the models.
There are plenty of other benchmarks that disagree with these. That said, in my experience most of these benchmarks are trash. Use the model yourself, apply your own set of problems, and see how well it fares.
I also publish my own evals on new models (using coding tasks that I curated myself, without tools, rated by a human with rubrics). Would love for you to check it out and give your thoughts:
Zed is just on the hype train. Obviously very talented people, and they are thinking hard about LLMs, but I'm not really sure where they are going; their pivot is probably going to be more interesting...
By your argument, once anything makes it in, it can never be removed. Billions of people are going to use the web every day and it won't stop. Even the most obscure feature will end up being used by 0.1% of users. Can you name a feature that's supported by all browsers that's not being used by anyone?
Yes. That is exactly how web standards work historically. If something will break 0.1% of the web it isn't done unless there are really really strong reasons to do it anyway. I personally watched lots of things get bounced due to their impact on a very small % of all websites.
This is part of why web standards processes need to be very conservative about what's added to the web, and part of why a small vocal contingent of web people are angry that Google keeps adding all sorts of weird stuff to the platform. Useful weird stuff, but regardless.
3. It seems there are plenty of examples of features being removed above that threshold: NPAPI, SPDY, WebSQL, etc.
4. Resources are finite. It’s not a simple matter of who would be impacted. It’s also opportunity cost and people who could be helped as resources are applied to other efforts.
As a general rule of thumb, 0.1% of PageVisits (1 in 1000) is large, while 0.001% is considered small but non-trivial. Anything below about 0.00001% (1 in 10 million) is generally considered trivial. There are around 771 billion web pages viewed in Chrome every month (not counting other Chromium-based browsers). So seriously breaking even 0.0001% still results in someone being frustrated every 3 seconds, and so not to be taken lightly!
--- end quote ---
Read the full doc. They even give examples where they couldn't remove a feature impacting just 0.0000008% of web views.
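The every-3-seconds figure in the quote checks out, for what it's worth. A quick sanity check of the arithmetic:

    views_per_month = 771e9          # Chrome page views per month, per the doc
    broken_fraction = 0.0001 / 100   # 0.0001% expressed as a fraction
    seconds_per_month = 30 * 24 * 3600

    broken_views = views_per_month * broken_fraction
    print(broken_views)                      # ~771,000 broken views per month
    print(seconds_per_month / broken_views)  # ~3.4 seconds between breakages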
Mark the accounts as kids' accounts; they will not collect data until the birthday on the account turns 18. Pick a birthdate in any recent month of 2025 and you get 17+ years of minimal data collection. New data-sharing options won't be turned on for them either.
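The date math, as a sketch (the June 2025 birthday is just a hypothetical example):

    from datetime import date

    birthday = date(2025, 6, 1)  # hypothetical: any recent 2025 month works
    turns_18 = birthday.replace(year=birthday.year + 18)
    print(turns_18)              # 2043-06-01, i.e. 17+ years away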