That makes sense, but how do you efficiently evaluate the Gaussian-kernel-based approach (“operator-based data structures (OBDS)”)? Presumably you want to do it in a way that keeps a dynamically updating data structure, instead of computing a low-rank approximation to the kernel etc.? My understanding is that the upsides of kNN-based approaches are fast querying and the ability to dynamically insert additional vectors?
Thank you for the thoughtful comment. Your questions are valid given the title, which I used to make the post more accessible to a general HN audience. To clarify: the core distinction here is not kernelization vs kNN, but field evaluation vs point selection (or selection vs superposition as retrieval semantics). The kernel is just a concrete example.
FAISS implements selection (argmax ⟨q,v⟩), so vectors are discrete atoms and deletion must be structural. The weighted formulation represents a field: vectors act as sources whose influence superposes into a potential. Retrieval evaluates that field (or follows its gradient), not a point identity. In this regime, deletion is algebraic (append -v for cancellation), evaluation is sparse/local, and no index rebuild is required.
> When I ran this program, I expected the `CF_OEMTEXT` string to have the byte 44, but it didn’t. It had the byte 90. We will start unraveling this mystery next time.
Whoa, there exists something Raymond Chen didn’t know about Windows core APIs?
This seems to be incorporated into current LLM generations already: when code execution is enabled, both GPT-5.x and Claude 4.x seem to automatically execute Python code to help with reasoning steps.
If you compare the outputs of a CoT input vs a control input, the outputs will have the reasoning step either way for the current generation of models.
That thread is a fun (though frustrating for them!) conversation to read through.
After about a hundred back-and-forths getting the guy with the actual hardware to try different commands, I was thinking to myself: man, maybe he should just give him remote access to the target PC; this is torture for both of them. And then I see him comment:
> Honestly I'm thinking of this and maybe something insane like organizing ssh access or something to quit torturing Nadim with building and rebooting all the time
And Nadim replies:
> Haha, sorry, but there's no way I'm giving you SSH access!
> I’m fine with continuing with tests!
Which is fair enough! But it was funny to see that right when I was thinking the same thing. Great perseverance from both of them.
It was slightly disappointing that they eventually moved off GitHub to Discord, so after all that, we miss the moment of them actually getting it working!
Man, if I had a nickel every time some old Linux utility ignored a command-line flag I'd have a lot of nickels. I'd have even more nickels if I got one each time some utility parsed command-line flags wrong.
I have automated a lot of things by executing other utilities as subprocesses, and it's absolutely crazy how many utilities handle CLI flags in a way that seems correct but isn't.
This doesn't look like a bug, that is, something overlooked in the logic. This seems like a deliberately introduced regression. Accepting an option and ignoring it is a deliberate action, and not crashing with an error message when an unsupported option is passed must be a deliberate, and wrong, decision.
It certainly doesn't look intentional to me; it looks like at some point someone added "-r" as a valid option, but until this surfaced as a bug, no one actually implemented anything for it (and the logic happens to fall through to using the current date).
It's wrong (and coreutils get it right) but I don't see why it would have to be deliberate. It could easily just not occur to someone that the code needs to be tested with invalid options, or that it needs to handle invalid options by aborting rather than ignoring. (That in turn would depend on the crate they're using for argument parsing, I imagine.)
Could parsing the `-r` be added without noticing it somehow?
If it was added in bulk, with many other still unsupported option names, why does the program not crash loudly if any such option is used?
A fencepost error is a bug. A double-free is a bug. Accepting an unsupported option and silently ignoring it is not, it takes a deliberate and obviously wrong action.
At least from what I can find, here's the original version of the changed snippet [0]:
let date_source = if let Some(date) = matches.value_of(OPT_DATE) {
    DateSource::Custom(date.into())
} else if let Some(file) = matches.value_of(OPT_FILE) {
    DateSource::File(file.into())
} else {
    DateSource::Now
};
And after `-r` support was added (among other changes) [1]:
let date_source = if let Some(date) = matches.get_one::<String>(OPT_DATE) {
    DateSource::Human(date.into())
} else if let Some(file) = matches.get_one::<String>(OPT_FILE) {
    match file.as_ref() {
        "-" => DateSource::Stdin,
        _ => DateSource::File(file.into()),
    }
} else if let Some(file) = matches.get_one::<String>(OPT_REFERENCE) {
    DateSource::FileMtime(file.into())
} else {
    DateSource::Now
};
Still the same fallback. I'm not sure one can tell just from looking at the code (and, in my case, without knowing more about the context) whether the fallback was chosen intentionally and handling the flag was simply forgotten.
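One way to keep that kind of fallback from swallowing a flag, sketched in C rather than Rust and with hypothetical names (`SEEN_*`, `pick_source`) that are not from uutils: default to "now" only when no date option was given at all, and treat an option that was accepted but matched no branch as an explicit error. Rejecting combinations of options is an extra, equally hypothetical choice made here.

```c
#include <assert.h>

/* Hypothetical bitmask of date-selecting options seen on the command line. */
enum { SEEN_DATE = 1u << 0, SEEN_FILE = 1u << 1, SEEN_REFERENCE = 1u << 2 };

typedef enum {
    SRC_NOW,
    SRC_HUMAN,
    SRC_FILE,
    SRC_CONFLICT,   /* more than one date option given */
    SRC_UNHANDLED   /* option accepted, but no branch implements it */
} DateSource;

/* "Now" is reserved for the case where nothing was requested, so an option
   with no branch (like -r before it was implemented) cannot silently fall
   through to the current date. */
static DateSource pick_source(unsigned seen) {
    if (seen == 0) return SRC_NOW;
    if (seen & (seen - 1)) return SRC_CONFLICT; /* more than one bit set */
    if (seen & SEEN_DATE) return SRC_HUMAN;
    if (seen & SEEN_FILE) return SRC_FILE;
    return SRC_UNHANDLED; /* e.g. SEEN_REFERENCE: parsed, never handled */
}
```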
> Accepting an unsupported option and silently ignoring it is not, it takes a deliberate and obviously wrong action.
No, it doesn't. For example, you could have code that recognizes that something "is an option", and silently discards anything that isn't on the recognized list.
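To illustrate how that can happen without anyone deciding to ignore a flag, here is a toy C sketch (hypothetical flag list; option values and bundled flags are not handled) of a lenient parser that treats anything starting with `-` as an option and drops the ones it doesn't know, next to a strict one that rejects them. The two differ by a single branch, which is easy to get wrong by omission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of flags a toy utility recognizes. */
static const char *const KNOWN[] = { "-d", "-f", "-r" };
#define N_KNOWN (sizeof KNOWN / sizeof KNOWN[0])

static bool is_known(const char *arg) {
    for (size_t i = 0; i < N_KNOWN; i++)
        if (strcmp(arg, KNOWN[i]) == 0) return true;
    return false;
}

/* Lenient: anything starting with '-' "is an option", but flags missing
   from KNOWN are dropped without a word. Returns the number of flags kept. */
static int parse_lenient(int argc, const char *const argv[]) {
    int kept = 0;
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] != '-') continue;  /* operand or option value */
        if (is_known(argv[i])) kept++;    /* unknown flags vanish here */
    }
    return kept;
}

/* Strict: identical loop, except an unknown flag is reported as -1. */
static int parse_strict(int argc, const char *const argv[]) {
    int kept = 0;
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] != '-') continue;
        if (!is_known(argv[i])) return -1;
        kept++;
    }
    return kept;
}
```

Even the strict version only rejects flags missing from the list; a flag that is on the list but never acted on downstream, like `-r` here, passes both.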
I would say that Canonical is more at fault in this case.
I'm frankly appalled that an essential feature such as system updates didn't have an automated test that would catch this issue immediately after uutils was integrated.
Never mind the fact that this entire replacement of coreutils is done purely for financial and political rather than technical reasons, and that they're willing to treat their users as guinea pigs. Despicable.
What surprises me is that the job seems rushed. The implementation is incomplete. Testing seems patchy. Things are released seemingly in a hurry, as if meeting a particular deadline were more important to the engineers or managers of a particular department than the quality of the product as a whole.
This feels like a large corporation, in the bad sense.
I’m pretty sure you are supposed to declare agricultural products at customs. Sure, if the apples are cooked into a pie that’s probably fine but I believe most countries don’t let people bring in fresh fruit because of the possibility that some pest (insect, fungus) could be hitching a ride on it.
I believe the point is that in other countries they won’t rifle through your bag to verify whether or not you have brought apples. I’m not familiar with Australian customs though so I could be mistaken.
That the annotation applies to variables and not types is surely an oversight or mistake right? It seems like it could have been easier to implement that way initially, but it just doesn’t seem to fit with how the C type system works. (Yes, it will make declarations uglier to do it on types, but that ship sailed long ago; see cdecl.org.)
And how would you type a string vs. a byte array then? C doesn't even have proper string support yet, i.e. Unicode strings. Most wchar functions don't care at all about Unicode rules. Zero-terminated byte buffers are certainly not strings, just garbage.
C will never get proper string support, so you'll never be able to separate strings, zero-terminated byte buffers, and plain byte buffers in the type system.
So annotating vars is perfectly fine.
The problem was that the PM and release manager was completely unaware of the state of the next branch, of its upcoming problems and fixes, and just hacked around in his usual cowboy manner. Entirely unprofessional. A release manager should have been aware of Kees' gcc15 fixes.
But they have no tooling support, no oversight, just endless blurbs on their main mailing list. No CI for a release candidate? Reminds us of typical cowboys in other places.
But you would still need to change it everywhere, right? Like, instead of changing the annotation everywhere you have to change the type everywhere. Doesn't seem like a huge difference to me.
If the CI system didn't get the Fedora upgrade, then it would not have caught it. Aside from that, the kernel has a highly configurable build process, so getting good coverage is equally complex.
Plus, this is a release candidate, which is explicitly noted as being targeted at developers and enthusiasts. I'm not sure the strength of Kees' objections is well matched to the size of the actual problem.
> That the annotation applies to variables and not types is surely an oversight or mistake right?
I don't think so. It doesn't make sense on the type. Otherwise, what should happen here?
char s[1];
char (__nonstring ns)[1]; // (I guess this would be the syntax?)
s[0] = '1';
ns[0] = '\0';
char* p1 = s; // Should this be legal?
char* p2 = ns; // Should this be legal?
char* __nonstring p3 = s; // Should this be legal?
char* __nonstring p4 = ns; // Should this be legal?
foo(s, ns, p1, p2, p3, p4); // Which ones can foo() assume to be NUL-terminated?
// Which ones can foo() assume to NOT be NUL-terminated??
By putting it in the type you're not just affecting the initialization, you're establishing an invariant throughout the lifetime of the object... which you cannot enforce in any desirable way here. That would be equivalent to laying a minefield throughout your code.
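For comparison, here is roughly how the real attribute is used, as a C sketch. GCC's `nonstring` attaches to a declaration (a variable or struct member), not to a type, which is the design this comment defends, and it changes which warnings GCC emits for that object; the Linux kernel wraps it in a `__nonstring` macro. The sketch uses its own hypothetical `NONSTRING` macro instead (identifiers starting with `__` are reserved for the implementation) and guards it with `__has_attribute`, so it expands to nothing where the attribute is unknown. The object's type stays plain `const char[4]`, so none of the conversions in the example above change legality.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical project macro, modeled on the kernel's __nonstring. The
   nested #if matters: __has_attribute(...) must not be evaluated on a
   compiler that lacks __has_attribute. */
#if defined(__has_attribute)
#  if __has_attribute(__nonstring__)
#    define NONSTRING __attribute__((__nonstring__))
#  endif
#endif
#ifndef NONSTRING
#  define NONSTRING /* attribute unavailable: annotation is a no-op */
#endif

/* A 4-byte magic field in which every byte is payload; the literal's NUL
   does not fit and is intentionally dropped (legal in C, not in C++).
   Newer GCC releases warn about this initializer unless it is annotated. */
static const char magic[4] NONSTRING = "ABCD";

/* Copy exactly len bytes of a fixed-width field and terminate the copy,
   so dst (len + 1 bytes) can be used with ordinary string functions. */
static void field_to_cstr(char *dst, const char *src, size_t len) {
    memcpy(dst, src, len);
    dst[len] = '\0';
}
```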
Perhaps unsigned could help here with understanding.
unsigned means, don't use of an integer MSB as sign bit. __nonstring means, the byte array might not be terminated with a NUL byte.
So what happens if you use integers instead of byte arrays? I mean, cast away unsigned or add unsigned. Of course these two areas are different, but one could try to design such features so that they behave in similar ways where it makes sense.
I am unsure, but it seems that if you cast to a different type, you lose the conditions of the previous type. And as for "should this be legal": you can cast away a lot of things and it's legal. That's C.
But whatever, because it's not implemented. This is all hypothetical. I understand GCC that they took the easier way. Type strictness is not C's forte.
> Perhaps unsigned could help here with understanding.
No, they're very different situations.
> unsigned means, don't use of an integer MSB as sign bit.
First: unsigned is a keyword. This fact is not insignificant.
But anyway, even assuming they were both keywords or both attributes: "don't use an MSB as a sign bit" makes sense, because the MSB otherwise is used as a sign bit.
> __nonstring means, the byte array might not be terminated with a NUL byte.
The byte array already doesn't have to contain a NUL character to begin with. It just so happens that you usually initialize it somewhere with an initializer that does, but it's already perfectly legal to strip that NUL away later, or to initialize it in a manner that doesn't include a NUL character (say, char a[1] = {'a'}). It doesn't really make sense to change the type to say "we now have a new type with the cool invariant that is... identical to the original type's."
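A small C illustration of that point (the helper `bounded_len` is hypothetical, built only on the standard `memchr`): a plain `char[1]` holding `'a'` is legal, overwriting it with `'1'` later is too, and neither state contains a NUL. Code that may receive such arrays has to bound its reads in any case.

```c
#include <assert.h>
#include <string.h>

/* Ordinary C: a one-byte array with no room for, and no, NUL byte. */
static char a[1] = {'a'};

/* Length of the text in buf without reading past cap bytes. Unlike strlen,
   this is well-defined whether or not buf contains a NUL. */
static size_t bounded_len(const char *buf, size_t cap) {
    const char *nul = memchr(buf, '\0', cap);
    return nul ? (size_t)(nul - buf) : cap;
}
```

Calling `strlen(a)` instead would read past the end of the array, which is undefined behavior.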
> I understand GCC that they took the easier way. Type strictness is not C's forte.
People would want whatever they do to make sense in C++ too, FWIW. So if they introduce a type incompatibility, they would want to avoid breaking the world in other languages that enforce such incompatibilities, even if C doesn't.
No, actually, that was the point. I was asking, what do you think should happen if you store a NUL when you're claiming you're not? Or if you don't store a NUL when you claim it's there?
Well, as a human compiler, I said "Hey, you've non-NUL terminated a NUL terminated string". If that was what you intended you should use the type annotation for that, so I think that case worked as intended.
EDIT:
> what do you think should happen if you store a NUL when you're claiming you're not
I don't believe nonstring implies it doesn't end with a NUL, just that it isn't required to.
But char[] already isn't required to be NUL-terminated to begin with. char a[1] = {'a'} is perfectly fine, as is a[0] = '1'. If all you want to do is to document the fact that a type can do exactly what it already can... changing the type to something new doesn't make sense.
Note that "works as intended" isn't the sole criterion for "does it make sense" or "should we do this." You can kill a fly with a cannon too, and it achieves the intended outcome, but that doesn't mean you should.