Hacker News | jey's comments

That makes sense, but how do you efficiently evaluate the Gaussian-kernel-based approach (“operator-based data structures (OBDS)”)? Presumably you want to do it in a way that keeps a dynamically updating data structure instead of computing a low-rank approximation to the kernel, etc.? In my understanding, the upsides of the kNN-based approaches are fast querying and the ability to dynamically insert additional vectors..?

Thank you for the thoughtful comment. Your questions are valid given the title, which I used to make the post more accessible to a general HN audience. To clarify: the core distinction here is not kernelization vs kNN, but field evaluation vs point selection (or selection vs superposition as retrieval semantics). The kernel is just a concrete example.

FAISS implements selection (argmax ⟨q,v⟩), so vectors are discrete atoms and deletion must be structural. The weighted formulation represents a field: vectors act as sources whose influence superposes into a potential. Retrieval evaluates that field (or follows its gradient), not a point identity. In this regime, deletion is algebraic (append -v for cancellation), evaluation is sparse/local, and no index rebuild is required.

The paper goes into this in more detail.
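For what it's worth, here's a minimal sketch of how I picture those retrieval semantics (all names and parameters are my own, not from the paper): each vector is a source contributing a Gaussian potential, retrieval evaluates the superposed field at the query, and deletion appends a negated source instead of structurally removing anything.

```python
import math

def gaussian(q, x, sigma=1.0):
    # Gaussian kernel: influence of source x at query point q
    d2 = sum((a - b) ** 2 for a, b in zip(q, x))
    return math.exp(-d2 / (2 * sigma ** 2))

class FieldIndex:
    """Append-only list of (vector, weight) sources."""
    def __init__(self):
        self.sources = []

    def insert(self, x, w=1.0):
        self.sources.append((tuple(x), w))

    def delete(self, x, w=1.0):
        # algebraic deletion: append a cancelling source, no index rebuild
        self.sources.append((tuple(x), -w))

    def potential(self, q):
        # retrieval = evaluating the superposed field at q
        return sum(w * gaussian(q, x) for x, w in self.sources)

idx = FieldIndex()
idx.insert((0.0, 0.0))
idx.insert((3.0, 4.0))
idx.delete((3.0, 4.0))            # contributions cancel exactly
print(idx.potential((0.0, 0.0)))  # -> 1.0, as if only the first source exists
```

Presumably a real implementation keeps evaluation sparse/local (the kernel decays fast, so only nearby sources matter) rather than doing the naive full sum here, which is where the dynamically updating data structure would come in.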


> When I ran this program, I expected the `CF_OEMTEXT` string to have the byte 44, but it didn’t. It had the byte 90. We will start unraveling this mystery next time.

Whoa, there exists something Raymond Chen didn’t know about Windows core APIs?


This seems to be incorporated into current LLM generations already -- when code execution is enabled both GPT-5.x and Claude 4.x automatically seem to execute Python code to help with reasoning steps.



I remember seeing that GPT-5 had two Python tools defined in its leaked prompt; one of them would hide its output from the user-visible chain-of-thought UI.


Same with CoT prompting.

If you compare the output from a CoT prompt vs a control prompt, the output will include the reasoning steps either way for the current generation of models.


Yeah, this is honestly one of the coolest developments of new models.


And the person who did the implementation, Lyapsus, did it without access to the hardware?? https://github.com/nadimkobeissi/16iax10h-linux-sound-saga/i...


That thread is a fun (though frustrating for them!) conversation to read through.

After about a hundred back-and-forths getting the guy with the actual hardware to try different commands, I was thinking to myself: man, maybe he should just give Nadim remote access to work on the target PC, this is torture for both of them. And then I see him comment:

> Honestly I'm thinking of this and maybe something insane like organizing ssh access or something to quit torturing Nadim with building and rebooting all the time

And Nadim replies:

> Haha, sorry, but there's no way I'm giving you SSH access!

> I’m fine with continuing with tests!

Which is fair enough! But was funny to see right when I was thinking the same thing. Great perseverance from both of them.

It was slightly disappointing that they moved off GitHub to Discord eventually, so after all that, we miss the moment of them actually getting it working!


I just read it too and now I think I know what the suspense novels robots write for each other will be like.


I also enjoyed reading through it, but wish I'd seen this comment first and avoided missing out on the moment of success too. :)


Anyone have a link to the patch in uutils? Curious to see what the problem and solution were.


This comment[0] explains it.

The core bug seems to be that support for `date -r <file>` wasn't implemented at the time Ubuntu integrated it [1, 2].

And before that, the command silently accepted `-r` and did nothing (!)

0: https://lwn.net/Articles/1043123/

1: https://github.com/uutils/coreutils/issues/8621

2: https://github.com/uutils/coreutils/pull/8630


Man, if I had a nickel for every time some old Linux utility ignored a command-line flag, I'd have a lot of nickels. I'd have even more nickels if I got one each time some utility parsed command-line flags wrong.

I have automated a lot of things by executing other utilities as subprocesses, and it's absolutely crazy how many utilities handle CLI flags seemingly correctly, but not really.


This doesn't look like a bug, that is, something overlooked in the logic. This seems like a deliberately introduced regression. Accepting an option and ignoring it is a deliberate action, and not crashing with an error message when an unsupported option is passed must be a deliberate, and wrong, decision.


It certainly doesn't look intentional to me: it looks like at some point someone added `-r` as a valid option, but until this surfaced as a bug, no one actually implemented anything for it (and the logic happens to fall through to using the current date).
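That pattern is easy to reproduce with any declarative argument parser. A sketch of the failure mode (Python's argparse standing in for the Rust parser; the names are mine, not uutils' actual code): the flag is registered, parsing succeeds, and the dispatch chain simply never consults it.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-d", "--date")
parser.add_argument("-r", "--reference")  # registered, so parsing accepts it...

def date_source(args):
    # ...but this chain never looks at args.reference,
    # so `-r <file>` silently falls through to "now"
    if args.date is not None:
        return ("custom", args.date)
    return ("now", None)

args = parser.parse_args(["-r", "somefile"])
print(date_source(args))  # prints ('now', None): -r was accepted but ignored
```

No error, no warning, and nothing in the parser setup looks wrong in isolation; the bug only exists in the gap between registration and handling.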


a `todo!()` away from something being way more obvious. Unfortunate!


It's wrong (and coreutils get it right) but I don't see why it would have to be deliberate. It could easily just not occur to someone that the code needs to be tested with invalid options, or that it needs to handle invalid options by aborting rather than ignoring. (That in turn would depend on the crate they're using for argument parsing, I imagine.)


Could parsing of `-r` somehow have been added without anyone noticing?

If it was added in bulk, with many other still unsupported option names, why does the program not crash loudly if any such option is used?

A fencepost error is a bug. A double-free is a bug. Accepting an unsupported option and silently ignoring it is not, it takes a deliberate and obviously wrong action.


At least from what I can find, here's the original version of the changed snippet [0]:

    let date_source = if let Some(date) = matches.value_of(OPT_DATE) {
        DateSource::Custom(date.into())
    } else if let Some(file) = matches.value_of(OPT_FILE) {
        DateSource::File(file.into())
    } else {
        DateSource::Now
    };
And after `-r` support was added (among other changes) [1]:

    let date_source = if let Some(date) = matches.get_one::<String>(OPT_DATE) {
        DateSource::Human(date.into())
    } else if let Some(file) = matches.get_one::<String>(OPT_FILE) {
        match file.as_ref() {
            "-" => DateSource::Stdin,
            _ => DateSource::File(file.into()),
        }
    } else if let Some(file) = matches.get_one::<String>(OPT_REFERENCE) {
        DateSource::FileMtime(file.into())
    } else {
        DateSource::Now
    };
Still the same fallback. Not sure one can discern from just looking at the code (and without knowing more about the context, in my case) whether the choice of fallback was intentional and handling the flag was forgotten about.

[0]: https://github.com/yuankunzhang/coreutils/commit/850bd9c32d9...

[1]: https://github.com/yuankunzhang/coreutils/blob/88a7fa7adfa04...


> Accepting an unsupported option and silently ignoring it is not, it takes a deliberate and obviously wrong action.

No, it doesn't. For example, you could have code that recognizes that something "is an option", and silently discards anything that isn't on the recognized list.


> silently discards anything that isn't on the recognized list.

That's a deliberate action.


I would say that Canonical is more at fault in this case.

I'm frankly appalled that an essential feature such as system updates didn't have an automated test that would catch this issue immediately after uutils was integrated.

Nevermind the fact that this entire replacement of coreutils is done purely out of financial and political rather than technical reasons, and that they're willing to treat their users as guinea pigs. Despicable.


What surprises me is that the job seems rushed. Implementation is incomplete. Testing seems patchy. Things are released seemingly in a hurry, as if meeting a particular deadline was more important for the engineers or managers of a particular department than the quality of the product as a whole.

This feels like a large corporation, in the bad sense.


> deliberately introduced regression

> deliberate and wrong decision

Yeah... I hope "we" will not switch to it just because it is written in Rust. There is much more than just the damn language behind it.



It would be really nice if something said what the actual problem was.

The last commit[0] is a fix for date parsing to bring it in line with the GNU semantics, which seems like a pretty good candidate.

Edit: Or not, see evil-olive's comment[1] for a more likely candidate.

0: https://github.com/uutils/coreutils/commit/0047c7e66ffb57971...

1: https://news.ycombinator.com/item?id=45687743


The problem is the existence of the project of Rust rewrite itself.


Get two subscriptions if it's delivering that much value and you hit the limits?


Greenland is an “autonomous territory” of Denmark


1) That avoids answering my question.

2) And it's still false.


I think it’s feasible because of their token prefix prompt caching, available to everyone via API: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...


I’m pretty sure you are supposed to declare agricultural products at customs. Sure, if the apples are cooked into a pie that’s probably fine but I believe most countries don’t let people bring in fresh fruit because of the possibility that some pest (insect, fungus) could be hitching a ride on it.


I believe the point is that in other countries they won’t rifle through your bag to verify whether or not you have brought apples. I’m not familiar with Australian customs though so I could be mistaken.


The US and Canada will both do that. It's at the whim of the border guard though.


And it's taken pretty seriously. Enough to have dogs trained to smell out agricultural products in luggage.


That the annotation applies to variables and not types is surely an oversight or mistake, right? It seems like it could have been easier to implement that way initially, but it just doesn't seem to fit with how the C type system works. (Yes, it will make declarations uglier to do it on types, but that ship has sailed long ago; see cdecl.org.)


And how would you type a string vs a byte array then? C doesn't even have proper string support yet, i.e. Unicode strings. Most wchar functions don't care at all about Unicode rules. Zero-terminated byte buffers are certainly not strings, just garbage.

C will never get proper string support, so you'll never be able to separate zero-terminated byte buffers from plain byte buffers in the type system.

So annotating vars is perfectly fine.

The problem was that the PM and Release manager was completely unaware of the state of the next branch, of its upcoming problems and fixes, and just hacked around in his usual cowboy manner. Entirely unprofessional. A release manager should have been aware of Kees' gcc15 fixes.

But they have no tooling support, no oversight, just endless blurbs on their main mailing list. No CI for a release candidate? Reminds us of typical cowboys in other places.


I think the idea is simply to

  typedef __nonstring__ char* bytes;
And then use that type instead of annotating every single variable declaration.


But you would still need to change it everywhere, right? Like, instead of changing the annotation everywhere you have to change the type everywhere. Doesn't seem like a huge difference to me.


There is a difference if the type is used inside a structure.


Fair, good point.


> No CI for a release candidate?

If the CI system didn't get the Fedora upgrade then it would not have caught it. Aside from that the kernel has a highly configurable build process so getting good coverage is equally complex.

Plus, this is a release candidate, which is noted as being explicitly targeted at developers and enthusiasts. I'm not sure the strength of Kees' objections is well matched to the size of the actual problem.


But Linus broke the kernel for gcc<15; a CI would surely have caught it.

And Linus is usually much more critical in what gets into master when it comes to other people's contribution, let alone into an RC.


> That the annotation applies to variables and not types is surely an oversight or mistake right?

I don't think so. It doesn't make sense on the type. Otherwise, what should happen here?

  char s[1];
  char (__nonstring ns)[1];  // (I guess this would be the syntax?)
  s[0] = '1';
  ns[0] = '\0';
  char* p1 = s;  // Should this be legal?
  char* p2 = ns;  // Should this be legal?
  char* __nonstring p3 = s;  // Should this be legal?
  char* __nonstring p4 = ns;  // Should this be legal?

  foo(s, ns, p1, p2, p3, p4);  // Which ones can foo() assume to be NUL-terminated?
                               // Which ones can foo() assume to NOT be NUL-terminated??
By putting it in the type you're not just affecting the initialization, you're establishing an invariant throughout the lifetime of the object... which you cannot enforce in any desirable way here. That would be equivalent to laying a minefield throughout your code.


Perhaps unsigned could help here with understanding.

unsigned means: don't use an integer's MSB as a sign bit. __nonstring means: the byte array might not be terminated with a NUL byte.

So what happens if you use integers instead of byte arrays? I mean, cast away unsigned or add unsigned. Of course these two areas are different, but one could try to design such features so that they behave in similar ways where it makes sense.

I am unsure, but it seems that if you cast to a different type, you lose the conditions of the previous type. And "should this be legal"? You can cast away a lot of things, and it's legal. That's C.

But whatever, because it's not implemented. This all is hypothetical. I understand why GCC took the easier way. Type strictness is not C's forte.


> Perhaps unsigned could help here with understanding.

No, they're very different situations.

> unsigned means: don't use an integer's MSB as a sign bit.

First: unsigned is a keyword. This fact is not insignificant.

But anyway, even assuming they were both keywords or both attributes: "don't use an MSB as a sign bit" makes sense, because the MSB otherwise is used as a sign bit.

> __nonstring means, the byte array might not be terminated with a NUL byte.

The byte array already doesn't have to contain a NUL character to begin with. It just so happens that you usually initialize it somewhere with an initializer that does, but it's already perfectly legal to strip that NUL away later, or to initialize it in a manner that doesn't include a NUL character (say, char a[1] = {'a'}). It doesn't really make sense to change the type to say "we now have a new type with the cool invariant that is... identical to the original type's."

> I understand GCC that they took the easier way. Type strictness is not C's forte.

People would want whatever they do to make sense in C++ too, FWIW. So if they introduce a type incompatibility, they would want it to avoid breaking the world in other languages that enforce them, even if C doesn't.


Do you mean s & ns to be swapped? ns starts with a NUL terminator and s does not.


No actually, that was the point. I was asking, what do you think should happen if you store a NUL when you're claiming you're not. Or if you don't store a NUL, when you claim it's there.


Well, as a human compiler, I said "Hey, you've non-NUL terminated a NUL terminated string". If that was what you intended you should use the type annotation for that, so I think that case worked as intended.

EDIT: > what do you think should happen if you store a NUL when you're claiming you're not

I don't believe nonstring implies it doesn't end with a NUL, just that it isn't required to.


But char[] already isn't required to be NUL-terminated to begin with. char a[1] = {'a'} is perfectly fine, as is a[0] = '1'. If all you want to do is to document the fact that a type can do exactly what it already can... changing the type to something new doesn't make sense.

Note that "works as intended" isn't the sole criterion for "does it make sense" or "should we do this." You can kill a fly with a cannon too, and it achieves the intended outcome, but that doesn't mean you should.


Is ns NUL terminated, or is it an array of chars that happens to end with NUL?


If ns is __nonstring, it could be the latter. Without it, it should be the former and warn if it's not. That's not ambiguous.


I either don't understand how the annotation would work on types, or what would be gained by it. What type would be annotated? A typedef to char[]?

edit: Unless what they actually mean is annotating struct members, that would actually make sense.


I do understand.

I imagine it could work a little bit like unsigned: a modifier on integer types saying that the MSB is not to be used as a sign bit.

__nonstring__ would say that the last byte of a byte sequence doesn't need to be NUL.

I would find it sensible to allow putting the attribute on a type, but whatever.


But that doesn't make any difference in the way you have to address existing `char arr[4] = "abcd"` declarations.


True.

This would only be useful in typedefs. An API could declare some byte arrays to not be strings. But again, whatever.

