As a person with a YouTube channel with mostly Italian content, and only the coding content in English: as it is today, it is horrible. But if they bring it up to the top AI standards of voice translation, it will be a game changer. I'm referring to natural voice (matching the original one) and lip sync: there will no longer be language barriers on YouTube.
The chain of thought is not where the reasoning capabilities of a model happen: models have reasoning capabilities that are part of next-token inference. What CoT does is search/sample the model's space of representations and notions in order to "ground" the final reply, putting into the context window, in an explicit way, all the related knowledge and ideas the model possesses about the question.
It is absolutely obvious that algorithmic problems like the Tower of Hanoi can't benefit from sampling. Also, algorithmic problems are a comfortable choice for the paper's authors, because they form a verifiable domain of puzzles, but they are very far from what we want the models to do, and from what they are good at. Models would solve this by implementing an algorithm in Python and calling a tool to execute it. This is how they can more easily solve such problems.
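(To give an idea of why tool use trivializes this class of puzzle: the program a model would write and execute is a few lines. A sketch, not from the paper.)

```python
def hanoi(n, src="A", dst="C", aux="B", moves=None):
    # Classic recursive solution: move n-1 disks out of the way,
    # move the largest disk, then move the n-1 disks on top of it.
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, aux, dst, moves)
    moves.append((src, dst))
    hanoi(n - 1, aux, dst, src, moves)
    return moves

print(len(hanoi(10)))  # 2^10 - 1 = 1023 moves
```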
Moreover: in most benchmarks CoT improves LLM performance a lot, because sampling helps immensely in providing a better reply. So this paper's negative result basically goes against a very vast body of experience of CoT being a powerful tool for LLMs, simply because most benchmarks operate on domains where sampling is very useful.
In short, the Apple paper mostly says things that were very obvious: it is as if they were trying to reach a negative result. It was already a widespread view that CoT can't help performing algorithmic work by concatenating tokens, if not in the most obvious ways. Yet it helps a lot when there is existing (inside the model) knowledge/ideas to combine in order to provide a better reply.
What they're saying is that pattern-matching isn't the path to AGI. Humans and AI can both solve the Tower of Hanoi, but once the number of disks goes up, we both struggle.
Apple's point is that if we want to build something smarter than us, we need to look at intelligence and reasoning from a different angle.
Exploring how to consistently arrive at a negative result is still a valid research goal. I don't think we've had enough of that kind of research regarding LLMs: everything is so positive that it defies basic statistics…
Please note that Redis 8 supports vector sets as a native type, without building any module. And the API is very simple to use. This may simplify the building / bundling process.
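To give an idea of how simple the API is, a minimal sketch via redis-cli (key and element names are made up, and I'm quoting the command syntax from memory):

```
VADD points VALUES 3 0.1 0.2 0.3 item:1
VADD points VALUES 3 0.9 0.8 0.7 item:2
VSIM points VALUES 3 0.1 0.2 0.25 COUNT 2
```

VADD adds an element with its vector, VSIM returns the most similar elements to a query vector.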
I'm happy ValKey did great work in the area of I/O threading, and we recently started to incorporate the most interesting changes: a big thank you to all the ValKey contributors for their great work.
However, this article is a bit misleading:
>Antirez’s emphasis on a shared nothing architecture has been foundational for Redis. Nevertheless, as early as 2020, Redis added support for I/O threads. Unfortunately, they did not offer drastic improvement until recently. If you have previously tried and discarded I/O threads, it is time to evaluate again!
Please note that it is that same Antirez who implemented I/O threading in 2020, exactly because it does not violate the reasons why I believe in shared nothing.
<technical background>
What I/O threading does is, when we return from the event loop, recognize that write(2) / read(2) syscalls are very slow, and that at that moment we have zero contention: so why not parallelize just that I/O across N threads, and return to single-threaded execution immediately after? So I implemented this system, and the ValKey folks did the awesome job of making it better (thanks again).
</technical background>
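(The fan-out/fan-in shape of the idea, sketched in Python for clarity; all names are made up, and Redis of course does this in C.)

```python
from concurrent.futures import ThreadPoolExecutor

N_IO_THREADS = 4
pool = ThreadPoolExecutor(max_workers=N_IO_THREADS)

def flush_replies(clients):
    # We just returned from the event loop: no command is executing,
    # so there is zero contention on shared state. Parallelize only
    # the slow write(2)-like calls, then go back to single-threaded.
    futures = [pool.submit(c["sock"].sendall, c["out"])
               for c in clients if c["out"]]
    for f in futures:
        f.result()  # barrier: fan back in before touching shared state again
    for c in clients:
        c["out"] = b""  # output buffers flushed, resume single-threaded
```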
But it is not true that the system didn't work back then, even if it has now been improved, as you can see from the graph in the article itself... I wonder if those posts are paid for by somebody or what. There are similar nonsensical posts on the Redis side as well, and they suck likewise, babbling about random things. WTF... What a disservice this kind of journalism is.
Anyway, one reason why these things are interesting mostly for Redis the company itself, Amazon, and Google, and only marginally for normal Redis users (which in turn is the reason why the feature is not much used), is that if you don't have many users on the same machine, or you don't see extremely high loads in specific circumstances, you usually don't need to enable it. Many users of Redis, even big users (I remember the numbers of a few very popular social networks), have their Redis CPU usage low enough not to bother.
Btw, about threads: there are times when they simply fit. If you look at my latest work at Redis, the new vector set data type, queries are threaded by default, and you can even use VADD (so, writing to a vector set) in a threaded way. Why did I change my mind? Because HNSWs are the first data structures with huge constant times; Redis never had something like that, and this changed the design space that was worth considering. So in 2020 I was already positive about threads, before that I had already implemented threaded support for module operations, and now vector sets are threaded. It is not about being pro or against, it depends.
It was not some random ensemble of "Valkey contributors" that did the async I/O, but AWS: one of those cloud providers Redis moved away from FOSS because of.
One year later, Valkey hasn't just survived – it's thriving! The Async I/O Threading model contribution from AWS unlocked 3x+ throughput by fundamentally changing how I/O threads work inside Redis.
Bryan Cantrill on this:
... those open source companies that still harbor magical beliefs ... cloud services providers are emphatically not going to license your proprietary software. I mean, you knew that, right?
... The cloud services providers are currently re-proprietarizing all of computing – they are making their own CPUs for crying out loud! – reimplementing the bits of your software that they need in the name of the service that their customers want (and will pay for!) won't even move the needle in terms of their effort.
"Trollope later justified the shift by saying the SSPL license only really "applies to Amazon and Google" – fellow cloud provider Microsoft has agreed commercial terms with Redis."
The latest AGPL shift is great and should have been the default for open source since 2010.
A post like this hitting the HN front page feels like a monthly occurrence. I normally think of commenting in your support, but never post it.
While I agree with your technical points, the constant criticism seems less about the specifics and more rooted in either a tendency to go after the incredibly successful, or classic tall poppy syndrome [0].
While we can't control how others react, reframing these kinds of posts as an indirect acknowledgment of your work's significance might be a healthier approach.
I personally learn more by actually seeing the relative similarity ranking and scores within a dataset, versus trying to visualize all of the nodes on the same graph with massive dimensionality reduction.
That 3d visualization is what originally intrigued me though, to see how else I could visualize. :)
Any Feistel network has the property you stated, actually, and this was one of the approaches I was thinking of using, as I can have the seed as part of the non-linear transformation of the Feistel network. However, I'm not sure this actually decreases the probability of A xor B xor C xor D being accidentally zero, because the problem with pointers is that they may change only in a small part. When you use hashing, thanks to the avalanche effect this gets a lot harder, since you are no longer xoring the pointer structure directly.
What I mean is that you are right, assuming we use a transformation that, while invertible, still has the avalanche effect. Btw, in practice I doubt there are real differences.
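A toy illustration of the failure mode with raw pointers (nothing here is from Redis; splitmix64 is just a convenient off-the-shelf avalanche mixer):

```python
MASK = (1 << 64) - 1

def splitmix64(x):
    # A well-known 64-bit finalizer with good avalanche: flipping one
    # input bit flips about half of the output bits.
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)

# Four "pointers" that differ only in their low bits, as heap
# allocations of same-sized objects often do.
A, B, C, D = 0x7F0000001000, 0x7F0000001010, 0x7F0000002000, 0x7F0000002010

print(hex(A ^ B ^ C ^ D))  # raw XOR: 0x0, an accidental cancellation
print(hex(splitmix64(A) ^ splitmix64(B) ^ splitmix64(C) ^ splitmix64(D)))
```

With the raw pointers, the low-bit differences cancel out pairwise and the XOR is accidentally zero; after mixing, such a cancellation is as unlikely as for random 64-bit values.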
You can guarantee that the probability is the theoretical minimum with a bijection. I think that would be 2^-N since it's just the case where everything's on a maximum length cycle, but I haven't thought about it hard enough to be completely certain.
A good hash function intentionally won't hit that level, but it should be close enough not to matter with 64 bit pointers. 32 bits is small enough that I'd have concerns at scale.
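To make the bijection point concrete, here is a toy keyed Feistel network over 64-bit words (the round function is made up; any Feistel construction is invertible by running the rounds backwards, so distinct inputs can never collide):

```python
M32 = 0xFFFFFFFF

def feistel_round(x, k):
    # Arbitrary non-linear round function: it does NOT need to be
    # invertible for the whole network to be a bijection.
    return ((x * 0x85EBCA6B) ^ (x >> 13) ^ k) & M32

def feistel_encrypt(v, keys):
    left, right = (v >> 32) & M32, v & M32
    for k in keys:
        left, right = right, left ^ feistel_round(right, k)
    return (left << 32) | right

def feistel_decrypt(v, keys):
    # Same structure, rounds applied in reverse order.
    left, right = (v >> 32) & M32, v & M32
    for k in reversed(keys):
        left, right = right ^ feistel_round(left, k), left
    return (left << 32) | right
```

Because decrypt exactly undoes encrypt, the map is a permutation of the 64-bit space, which is what guarantees the minimum collision probability discussed above.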
As the person who wrote that documentation, I respectfully disagree. I understand your point, but I want to tell you why I believe the way it is, is better for Redis. Redis is a system that is 99% backward compatible across all version combinations: Redis 2 was released more than 10 years ago, and this is a very rare case where things are not perfectly backward compatible, and even then the Redis 2 semantics were so odd that they were avoided in most tasks. Now, in a system like that, having man pages that tell you the differences among versions is much better than versioned documents, where you would be required to diff the different pages yourself. In a single document you know the entire behavioral history of a particular command, which often is just: "always as it used to be", plus backward-compatible features entering newer versions.
I think a changelog on the page is key, as you say. I also think that having versioned docs does not require us, as users, to do diffs manually; that's what a changelog/history is for. Ideally, I'd like to have both: all docs are versioned and all docs have "history" sections.