gunnarmorling's comments

> Luckily [the accelerator servo] is also the only actuator that doesn’t come with major safety implications

This is giving me chills: what if that RC servo gets "stuck" at full throttle? I suppose the assumption is that you could hit the clutch, but depending on the specific situation there might not be a lot of time to realize what's happening and react accordingly.


Wasn't that the supposed cause of all those Toyota crashes a few years back? People claimed the accelerator was stuck on full, and pressing the brakes did nothing, but the actual cause was people panicking and slamming on the accelerator instead of the brakes?

That and improperly sized or installed floormats catching the pedals. There are many examples of people hitting the gas instead of brake, freaking out, and hitting it even harder on the IdiotsInCars subreddit. Here's one:

https://www.reddit.com/r/IdiotsInCars/comments/xj8540/they_u...


As discussed in that thread, apparently this mostly happens to “2-foot drivers,” people who use one foot for the gas and one for the brakes. When in a state of panic or surprise, they can accidentally mix up which foot they need to stomp down on and end up slamming down the accelerator instead of the brakes.

I know some countries have a very....."easy going" attitude to driver's ed, but surely.....especially if most of your vehicles are automatics.....you must have been told that your left foot never leaves the floor or you're going to seriously hurt yourself, right???

I and everyone I know around my age was taught to only use the right foot for both brake and gas, but apparently it was relatively common in the 1960s or so for drivers (at least of automatic transmissions) to be taught the 2-foot "method". I have nothing to back that up other than anecdotes on the Internet, though.

One outcome of this is that driver floor-mats are no longer free-floating; they clip into the floor, so they don't shift.


A car's brakes are required to be able to overpower the engine at full throttle.

Concurrent garbage collectors of recent JVM versions (ZGC, Shenandoah) can give you sub-millisecond pause times, i.e. GC pauses are not really an issue any more for typical analytics query workloads. The price to pay is reduced throughput, but a scale-out architecture like Pinot makes it easy to make up for that by adding another node, if needed.
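For reference, switching to one of these collectors is a single startup flag; a sketch, with a made-up jar name (flag availability depends on your JDK version/build):

```shell
# Enable ZGC (sub-millisecond pauses, some throughput cost)
java -XX:+UseZGC -Xmx16g -jar pinot-server.jar

# Or Shenandoah:
java -XX:+UseShenandoahGC -Xmx16g -jar pinot-server.jar
```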


Indeed it's interesting to see aggregation as part of this list. Usually, the split between what can be pushed down to the storage layer and what cannot is between stateless operations (filtering, projection) and stateful operations (e.g. joins, but also aggregations), as for instance the latter may require data from multiple storage nodes in a distributed data store.
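To make that concrete, here's a minimal sketch (not Pinot's actual API) of how an aggregation like AVG can still be partially pushed down, by splitting it into a per-node partial state and a final merge on the query node:

```java
public class PartialAgg {
    record Partial(double sum, long count) {
        // Runs on each storage node, over its local shard of the data
        static Partial of(double[] values) {
            double s = 0;
            for (double v : values) s += v;
            return new Partial(s, values.length);
        }
        // Runs on the query node, combining the per-node partial states
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
        double avg() { return sum / count; }
    }

    public static void main(String[] args) {
        // Two "storage nodes", each holding a shard of the data
        Partial p1 = Partial.of(new double[] {1, 2, 3});
        Partial p2 = Partial.of(new double[] {4, 5});
        System.out.println(p1.merge(p2).avg()); // 3.0
    }
}
```

Note that this only works because (sum, count) is a mergeable partial state; a median, say, could not be decomposed this way.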


A few potential reasons for this design that come to mind:

- Resource allocation: you might want to give a specific amount of memory, CPU, or network I/O to specific modules of a system, which is not really feasible within a single JVM.

- Resource isolation: e.g. a memory leak in one module of the system will affect just that specific JVM instance but not others (similar to why browsers run tabs in multiple processes).

- Upgrades: you can put a new version of one module of the system into place without impacting the others; while the JVM does support this via dynamic classloading (as e.g. used in OSGi or Layrry, https://github.com/moditect/layrry), this becomes complex quickly, you can create classloader leaks, etc.

- Security: you might have (3rd-party) modules you want to keep isolated from the memory, data, config, etc. of other modules; in particular with the removal of the security manager, OS-enforced process isolation is the way to go.
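As a sketch of the resource-allocation point (module jar names are made up; per-process limits like these have no real equivalent within a single JVM):

```shell
# Give each module its own JVM with its own memory and CPU budget
java -Xmx2g  -XX:ActiveProcessorCount=2 -jar ingest-module.jar &
java -Xmx16g -XX:ActiveProcessorCount=8 -jar query-module.jar &

# CPU shares can be layered on top with cgroups, e.g. via systemd-run:
systemd-run --scope -p CPUQuota=200% -- java -Xmx2g -jar ingest-module.jar
```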


Also software design. You can split JVMs into those that have to follow strict parameters (e.g. no allocations) and those that follow more traditional Java patterns.


Yeah, I think the HFT guys use CPU pinning a lot: 1 process - 1 CPU, so you'd need multiple processes to take advantage of a multicore server.


Usually it is 1 thread - 1 CPU. There might be other reasons (address space separation has its own advantages - and disadvantages) to have distinct processes of course.
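For completeness, pinning is typically done at the OS level rather than from Java itself; on Linux for instance (jar name made up):

```shell
# Pin a whole process (and its threads) to CPU core 3 at launch
taskset -c 3 java -jar trading-app.jar

# Or pin an already-running thread by its TID
taskset -cp 3 <tid>
```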


Nice one. I wrote about this a while ago, from a slightly different perspective, focusing on data change events [1]. Making a similar differentiation there between id-only events (which you describe as triggers of action; from a data change feed perspective, that action typically would be a re-select of the current state of the represented record), full events (your carriers of data) and patch events (carriers of data with only the subset of attributes whose value has changed).

[1] https://www.decodable.co/blog/taxonomy-of-data-change-events
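To make the three shapes concrete, a minimal sketch (the type names are mine, not from either article):

```java
import java.util.Map;

public class EventShapes {
    // Id-only: a trigger; consumers must re-query the source for state
    record IdOnlyEvent(String entityId) {}

    // Full: carries the complete current state of the entity
    record FullEvent(String entityId, Map<String, Object> state) {}

    // Patch/delta: carries only the attributes whose values changed
    record PatchEvent(String entityId, Map<String, Object> changedAttributes) {}

    public static void main(String[] args) {
        var patch = new PatchEvent("customer-123", Map.of("email", "new@example.com"));
        System.out.println(patch.changedAttributes().keySet()); // [email]
    }
}
```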


Thanks for your feedback Gunnar, I appreciate it!

Your categorization makes total sense and fits well with what I called the "spectrum". I only mentioned the "id-only" events to show what the one end of the spectrum would look like. What I call the "trigger" events would be what you call "delta" events. I should have written that more clearly.

Interestingly, a few people advocated for id-only events in response to the article. I have some issues with that pattern... I'm already thinking about a follow-up article to elaborate on that.


It's not about mutability of events, but about mutating the underlying data itself. If the event only says "customer 123 has been updated", and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted. Depending on the use case, this may or may not be a problem. If the consumer is only interested in the current state of the data, this typically is acceptable, but if it is needed in the complete history of changes, it is not.


Making a wacky 2-step announcement protocol doesn't change the nature of your events.

If the consumer goes to your database and asks "what's the data for customer 123 at event F52A?" it better always get back the same data or "that event doesn't exist, everything you know is wrong".


> ... at event F52A

Sure, if the database supports this sort of temporal query, then you're good with such id-only events. But that's not exactly the default for most databases / data models.
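A minimal sketch of what such a lookup would need from the store: retained versions plus an "as of" query (assuming the events carry a version number):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class VersionedStore {
    // History of states per version; a real store would keep this per entity
    private final NavigableMap<Long, String> history = new TreeMap<>();

    void put(long version, String state) { history.put(version, state); }

    // State as of a given version: latest entry with version <= v, if any
    String getAsOf(long v) {
        var e = history.floorEntry(v);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        var store = new VersionedStore();
        store.put(1, "{\"name\":\"Alice\"}");
        store.put(2, "{\"name\":\"Alicia\"}");
        // A consumer processing the event for version 1 still sees that
        // version's state, even though the record was updated afterwards:
        System.out.println(store.getAsOf(1)); // {"name":"Alice"}
    }
}
```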


My understanding is that what you have isn't really "events", but some kind of "notifications".

Events are part of a stream that define your data. The stream doesn't have to be complete, but if it doesn't make sense to do things like buffer or edit it, it's probably something else and using that name will mislead people.


> (...) and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted.

So the entity was updated. What's the problem?


Thanks for sharing!

I don't think people did exactly that, but most indeed did leverage the fact that values only ranged from -99.9 to 99.9 with exactly one fractional digit and handled them as integers (avoiding FP maths) up until the very end when printing out the results.
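A minimal sketch of that fixed-point trick (my own simplified version, not any particular submission): "12.3" is handled as the integer 123 (tenths), and floating-point math only appears at the final formatting step.

```java
import java.util.Locale;

public class FixedPoint {
    // Parse a value in [-99.9, 99.9] with exactly one fractional digit
    // into an integer number of tenths, e.g. "-12.3" -> -123
    static int parseTenths(String s) {
        int i = 0, sign = 1;
        if (s.charAt(0) == '-') { sign = -1; i = 1; }
        int value = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.') value = value * 10 + (c - '0');
        }
        return sign * value;
    }

    // Only here, at output time, do we touch floating point
    static String format(int tenths) {
        return String.format(Locale.ROOT, "%.1f", tenths / 10.0);
    }

    public static void main(String[] args) {
        System.out.println(parseTenths("-12.3")); // -123
        System.out.println(format(-123));         // -12.3
    }
}
```

All the hot-loop arithmetic (summing, min/max) can then run on plain ints, which is both faster and exact.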


It's not that much, but it's also not nothing. The input file of the challenge was very narrow, just a relatively short text column and a numeric value. It still is >13 GB uncompressed; an actual table with a few more columns with 1B rows will easily be beyond 100 GB. And chances are it's not the only table in your database, so it won't fit into RAM of most common database machines. I.e. still a non-trivial dataset in most contexts.


> an actual table with a few more columns with 1B rows will easily be beyond 100 GB

Especially if you need to have supplementary indexes to support common queries on that data.


Non-trivial, yes, but still small enough for a hobbyist. My desktop box has 96 GB of RAM and I don't feel special about it, nor was it expensive.


Presenter of that talk here, very cool to see it being shared here.

Running 1BRC was immensely fun, I learned a ton from it. Had you told me before how far the community would be able to push this, I'd have not believed you.

One main take-away for me was that you could improve performance by one order of magnitude over the baseline basically by just doing a good job and avoiding basic mistakes. The resulting code is still well readable and maintainable. In most scenarios, this is where you should stop.

If you want to improve by another order of magnitude (like leaders in the challenge did), code becomes completely non-idiomatic, super-dense, and hard to maintain. You should go there only where it really, really matters, like when building a database kernel for instance. Or well, when trying to win a coding challenge ;)

Some more resources for those interested:

* Blog post with the results: https://www.morling.dev/blog/1brc-results-are-in/

* Show & Tell, featuring implementations in languages other than Java: https://github.com/gunnarmorling/1brc/discussions/categories...

* List of many more blog posts discussing 1BRC in different languages: https://github.com/gunnarmorling/1brc?tab=readme-ov-file#1br...

* 3h deep-dive into the implementation techniques by Thomas Würthinger and Roy van Rijn, two of the top participants of the challenge: https://www.youtube.com/watch?v=_w4-BqeeC0k


> One main take-away for me was that you could improve performance by one order of magnitude over the baseline basically by just doing a good job and avoiding basic mistakes. The resulting code is still well readable and maintainable. In most scenarios, this is where you should stop.

I’ve done a lot of performance work but never heard this expressed so clearly before. Thanks - I’m stealing that.

I’ve found exactly the same thing in my own work optimising text CRDTs. Just writing crdt code in a straightforward, correct way, without unnecessary allocations (and ideally using some good data structures) will get you very, very far. But there’s another order of magnitude available to anyone interested in using exotic, task specific data structures.

I suspect the same is true in just about every field of computer science. Write good code in go / c# / Java / JavaScript / etc and 99% of the time, performance will be just fine. Well, so long as you don’t do anything silly like pull in immutablejs. But there’s usually 10-100x more performance available if you really try.

If you want some examples, I highly recommend Algorithms for Modern Hardware, available for free online:

https://en.algorithmica.org/hpc/


This was such a great challenge to follow. I learned a ton about different programming languages, their advantages and disadvantages, their idioms and paradigms.

Overall this was just such a great event! Thanks for organizing it!

(A silent observer)


Amazing! I just asked a few seconds ago for more posts discussing 1BRC in different languages to be shared, and here we go. Thank you!


> basically by just doing a good job and avoiding basic mistakes.

Honest question: where would one go to learn what the basic mistakes are? Any specific resources?


Had you read on, you'd have seen that I am discussing this very point:

> leader election will only ever be eventually correct... So you’ll always need to be prepared to detect and fence off work done by a previous leader.
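To illustrate the fencing idea, a minimal sketch (assuming the storage layer can validate a monotonically increasing epoch issued at election time):

```java
public class FencedStore {
    private long highestEpochSeen = -1;
    private String value;

    // Every write carries the epoch of the leader that issued it;
    // writes from a deposed leader (stale epoch) are rejected
    synchronized boolean write(long epoch, String newValue) {
        if (epoch < highestEpochSeen) return false; // fence off the old leader
        highestEpochSeen = epoch;
        value = newValue;
        return true;
    }

    public static void main(String[] args) {
        var store = new FencedStore();
        store.write(1, "from leader 1");
        store.write(2, "from leader 2");
        // Old leader 1 wakes up after a long GC pause and tries to write:
        System.out.println(store.write(1, "stale")); // false
    }
}
```

The point is that correctness lives in the store's epoch check, not in the leader election itself, which can only ever be eventually correct.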


I'm sorry, I didn't mean to bash your post. I am not familiar with S3, and maybe what you describe is a perfectly safe solution for S3 and certain classes of usage.

I could not get past the point where you promulgate the idea that ZK can be used to implement locks.

Traditionally a 'lock' guarantees mutual exclusion between threads or processes.

"Distributed locks" are not locks at all. They look the same from API perspective, but they have much weaker properties. They cannot be used to guarantee mutual exclusion.

I think any mention of distributed locks / leader election should come with a giant warning: THESE LOCKS ARE NOT AS STRONG AS THE ONES YOU ARE USED TO. Skipping this warning is doing a disservice to your readers.

