
You're not wrong. Whoever downvoted you is pushing an agenda and I'm not happy about it.

Reads per second is ultimately a measure of how many operations can be retired per second, not how long the operation takes. Anyone who has spent fifteen minutes learning about capacity planning should know that. 5 reads per second doesn't mean each read takes 200ms. It might mean 1 second per read spread across 5 threads of execution, or 3 seconds per read across 15 workers.
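To put numbers on that, here is a quick sketch of the arithmetic. The function name is my own, and it assumes the workers run fully in parallel at steady state:

```python
def throughput(workers: int, latency_s: float) -> float:
    """Steady-state operations per second when `workers` operations are
    in flight at once and each one takes `latency_s` seconds."""
    return workers / latency_s


# Three very different latencies, all producing the same reads per second.
for workers, latency_s in [(1, 0.2), (5, 1.0), (15, 3.0)]:
    rate = throughput(workers, latency_s)
    print(f"{workers} workers x {latency_s}s per read = {rate:.1f} reads/s")
```

All three configurations come out at 5 reads per second, so the throughput figure alone tells you nothing about how long any single read took.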




The parent poster, like many others in this thread, is assuming that performance under this deliberately pathological test reflects performance in the real world.

Optimistic locking systems inherently perform poorly under contention. But they also perform better than pessimistic concurrency systems overall because in the real world we design applications to avoid contention.
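For anyone unfamiliar with the pattern, a minimal in-memory sketch of optimistic concurrency looks something like this. `VersionedCell` and `optimistic_update` are hypothetical names for illustration, not any database's API; a real system does the version check server-side at commit time:

```python
import threading


class VersionedCell:
    """Hypothetical record that carries a version number with its value."""

    def __init__(self, value=0):
        self._lock = threading.Lock()  # guards only the short commit step
        self.value = value
        self.version = 0

    def read(self):
        with self._lock:
            return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        """Commit only if nobody else has committed since our read."""
        with self._lock:
            if self.version != expected_version:
                return False  # conflict: the caller must re-read and retry
            self.value = new_value
            self.version += 1
            return True


def optimistic_update(cell, fn, max_retries=1000):
    """Apply fn to the current value, retrying on conflict.

    Returns the number of attempts it took to commit.
    """
    for attempt in range(1, max_retries + 1):
        value, version = cell.read()
        if cell.compare_and_set(version, fn(value)):
            return attempt
    raise RuntimeError("gave up: too much contention on this record")
```

With no contention every update commits on the first attempt and nothing ever waits on a lock held across the whole transaction. When many writers hammer the same record, most attempts fail the version check and are redone, which is exactly the case the test exercises.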

As an example, the Google App Engine datastore runs zillions of QPS across petabytes of data in a massive distributed cluster. But if you build an app that does nothing but mutate a single piece of state over and over, you'll top out at a couple of transactions per second. This is painful if you're trying to build a simple counter, but with minimal care you can build a system that scales to a dataset of any size and any traffic volume.
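The usual way around the counter problem is to shard it. Here is a rough pure-Python sketch of the idea; `ShardedCounter` is a made-up class, and in a real datastore each shard would be a separately stored record rather than a list slot:

```python
import random


class ShardedCounter:
    """Hypothetical counter split across several independent shards.

    Writers pick a shard at random, so concurrent increments rarely
    touch the same record. Reads sum all the shards.
    """

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self):
        index = random.randrange(len(self.shards))
        self.shards[index] += 1

    def total(self):
        return sum(self.shards)


counter = ShardedCounter(num_shards=8)
for _ in range(1000):
    counter.increment()
print(counter.total())
```

The trade-off is that reads get more expensive (one fetch per shard), while the write ceiling grows roughly with the number of shards, assuming each shard has its own independent write limit.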


So the benchmark is only informative from the standpoint of determining which records should be split to avoid concurrent writes.

For instance, you wouldn't expect a single user to make 5 comments or upvotes per second, so storing data about recent activity with the user isn't a bottleneck you need to design for. Storing data about responses with an item might also be okay as long as you don't plan to be HN or Reddit (GitHub, for example, would be just fine). But if you want to track activity globally (e.g., managing the watch list notifications in GitHub), you will need to design around that number.


Aphyr's test is not a benchmark. It is a test of database correctness under a carefully constructed set of pathological circumstances. It cannot be used to infer real-world performance behavior.

Yes, the general advice for users of the GAE datastore is to build Entity Groups around the data for a single user. That isn't absolute though, and it doesn't cause problems for watch lists; the Watch can be part of the User's EG rather than the Issue's EG. Or it can be its own EG. In practice this doesn't require as much consideration as you probably imagine.


Regardless of locking strategy, that throughput is abysmal. No one would complain about high latency.



