LevelDB: SSTable and Log-Structured Storage (2012) (igvita.com)
66 points by jxub on Feb 24, 2018 | 18 comments



LevelDB is really cool, but I have to admit I still struggle with the variety of datastores where the design of your keys has such a huge influence down the line. I remember using the Node variety (levelup/leveldown) and seeing all kinds of weird patterns when dealing with ranges (using \xff as a marker for the end of the range) and with how to access "groups" of related data.

How simple it is to set up and get going is still extremely appealing though.
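
For readers who have not seen the \xff range pattern, here is a minimal sketch of the idea, assuming an in-memory sorted list as a stand-in for the ordered keyspace. `SortedKV`, `prefix_range`, and the `user:1:` key layout are hypothetical names for illustration only, not the levelup/leveldown API.

```python
import bisect


class SortedKV:
    """Hypothetical stand-in for an ordered key-value store (not a real LevelDB binding)."""

    def __init__(self):
        self._keys = []   # byte-string keys, kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def range(self, start: bytes, end: bytes):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            yield k, self._vals[k]


def prefix_range(prefix: bytes):
    """Turn a key prefix into [start, end) bounds by appending 0xff to the end bound."""
    return prefix, prefix + b"\xff"


db = SortedKV()
db.put(b"user:1:email", b"a@example.com")
db.put(b"user:1:name", b"Ada")
db.put(b"user:2:name", b"Bob")
db.put(b"post:9:title", b"hello")

# Fetch the whole "group" of keys belonging to user 1 with one range scan.
start, end = prefix_range(b"user:1:")
user1 = list(db.range(start, end))
```

The group only comes back as one contiguous scan because the keys were laid out with a shared prefix up front, which is the key-design burden described above. The trick also assumes that no key in the group has a 0xff byte immediately after the prefix.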


Agreed. Reaching back into my memory banks: when Bigtable first became available for App Engine, I was fascinated by the prospect of working with a Google-scale database. But squeezing that performance out of the database through key structure came with so many odd, hard-to-reason-about vagaries. The developer ergonomics seemed simple at first, but in the end were not at all what I expected.


.. is that magic that you want in any other product? The `simplest` I know are Redis commands that show the taken.


To me, it's not about magic, it's about the abstraction. Most implementations of this style of database demand that you consider the internals of the engine before storing anything. If you don't design your keys the right way, then you're screwed down the road, and there's no way to fix it without just copying the data to a new table (and losing metadata like TTL). When I use something like Redis, or any SQL database, I don't need to consider its inner workings unless I'm trying to absolutely max out my performance (one slight exception being the atomicity of Redis commands).

But with LSM-style databases, you are going to have a really bad time if you don't have a team member who has dedicated serious time to understanding the internal workings of LSM itself, and the details of your chosen implementation. That's a real mark against it, IMO.

tl;dr: LSM databases are like a colander in the world of leaky abstractions.


There is MyRocks, where you work as with regular MySQL. When you want a distributed DB, yeah, you need to design. But comparing RocksDB vs LMDB, the second is easier, but not by much?

Think of CitusDB: do you expect good performance in a sharded environment without giving the design any consideration?


> When you want a distributed DB, yeah, you need to design

But one shouldn't be designing for implementation details. Usually, we start technologies out with leaky abstractions, and gradually get better at it. A good example is game development, where it used to be that you always used the drawing method of the display and the clock speed of the CPU to your advantage. Nowadays, we've moved past that, because it was working on horrible abstractions, and because the technology underneath improved.

I'm not saying I have a solution, and I agree that this problem rears its head the most when you start bringing in distributed storage. But my point stands: these databases run on a highly leaky abstraction, and that's a big problem going forward.


Cost of message passing. It's probably because you don't care `enough` about performance.

See the difference on one box between the `1 process per core` databases (`scylladb`, `voltdb`, `redis`) and all the other `1-process-for-all-cores` databases.


Needs a (2012).

IMO, while still cool, LSM trees don't feel particularly novel at this point. It seems like every database and their dog has adopted/made available some kind of LSM tree storage engine, down to traditional relational databases (e.g. MySQL with MyRocks).


To be fair, RocksDB is a fork of LevelDB, so at least from a historical perspective this is fairly related to MyRocks.


Even SQLite had a version 4 based on LSM trees (https://sqlite.org/src4/doc/trunk/www/design.wiki), but it was recently stopped: https://news.ycombinator.com/item?id=15648280


Why is this here, from 2012?


SSTables have had a very significant impact on how many other databases have been developed. LevelDB was their first public appearance (I believe).

It's also nice to recognize the work of people who have had a significant impact on the industry. Jeff Dean & Sanjay Ghemawat are an incredible force.


They were described in the Bigtable paper, and were widely used before LevelDB was published (Cassandra, for example, used SSTables and LSM concepts well before 2012).


I just wanted to point out that there is a good explanation of SSTables, B-trees, and other database structures in a whole chapter of the great book Designing Data-Intensive Applications, by Martin Kleppmann.


In the middle of this book right now, specifically the unreliable clocks part. The explanations have the right amount of depth. It's a great read so far.


Not to mention some awesome references at the end of each chapter. I made it a point after each chapter to randomly pick an interesting reference and read that as well.


I just ordered the book on your recommendation, thanks.


I also discovered it on Hacker News, hope you'll enjoy it!



