Jepsen: MongoDB 3.6.4 (jepsen.io)
231 points by aphyr on Oct 23, 2018 | 70 comments



> This interpretation hinges on interpreting successful sub-majority writes as not necessarily successful: rather, a successful response is merely a suggestion that the write has probably occurred, or might later occur, or perhaps will occur, be visible to some clients, then un-occur, or perhaps nothing will happen whatsoever.

> We note that this remains MongoDB’s default level of write safety.

This sounds pretty scary. How does it compare to other distributed dbs, like Riak? My understanding is that Riak lets you specify how many nodes a write must succeed on to be considered successful. Are its responses more reliable? Is this just a "distributed computing is hard" situation?
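
For anyone wondering what the knob actually looks like: here's a minimal pymongo sketch (host names, database, and replica-set name are placeholders) contrasting MongoDB's default w=1 acknowledgement with a majority write concern, which is roughly analogous to raising W in Riak:

  from pymongo import MongoClient
  from pymongo.write_concern import WriteConcern

  # Placeholder connection string for a hypothetical three-node replica set.
  client = MongoClient("mongodb://n1,n2,n3/?replicaSet=rs0")

  # Default (w=1): acknowledged as soon as the primary applies it; a later
  # election can roll the write back even though the client saw success.
  client.testdb.events.insert_one({"k": "v"})

  # w="majority" + journaling: acknowledged only once a majority of nodes
  # have the write, so it survives the loss of any minority of nodes.
  safe = client.testdb.events.with_options(
      write_concern=WriteConcern(w="majority", j=True))
  safe.insert_one({"k": "v"})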


> Is this just a "distributed computing is hard" situation?

I think this is "gaming performance benchmarks and being right is hard".

MongoDB has spent a lot of effort on making sure it looks good in benchmarks, and that includes using worse defaults than any other distributed data-store I know of.

As long as MongoDB keeps trying to "win" performance benchmarks by such wide margins, they'll have trouble providing a correct distributed system by default.


This isn't about distribution, but someone once wrote about getting PostgreSQL upsert performance to be better than MongoDB's by disabling some of PostgreSQL's safety features.

https://markandruth.co.uk/2016/01/08/how-we-tweaked-postgres...

This made me laugh. Snarky interpretation: "yeah, we can go fast and sloppy, too. We just usually don't."
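
(I don't remember exactly which knobs that article turns, but the classic go-fast-and-sloppy one in Postgres is synchronous_commit. A psycopg2 sketch, assuming a simple kv table:)

  import psycopg2

  conn = psycopg2.connect("dbname=test")  # placeholder DSN
  cur = conn.cursor()

  # Acknowledge commits before the WAL reaches disk: faster, but a crash
  # can lose recently acknowledged transactions (Mongo-default-style risk).
  cur.execute("SET synchronous_commit = off")

  cur.execute(
      """INSERT INTO kv (k, v) VALUES (%s, %s)
         ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v""",
      ("user:42", "some-value"))
  conn.commit()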


And this is why I appreciate the analysis by aphyr (and kit!) so much... it really shows what's actually happening.

Just sad that it takes so long to do this depth of analysis...


It's not just that distributed computing and being right are hard.

Most consumers of computing are happier with value, and with computers being mostly right/useful/available.

Evidence: the entire history of computing.


Basho went bankrupt over a year ago. Riak is still around, but I encourage checking out the level of support for it. Redis, Cassandra, Couch, Influx might be good alternatives depending on your use case.


It's the same for every DB; e.g., you can tell MySQL to ack a write even when there was no fsync(). All those settings are usually tunable for whatever scenario/performance tradeoff you want.


It is, in fact, not the same for every DB. Databases vary significantly in their default and maximum guarantees for successfully acknowledged writes; Mongo chooses fairly weak ones. By contrast, consider systems like VoltDB, which aims for strict serializability by default, or etcd, which offers linearizability/sequential reads by default.


> etcd, which offers linearizability/sequential reads by default

Not quite true - you have to opt in to quorum reads in etcd. Otherwise it just returns the local node's view (which used to be the default in k8s and caused a lot of grief).


Ah, yes, sorry! I forgot exactly how that all shook out. I thiiink that prevents dirty reads, and with client monotonicity you can recover sequential consistency? Not sure how many clients actually do that; maybe they just do regular reads at RC...


That is true. etcd v3 provides linearizable reads by default; you have to opt out of that to improve latency.
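
Concretely, a sketch against etcd's v3 JSON/gRPC gateway (the /v3 path assumes etcd 3.4+; older releases used /v3beta):

  import base64
  import requests

  def b64(s: str) -> str:
      return base64.b64encode(s.encode()).decode()

  ETCD = "http://127.0.0.1:2379"  # placeholder endpoint

  # Default: linearizable read, confirmed against a quorum before answering.
  requests.post(ETCD + "/v3/kv/range", json={"key": b64("/registry/foo")})

  # Opt-out: serializable read, answered from the local member's state.
  # Lower latency, but may return stale data.
  requests.post(ETCD + "/v3/kv/range",
                json={"key": b64("/registry/foo"), "serializable": True})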


No. You are wrong. etcdv3 enables l-read by default.


The MySQL default has always been the safe, not the fast, option.

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.ht...


Correct me if I'm wrong, but an ack-without-fsync will not "unapply" a write while keeping writes that happened after it intact? And it won't "unapply" those writes without something catastrophic happening, e.g. a dead disk or power outage. This isn't a general failure mode in a distributed system. MySQL with synchronous replication won't ack until all/a majority of nodes ack the write.

Also, just because you fsync doesn't mean your data will hit the disk. It's a fairly large problem that has nothing to do with the consistency model of the database replication.
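
For the curious, the layering involved (a sketch; the caveat above is why even this isn't bulletproof):

  import os

  def durable_write(path: str, data: bytes) -> None:
      with open(path, "wb") as f:
          f.write(data)         # data sits in the process's buffer
          f.flush()             # buffer -> kernel page cache
          os.fsync(f.fileno())  # page cache -> storage device
      # Even after fsync, a device (or hypervisor) with a volatile write
      # cache can still lose the data on power failure; fsync is
      # necessary but not sufficient.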


Abruptly terminating a VM is quite common in cloud services and can "unapply" a write, because it's equivalent to a power outage. The data lives in the VM guest's cache until fsync sends it to the host.

With VM termination in the cloud being not exactly uncommon, I would say that's a general enough failure mode in a distributed system that it's an unacceptable risk to run MongoDB in its default configuration (or any database in the equivalent configuration).


> Thus far, causal consistency has generally been limited to research projects ... MongoDB is one of the first commercial databases we know of which provides an implementation.

Cosmos DB has provided session consistency (which looks like another name for causal consistency) since at least 2014 [1].

Cosmos DB's session guarantees [2]: consistent prefix, monotonic reads, monotonic writes, read-your-writes, write-follows-reads.

MongoDB's causal consistency guarantees [3]: monotonic reads, monotonic writes, read-your-writes, write-follows-reads.

I doubt that, four years later, it still qualifies as "one of the first."

[1] https://www.infoq.com/news/2014/08/microsoft-azure-documentd...

[2] https://docs.microsoft.com/en-us/azure/cosmos-db/consistency...

[3] https://docs.mongodb.com/manual/core/read-isolation-consiste...
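
For reference, here's what MongoDB's version looks like from a client (a pymongo sketch with placeholder names; per the report, you need majority read/write concerns for the causal guarantees to actually hold):

  from pymongo import MongoClient
  from pymongo.read_concern import ReadConcern
  from pymongo.write_concern import WriteConcern

  client = MongoClient("mongodb://localhost/?replicaSet=rs0")  # placeholder
  coll = client.testdb.orders

  # Operations in the session carry a cluster time, so a read after a
  # write observes it (read-your-writes, monotonic reads/writes,
  # writes-follow-reads).
  with client.start_session(causal_consistency=True) as s:
      coll.with_options(write_concern=WriteConcern("majority")).insert_one(
          {"_id": 1, "status": "new"}, session=s)
      doc = coll.with_options(read_concern=ReadConcern("majority")).find_one(
          {"_id": 1}, session=s)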


Causal and session are definitely similar, but I'm not entirely sure if causal implies consistent prefix, and conversely, I think causal miiight have stronger implications than just the intersection of MR, MW, RYW, and WFR. Because we weren't entirely certain whether we could make that claim regarding Cosmos, we opted to be conservative.


I agree; it's hard for me too to be precise about naming in the academic sense. But the published paper "Writes: the dirty secret of causal consistency" says that both Cosmos DB and MongoDB have causal consistency, so I don't know... At least Cosmos DB and MongoDB provide the same guarantees for session/causal.


.. but the post didn't say it was first. Not even the part you quoted.


Well, the quote you reference says, "one of the first," doesn't it?


"we know of"


Was Cosmos DB tested by Jepsen?


It's off-topic. But yes, Cosmos DB has rigorous tests[1] including Jepsen (a tool).

[1] https://twitter.com/dharmashukla/status/869104163510034432


To clarify: no, Jepsen, as an organization, has not worked with CosmosDB.

I'm delighted they have rigorous tests, and I'm glad our tool has been helpful for them! We just can't say anything about those tests, because we haven't looked yet. Maybe someday!


I understand they used TLA+ to guide their design and test consistency primitives.


I thought "Jepsen.io" was just Kyle Kingsbury. Interesting that there's new author for this analysis. (Also might explain the lack of memes, which I always liked.)


Hi, new author here. :) Kyle was clear that I could write it however I wanted to, but I opted for the more formal tone used in recent analyses.


Nice job!


Yeah! Kit's done a bunch of work on Jepsen's core and Knossos, the linearizability checker. This is her first analysis, and I'm excited to see more. :)


These analyses are a great way to get educated about distributed systems, even if you aren't in a position to evaluate choosing one. Thanks for growing and creating Jepsen!


Kyle got so well known for Jepsen that it became a full time thing for him. He got such a backlog that he started bringing in people to help him do the testing.


I think Kyle recently said that he's been making analyses a bit more formal, so even when he writes analyses, there's less of what we're used to.


Completely unrelated to the core of the article, but does anyone know which program was used for the sketches/diagrams?


You might also enjoy this time lapse of the network topology graph.

https://www.youtube.com/watch?v=SQ8bbuqTVEw


I used Procreate on an iPad Pro.


Thank you for both comments! I'll look into that app.


This is off topic, and I might get downvoted, but I realize I am teed up waiting for the "MongoDB hate" comments to roll in... there seems to be not a lot of love on HN for MongoDB.

I wonder what positive use cases people have used Mongo for? I've used it for a few small/medium sized projects without problem myself.


The dislike for MongoDB usually came from people who were responsible for maintaining medium-to-large-scale deployments of its earlier versions.

IMO it all boils down to startup culture and growth, and to those goals being incompatible with building a database system responsibly. It broke trust with a lot of people and never really regained it. Not that it mattered much in the long run.

The lesson is that you can get your hands dirty while growing. If you don't grow you're dead anyway. If you do grow then you can throw money at the trust problem until it's fixed.

EDIT: Unless you're in the medical industry :P


MongoDB was a subpar product that rode the "nosql" document-store hype, but people following hype don't know about technical quality, and this led to it being used in a lot of places where it shouldn't have been, with marketing plastering over all the downsides. They've slowly made it better, and now it's a rather smooth experience, but overturning a poor reputation is extremely hard.

Meanwhile other database systems have developed further, and the need for a pure document store with weak aggregation and not much else just isn't very enticing. It's still good for data whose schema is on-read or in your application and that needs complete flexibility: document and media management, low-volume logging, user profiles and sessions, etc.


Compare this Jepsen analysis with that of MongoDB 2.4.3 from 2013[1] and you'll see that MongoDB has come a long, long way in the last few years. Note that the linked report on 3.6.4 was actually funded by MongoDB. I think a lot of the hate in the past was caused by their marketing writing checks that the technology couldn't cash. I think that has changed somewhat, but I'm not sure whether popular opinion has come around.

1: https://aphyr.com/posts/284-call-me-maybe-mongodb


It's actually a completely different database between versions 2 and 3.

They acquired WiredTiger in 2014 and MongoDB has never looked back since. Sure there is a lot of hate on HN but they are doing very well out in the real world. And actually it's a pretty good database if your domain model fits.


How many versions existed prior to 2.4.3? That's a lot of versions that don't work as advertised.


I agree. Their marketing was way ahead of their technology and that caught up to them. That being said, they’ve been rapidly improving the tech and it’s looking like a really solid option now.


That's still within the limited set of uses that not having proper SQL semantics allows.

I've had to deal with a key-value JSON database on another technology. Lots of different JSON shapes, even when they were supposed to represent the same concept, with every new feature bringing its own tweak.

Low performance, too, as ad hoc materialized scans + hash joins had to be built in the app. I hated every second of it.

In the end, I came away with the following exhortation for people tempted by the easiness of MongoDB-likes:

The world is not a tree. It is a graph. Each entity in the database exists in real life and interacts with the others freely, each in a specific manner. This is why they cannot be represented as trees (which is what JSON is). You need a graph, types, constraint checks, and ACID semantics; boring Postgres tables + SQL will provide that.

Sure, trees are a great way to represent and transport a piece of state: self-contained and succinct. This is why I'm a big advocate of GraphQL on top of an RDBMS. Out of the graph of properly stored entities, you can extract the tree that is relevant for the app view you're working with.


> The world is not a tree. It is a graph.

What's your opinion of ember and its sideloading semantics for JSON? I found it interesting... until I had to write my own REST endpoints. I found that none of the database tools I had available were up to doing it. Having to write all those sideloads by hand ended up being too tedious for a non-paying gig and I dropped the experiment.


MongoDB Inc. itself published bullshit benchmarks claiming they were faster than the competition, omitting the fact that DB writes could fail without returning an error to the client. That seriously damaged their reputation. It was also advertised as a replacement for RDBMSes, which it never was. You'll notice that no other NoSQL store has these reputation issues, no matter how bad they are; it's solely the fault of MongoDB Inc. and their questionable marketing.


I will personally never use MongoDB because when they first started their philosophy was speed over durability, which is a bad look for a database.

Even though they have apparently fixed those problems, it will take them a long time to win that trust back.

Also so far Elasticsearch seems to solve all my document storage problems.


As far as I can remember, the Elasticsearch test that Jepsen did years ago showed that at the time it was a very, very bad db. It was good for indexing content but not for storage.


> Also so far Elasticsearch seems to solve all my document storage problems.

Do you miss joins?


Not really, but that's because I've been using "nosql" so long that I've gotten used to writing joins in my apps.

But that is a fair point, things would be easier with joins. Usually when I need joins I use Postgres though.


Speaking as a sideliner:

MongoDB seems to have experienced a rather exaggerated version of the hype cycle. But it also seems like, at this point, the technology is well into the "slope of enlightenment" phase, and may even have reached the "plateau of productivity". A lot of that is fueled by MongoDB's own efforts - they took the complaints about reliability quite seriously. Case in point: this series of Jepsen tests they've been funding.


Hey, this is the author. Yes, it can't be overstated that MongoDB solicited this work directly to check their sharding system and their new causal consistency (CC) feature, and they were helpful in getting the work done.


Mongo works well as a straight-up JSON store for store-and-retrieve use cases (with no analytics). It is horizontally scalable, avoids the overhead of relational databases, has indexing capabilities, and provides a strong consistency model. The big improvement came with WiredTiger, which addressed many of the issues that plagued earlier versions of Mongo.

I've seen high-speed machine data stored in Mongo for logging and visualization purposes. It's an improvement over writing csv files to disk.

However, if you ever need to perform non-trivial analytics, Mongo's weaknesses quickly become obvious. For machine learning, typically you would want to first ETL the data into a dataframe-like structure (which is a structure native to SQL databases).


  It's an improvement over writing csv files to disk.
This is not a high bar.


Actually MongoDB is quite popular in the analytics space. It has a unique trick with Spark/Hadoop where its data gets represented as a single wide table. This allows you to use it as an analytical/ML feature store which is not possible with anything other than Cassandra.

Also not sure where you get the idea dataframes are unique to SQL databases because that's completely wrong. HBase and Cassandra were even the original big data databases and they aren't relational. And Spark can manifest almost any database as a dataframe.


> not sure where you get the idea dataframes are unique to SQL databases

I'm not sure I said this.

> HBase and Cassandra were even the original big data databases and they aren't relational.

They also had trouble doing joins and many other query operations which are common in analytics. Presto addresses this somewhat.

> And Spark can manifest almost any database as a dataframe.

Which entails a translation layer from whatever non-tabular form the data was in (e.g. JSON) into a dataframe-like structure, rather than keeping it in its native form, which reinforces my point. You still need to somehow transform the data into tabular form. (ETL is just a batch way of doing this transformation; you can have live transformations, of course, with accompanying overheads.)

BI tools also generally require data to be in tabular form, which entails the use of a translation layer. The Mongo BI connector is one such translator.

> Actually MongoDB is quite popular in the analytics space.

I work in this space, interact regularly with vendors, and monitor the space actively for strategic developments. This does not track with my observations.


I've had lots of fun using Mongo with Spark and have found it pleasant, though most Spark libraries are good to use, including the JDBC ones. I haven't had much luck with the kind of viz where you don't first do ETL from Mongo to some tabular format.


But why not memcached or Redis? You are describing a situation those are designed for, instead of something that for years pretended to be what it never was.


> you are describing a situation that those are designed for

I'm curious: how so?

memcached and Redis are commonly deployed for caching, but not for persistence (Redis has optional persistence). That is not to say you can't use them, but I'm not sure what makes them a necessarily better choice than Mongo in this situation (persisted high-speed machine data).

Edit: OK, I see you mean the "for store-and-retrieve use cases" part. You have a point, though Mongo seems to be ok for that use case too.


My life has been saner when we use the KV store for search or fast lookup and not as the system of record (for that we use a traditional database).

It tends not to occur to me that durability is a feature people are looking for in nosql, but if you're trying to avoid having three to five copies (SQL, nosql, search, reporting, ...?) of your data, I can understand. But as the team gets bigger it gets harder to maintain that, figuratively and literally.


Do you ever make fun of people for getting fooled twice?

The haters that have lasted this long have been fooled once and very painfully. You don't forgive someone technical for straight up lying to you and your peers. For years. That's vendetta territory.


I have used it for a monitoring solution that we developed in-house and rented to outside customers. In retrospect we could have used PostgreSQL too, and would have had an easier time. Having something enforce a schema on us wouldn't have hurt either. But apart from a few gotchas (like not releasing storage on deletes - this should be fixed now) it worked quite well, so I am not in the hate camp (as opposed to HBase). I did, however, learn to appreciate the strictness of SQL and would need very strong reasons to pick anything other than a relational DB for future projects.


  db.webhooks.find({ "events": { $exists: true, $ne: [], $elemMatch: { "links.cancelInvoice": {$exists: true}} } })

What is this, I don't even. Mongo is a huge pain when it comes to searching nested arrays. And the above is just an array in an object, not multiple levels of nesting. I have to re-learn the syntax every time I pick it up.

Other than that, it worked great for allowing users to build completely custom forms.
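
If it helps anyone else parse that: the same filter in pymongo (with a placeholder database name), with each operator annotated:

  from pymongo import MongoClient

  db = MongoClient().mydb  # placeholder database name

  filt = {
      "events": {
          "$exists": True,  # the field is present...
          "$ne": [],        # ...and is not an empty array...
          "$elemMatch": {   # ...and at least one element of it
              # has a nested links.cancelInvoice field:
              "links.cancelInvoice": {"$exists": True},
          },
      },
  }
  docs = db.webhooks.find(filt)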


Honestly that doesn't look that bad. SQL is a lot weirder in many ways, and I say that as someone who has written a lot of SQL.


I have to agree. Basic search-and-replace queries are very readable in Mongo and very easily constructed. IMO the advantage of SQL shows best when you have to do complex aggregation in Mongo.


That was actually a very smart decision, as SQL parsers are hard to write, so they saved a ton of dev time.


There are a couple of Apache projects, like Calcite, that are being used by new DB companies like MapD.


It's not really a big deal either way. If you have some experience with parsers/compilers, SQL parsers are pretty trivial. If you don't, there is one in SQLite.


Calcite looks like a really useful project


Code is data. I'm surprised to see developers willing to adopt "there is no custom grammar, just create this data structure" after how poorly Lisp was received.



