
Mostly regular inserts and regular selects: https://medium.com/serpapi/mongodb-benchmark-3-4-vs-4-4-vs-5...

We have an internal benchmark with MongoDB 8.x, and it shows the same pattern of disappointing results.




As someone who has run every version from 3.2 to 8 on small nodes and large clusters (~100+ nodes)...

8 is waaay faster in the real world. It's not really comparable. Your microbenchmark is measuring the few nanoseconds of overhead from the heavier query planner, but in the real world that query planner gives real benefits. Not to mention aggregations, memory management improvements, and improvements when your working set is very large or larger than memory.


Can you share some data about this?

Here's another dataset showing a performance regression when doing `$inc`s as fast as possible on the same object.

Mongo 3.4.24:

    332,037 stats updates in 100s. (3,321 stats updates per s)
Mongo 8.0.4:

    287,553 stats updates in 100s. (2,876 stats updates per s)
(higher is better)
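
For context, here's a minimal sketch of the shape of that hot loop, using the Ruby mongo driver. The collection and field names are illustrative, not the actual benchmark code:

    require 'mongo'

    client = Mongo::Client.new('mongodb://localhost:27017/bench')
    stats  = client[:stats]
    stats.insert_one(_id: 'counter', value: 0)

    # Hammer $inc on the same document for 100 seconds and count how
    # many updates complete, matching the numbers above.
    deadline = Time.now + 100
    count = 0
    while Time.now < deadline
      stats.update_one({ _id: 'counter' }, { '$inc' => { value: 1 } })
      count += 1
    end
    puts "#{count} stats updates in 100s (#{count / 100} stats updates per s)"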


Thanks for the data! I think I may have different use cases than the ones covered by your benchmarks.

Do you often do that many independent $incs (or any query) in a single second? I have gotten much better performance by using `BulkWrite` to do a bunch of small updates in a batch.
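
As a hedged sketch of what I mean with the Ruby driver (the collection and counter names are assumptions on my part, not your schema):

    require 'mongo'

    client = Mongo::Client.new('mongodb://localhost:27017/bench')

    # Submit 1,000 $inc updates in a single round trip instead of
    # 1,000 separate update_one calls.
    ops = Array.new(1000) do
      { update_one: { filter: { _id: 'counter' },
                      update: { '$inc' => { value: 1 } } } }
    end
    client[:stats].bulk_write(ops, ordered: false)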

To go to a specific example from the "Driver Benchmark" in the link from your first reply:

    client[:users].insert_one(name: Digest::MD5.hexdigest(index.to_s))
I notice in this specific example that there's no separation of the hashing from the query timing, so I might try doing the hashing first and then timing just the inserts. I would also accumulate a batch of `insertOne`s and then do a bulk write so I'm making far fewer queries. I will often pick some size like 1,000 queries or so and do the `bulkWrite` when I have accumulated that many queries, when enough time has passed (like if it has been more than 0.5s since the last update), or when there are no more items to process. Additionally, if the order of the inserts doesn't matter, using `ordered: false` can provide an additional speedup.
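
Roughly like this, as a sketch (the document count and batch size are arbitrary picks for illustration, not recommendations):

    require 'mongo'
    require 'digest'
    require 'benchmark'

    client = Mongo::Client.new('mongodb://localhost:27017/bench')

    # Precompute the MD5 hashes up front so hashing cost doesn't
    # pollute the insert timing.
    docs = Array.new(100_000) { |i| { name: Digest::MD5.hexdigest(i.to_s) } }

    # Time only the inserts, batched 1,000 at a time; ordered: false
    # lets the server apply them without preserving submission order.
    elapsed = Benchmark.realtime do
      docs.each_slice(1_000) do |batch|
        client[:users].bulk_write(batch.map { |doc| { insert_one: doc } },
                                  ordered: false)
      end
    end
    puts "inserted #{docs.size} docs in #{elapsed.round(2)}s"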

For me the limiting factor is mostly the performance of BulkWrite itself. I haven't hit any performance bottlenecks there that would merit benchmarking different ways of using it, but if I did, I would mostly be trying to fine-tune things like how to group the items in a BulkWrite for optimal performance.

Even in the case of one-off queries, 7+ almost always feels faster than earlier versions. As I mentioned, the one bottleneck we hit during migration was that we had some queries filtering on fields that were not properly indexed, and in those cases performance tanked horribly, to the point where some queries actually stopped working. However, once we added an index, the queries were always faster than on the old version. When we did hit problems, it took only a few minutes to figure out what to index, and then everything was fine. We didn't have to change our application or the queries themselves to fix any issues we had.
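
For anyone hitting the same thing, the fix was just the usual create-an-index-and-verify step; the collection and field names here are made up for illustration:

    require 'mongo'

    client = Mongo::Client.new('mongodb://localhost:27017/app')

    # Create an index on the field the slow queries filter on.
    client[:orders].indexes.create_one({ customer_id: 1 })

    # explain shows whether the plan uses the index (IXSCAN) or
    # falls back to a full collection scan (COLLSCAN).
    pp client[:orders].find(customer_id: 123).explain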


Again, this microbenchmark is useless. Don't pick databases this way. This is not the kind of operation you should be worrying about optimizing; it's not usually the bottleneck or what is slow.

Set up a clone of prod and build a tool to replay your traffic against it.
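
There's no one standard tool for this; here's one possible shape for such a harness, assuming you've captured operations (e.g. from the database profiler's system.profile collection) and exported them as JSON lines. The file name and replaying only reads are my assumptions:

    require 'mongo'
    require 'json'

    clone = Mongo::Client.new('mongodb://clone-host:27017/app')

    # Re-issue captured read operations against the prod clone and let
    # your normal metrics (Datadog, Ops Manager) show the difference.
    File.foreach('captured_ops.jsonl') do |line|
      op = JSON.parse(line)
      next unless op['op'] == 'query'        # replay reads only
      coll   = op['ns'].split('.', 2).last   # "db.collection" -> "collection"
      filter = op.dig('command', 'filter') || {}
      clone[coll].find(filter).to_a          # run and drain the cursor
    end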

I have lots of data from Datadog and Ops Manager, but I'm not going to take the time to publish it ATM.

I just moved a 4 TB deployment from 3.2 to 7. It cut max query time by about half. I actually went to instances with half the CPUs, too (although I switched from EBS to SSDs).


> Again, this microbenchmark is useless. Don't pick databases this way. This is not the kind of operation you should be worrying about optimizing; it's not usually the bottleneck or what is slow

It was for us: API calls that need to aggregate stats on the same ID. We found a way around it, but it would not have been an issue if MongoDB 8 had been, say, 2x faster.

> I just moved a 4 TB deployment from 3.2 to 7. It cut max query time by about half. I actually went to instances with half the CPUs, too (although I switched from EBS to SSDs).

Single-core performance improvements over the last 10 years alone might explain your speedup.


> Single-core performance improvements over the last 10 years alone might explain your speedup.

Nope, after migration max query time was still over a minute in some cases. What makes the biggest difference is performance tuning. After a week or so of index tuning, I got max query time below 6s. If Mongo makes each query take 2ms instead of 1ms, it literally doesn't matter to that customer or their customers, since it's just noise at that point. The old instances were M5s, so not that old.

The point is that the few nanoseconds of difference you're measuring is usually not where you spend most of your time.

Also, you mentioned write performance. If you set the journal commit interval to 500ms or so, you can easily beat the old 3.2 write speeds, since if you're using 3.2 you probably don't care that much about journal commit intervals anyway ^_^
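
For the config-file form, this is the knob I mean. A sketch of a mongod.conf fragment; verify the option name against the docs for your version:

    # mongod.conf (YAML). commitIntervalMs is capped at 500. Raising it
    # trades durability for throughput: a crash can lose writes from the
    # last interval, so only use it if your workload tolerates that.
    storage:
      journal:
        commitIntervalMs: 500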



