
> Part of the lesson here is that if you're doing MongoDB on EC2, you should have more than enough RAM for your working set.

We had more than enough RAM for our working set. Unfortunately, due to MongoDB's poor memory management and non-counting B-trees, even our hot data would sometimes be evicted from memory in favor of cold, unused data, causing serious performance degradation.
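To make that failure mode concrete, here is a minimal Linux sketch (not MongoDB's actual code; the data file name is hypothetical) that mmaps a file the way mongod does and uses mincore() to report how many of its pages the kernel still has resident. The page cache is blind to which pages the application considers hot, so running this before and after a scan of cold data shows the resident count dropping either way:

    /* Sketch: check how much of a memory-mapped data file is resident. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void) {
        int fd = open("collection.0", O_RDONLY);   /* hypothetical data file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);
        size_t len = st.st_size;

        void *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        long page = sysconf(_SC_PAGESIZE);
        size_t npages = (len + page - 1) / page;
        unsigned char *vec = malloc(npages);

        /* mincore() sets the low bit of vec[i] if page i is resident. */
        if (mincore(base, len, vec) == 0) {
            size_t resident = 0;
            for (size_t i = 0; i < npages; i++)
                resident += vec[i] & 1;
            printf("%zu of %zu pages resident\n", resident, npages);
        }

        free(vec);
        munmap(base, len);
        close(fd);
        return 0;
    }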




I understand your point, but the performance issues still stem from poor I/O performance on Amazon EBS. The more we work with EBS, the more we find it to be the source of most people's woes.

If you have solid (or even just reasonable) I/O, then moving things in and out of working memory is not painful. We have some customers on non-EBS spindles with very large working sets (relative to memory) ... faulting 400-500 times per second, and they hardly notice any slowdown.
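For illustration, here is a minimal sketch of one way to watch a fault rate like that on Linux: sample ru_majflt from getrusage() once a second and print the delta. This only measures the calling process, so it is an illustration of the metric, not how those customers' numbers were gathered:

    /* Sketch: print this process's major page faults per second. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/resource.h>

    int main(void) {
        long prev = 0;
        for (;;) {
            struct rusage ru;
            getrusage(RUSAGE_SELF, &ru);
            /* ru_majflt counts faults that required real disk I/O. */
            printf("major faults/sec: %ld\n", ru.ru_majflt - prev);
            prev = ru.ru_majflt;
            sleep(1);
        }
    }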

I think your suggestions are legit, but faulting performance has just as much to do with I/O congestion as with memory management. The same applies to insert/update performance.


We are using Mongo on EC2, and RAID 10 across six EBS volumes outperforms ephemeral disks when the dataset won't fit in RAM in a raw upsert scenario (our actual data, loading in historical data). The use of mmap, relying on the OS to page the appropriate portions in and out, is painful, particularly because we end up with a lot of moves (our padding factor varies between 1.8 and 1.9, and for our dataset, inserting a large placeholder field and clearing it on update was less performant than just taking the upserts and moves).
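A toy model of why those moves happen (not MongoDB's source; the sizes and padding factor here are just the figures from this comment): each record is allocated roughly size times paddingFactor bytes, and an update that outgrows that allocation has to be rewritten at a new location, costing extra I/O:

    /* Sketch: padded record allocation and when an update forces a move. */
    #include <stdio.h>

    static size_t alloc_size(size_t record, double padding_factor) {
        return (size_t)(record * padding_factor);
    }

    int main(void) {
        double pf = 1.8;          /* padding factor observed on our cluster */
        size_t old_size = 1000;   /* hypothetical record sizes, in bytes */
        size_t new_size = 1900;

        if (new_size > alloc_size(old_size, pf))
            printf("update grows past %zu bytes: record must move\n",
                   alloc_size(old_size, pf));
        else
            printf("update fits in place\n");
        return 0;
    }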

There are really two knobs to turn on Mongo: RAM and disk speed. Our particular cluster doesn't have enough RAM for the dataset to fit in memory, but it could double its performance (or more) if each key range were mmapped individually, rather than the entire datastore the shard is responsible for, simply because of how the OS manages pages. We haven't broken down and implemented that yet, but given the performance vs. cost tradeoffs, we may have to pretty soon.
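A minimal sketch of what mapping a key range individually might look like, assuming a single datastore file with page-aligned per-range regions (the file name, offset, and length are hypothetical): mmap just that range's region and hint the kernel about it with madvise(), leaving cold ranges unmapped entirely:

    /* Sketch: map one key range's region of the datastore on its own. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void) {
        int fd = open("shard.data", O_RDONLY);   /* hypothetical datastore */
        if (fd < 0) { perror("open"); return 1; }

        long page = sysconf(_SC_PAGESIZE);
        off_t hot_off  = 0;          /* page-aligned offset of a hot key range */
        size_t hot_len = 64 * page;  /* hypothetical range length */

        void *hot = mmap(NULL, hot_len, PROT_READ, MAP_SHARED, fd, hot_off);
        if (hot == MAP_FAILED) { perror("mmap"); return 1; }

        /* Ask the kernel to keep this range warm; a cold range could be
         * munmap'd or given MADV_DONTNEED. These are hints, not guarantees. */
        madvise(hot, hot_len, MADV_WILLNEED);

        munmap(hot, hot_len);
        close(fd);
        return 0;
    }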


> but the performance issues still stem from poor I/O performance on Amazon EBS

But the point is it shouldn't need to do that I/O in the first place.



