
I would like to think that it wouldn't be that hard to adapt an in-memory system to use something like a memory-mapped file, or even a custom cached memory-mapped file. Of course, such a system might not be designed to minimize page faults and cache misses.
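For what it's worth, a minimal sketch of that swap, assuming Python's mmap module (file name invented), might look like this:

    import mmap

    # Code that previously sliced an in-memory bytes object can often
    # read from the mapping unchanged; the OS pages data in on demand.
    with open("data.bin", "r+b") as f:
        buf = mmap.mmap(f.fileno(), 0)  # length 0 = map the whole file
        header = buf[:16]
        buf.close()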

Of course, this is about how the system could evolve; the possibility doesn't help an ordinary user right now.




Well, having done exactly that, I can say you really do need a rewrite or a fundamental refactoring. It will touch most functions in your codebase.

Memory mapping alone won't help when you have a 100 GB+ file, and, as you say, it gets slow because it's definitely not optimal.

You also need custom indexing structures and data caching strategies for most algorithms, because few of them move easily to disk, unfortunately. The other issue is that you end up doing a lot of research, because there just aren't many people who have done this. It's a time sucker.
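To give a flavor of what I mean by a caching strategy, here's a rough sketch (hypothetical names, not our actual system) of a fixed-size page cache with LRU eviction sitting in front of the file:

    from collections import OrderedDict

    PAGE_SIZE = 4096

    class PageCache:
        # Reads fixed-size pages from disk, keeping hot ones in memory.
        def __init__(self, path, max_pages=1024):
            self.f = open(path, "rb")
            self.max_pages = max_pages
            self.pages = OrderedDict()  # page number -> bytes

        def read_page(self, n):
            if n in self.pages:
                self.pages.move_to_end(n)       # mark as recently used
                return self.pages[n]
            self.f.seek(n * PAGE_SIZE)
            data = self.f.read(PAGE_SIZE)
            self.pages[n] = data
            if len(self.pages) > self.max_pages:
                self.pages.popitem(last=False)  # evict least recently used
            return data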

I must say it was awesome seeing our decision-tree system running on huge datasets (tested at > 100 GB) in a similar time (~30 seconds) to an in-memory database, after indexing.



