
I would like to think that it wouldn't be that hard to adapt an in-memory system to use something like a memory-mapped file, or even a custom cached memory-mapped file. Of course, such a system might not be designed to minimize page faults and cache misses.
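For what it's worth, a minimal sketch of that swap, assuming Python's mmap module (file name invented), might look like this:

    import mmap

    # Code that previously sliced an in-memory bytes object can often
    # read from the mapping unchanged; the OS pages data in on demand.
    with open("data.bin", "r+b") as f:
        buf = mmap.mmap(f.fileno(), 0)  # length 0 = map the whole file
        header = buf[:16]
        buf.close()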

Of course, this is about how the system could evolve; the possibility doesn't help an ordinary user right now.




Well, having done exactly that, I can say you really do need a rewrite or a fundamental refactoring. It will touch most functions in your codebase.

Memory mapping alone won't help when you have a 100 GB+ file, and, as you say, it gets slow because it's definitely not optimal.

You also need custom indexing structures and data caching strategies for most algorithms, because few of them move easily to disk, unfortunately. The other issue is that you end up doing a lot of research, because there just aren't many people who have done this. It's a time sucker.
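To give a flavor of what I mean by a caching strategy, here's a rough sketch (hypothetical names, not our actual system) of a fixed-size page cache with LRU eviction sitting in front of the file:

    from collections import OrderedDict

    PAGE_SIZE = 4096

    class PageCache:
        # Reads fixed-size pages from disk, keeping hot ones in memory.
        def __init__(self, path, max_pages=1024):
            self.f = open(path, "rb")
            self.max_pages = max_pages
            self.pages = OrderedDict()  # page number -> bytes

        def read_page(self, n):
            if n in self.pages:
                self.pages.move_to_end(n)       # mark as recently used
                return self.pages[n]
            self.f.seek(n * PAGE_SIZE)
            data = self.f.read(PAGE_SIZE)
            self.pages[n] = data
            if len(self.pages) > self.max_pages:
                self.pages.popitem(last=False)  # evict least recently used
            return data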

I must say it was awesome seeing our decision-tree system running on huge datasets (tested at > 100 GB) in a similar time (~30 seconds) to an in-memory database, after indexing.



