I wrote redb (https://github.com/cberner/redb) using mmap, initially. However, I later removed it and switched to read()/write() with my own user space cache. I'm sure it's not as good as the OS page cache, but the difference was only 1.2-1.5x performance on the benchmarks I cared about, and the cache is less than 500 lines of code.
Also, by removing mmap() I was able to remove all the unsafe code associated with it, so now redb is memory-safe.
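For anyone curious what a user-space cache at that scale looks like: here's a minimal sketch of an LRU page cache in safe Rust. This is hypothetical illustration, not redb's actual code; the type and method names (`PageCache`, `get`, `insert`) are mine, and a real implementation would back misses with `read()` and handle dirty pages.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal user-space page cache sketch (hypothetical, not redb's actual code).
struct PageCache {
    capacity: usize,
    pages: HashMap<u64, Vec<u8>>, // page number -> page bytes
    lru: VecDeque<u64>,           // front = least recently used
}

impl PageCache {
    fn new(capacity: usize) -> Self {
        PageCache { capacity, pages: HashMap::new(), lru: VecDeque::new() }
    }

    /// Look up a cached page; on a miss the caller would fall back to read().
    fn get(&mut self, page: u64) -> Option<&Vec<u8>> {
        if self.pages.contains_key(&page) {
            // Move the page to the back of the queue (most recently used).
            self.lru.retain(|&p| p != page);
            self.lru.push_back(page);
            self.pages.get(&page)
        } else {
            None
        }
    }

    /// Insert a page read from disk, evicting the LRU page if the cache is full.
    fn insert(&mut self, page: u64, data: Vec<u8>) {
        if self.pages.len() >= self.capacity {
            if let Some(victim) = self.lru.pop_front() {
                self.pages.remove(&victim);
            }
        }
        self.lru.retain(|&p| p != page);
        self.lru.push_back(page);
        self.pages.insert(page, data);
    }
}
```

The `retain` scan makes `get` O(n) in cache size, which is fine for a sketch; a production cache would use an intrusive list or a clock/second-chance scheme instead.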
500 lines of code is still 500 lines of added complexity. For LMDB that'd be a ~7% increase in LOC, and it would also introduce the need for manual cache configuration and tuning (further increasing complexity for the end user), plus a 50% performance loss? Doesn't seem like a good tradeoff to me.
Yep, we've been using GPUs for quite a while (even before the alpha support in Kube), both the K80s in Azure and some Pascals in our own clusters. With the support in Kube now it's quite seamless.
Late reply, but Kube meant Kubernetes not Kubeflow.
Alpha GPU support landed in 1.6 if my memory serves me right.
Before that you had to do a bunch of stuff manually to make GPUs work, mostly around scheduling.
Since 1.6, Kubernetes will automatically detect the GPUs on your node and thus correctly assign the workloads where they fit.
Kubeflow is an abstraction layer on top of that which helps a lot when you want to do things such as distributed TensorFlow training. It also helps with simpler jobs, e.g. by (almost) removing the need to manually mount the NVIDIA drivers from the host into the container.
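To make the scheduling part concrete, requesting a GPU in a pod spec looks roughly like this (hypothetical pod/container names; `nvidia.com/gpu` is the device-plugin resource name that later replaced the 1.6-era `alpha.kubernetes.io/nvidia-gpu` alpha resource):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job            # hypothetical name
spec:
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu   # example image
    resources:
      limits:
        nvidia.com/gpu: 1  # scheduler places the pod on a node with a free GPU
```

With that limit set, the scheduler only considers nodes whose device plugin has advertised unallocated GPUs, which is the "correctly assign the workloads where they fit" part mentioned above.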
Ya, I'll add that to our list of feature requests. Would definitely like to put some more features in, to help people with blog posts or other research in the car space.
Ah yes, it would seem that green cars are going out of fashion, dunno who would want a purple car =P
Yep, totally agree with this. Last month we spent an hour or two going through the schema, removing any fields that didn't need to be stored, and making sure that only fields we actually query on had index=true. I didn't measure before-and-after results, but qualitatively it seemed faster afterwards.
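For reference, that kind of trimming looks something like this in a Solr-style schema.xml (assuming Solr here since the comment mentions stored/indexed flags; field names are made up):

```xml
<!-- Hypothetical fields: index only what you query on, store only what you return -->
<field name="make"       type="string" indexed="true"  stored="true"/>
<field name="price"      type="pint"   indexed="true"  stored="true"/>
<field name="raw_source" type="string" indexed="false" stored="false"/>
```

Dropping `indexed` on fields you never filter or sort by shrinks the index and speeds up writes, and dropping `stored` on fields you never return shrinks the stored document data.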