
Well, current popular systems are pretty much limited to Lustre and the new kid Weka; mostly Lustre though, tbh.

You can try to use "standard" options like MinIO/Ceph (RADOS)/SeaweedFS, but you will very quickly learn those systems aren't remotely fast enough for these use cases.

AI training is what this is used for, not inference (which has absolutely no need for any filesystem at all). What makes the workload somewhat special is that it's entirely random reads and not cacheable at all, since most reads are one and done.
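A rough way to picture that access pattern (the path and record size below are made up for illustration): a shuffled sampler touches every record exactly once per epoch at effectively random offsets, so there is nothing for a cache to reuse.

    # Hypothetical shard layout: fixed-size records in one big file on the shared FS.
    import random

    RECORD_SIZE = 1 << 20                      # 1 MiB per sample (made up)
    PATH = "/mnt/fs/shard-000.bin"             # made-up path

    def epoch_reads(num_records):
        order = list(range(num_records))
        random.shuffle(order)                  # random-read access pattern
        with open(PATH, "rb", buffering=0) as f:
            for idx in order:                  # each record read exactly once
                f.seek(idx * RECORD_SIZE)
                yield f.read(RECORD_SIZE)      # no reuse, so caching buys nothing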

Would Lustre be perfectly fine at 6 TiB/s? Yes. Is it a huge pain in the ass to operate and make remotely highly available? Also yes. If this thing is capable of the throughput but easier to operate and generally more modern and less baroque, it's probably an improvement. TLDR is Lustre is fast, but that is literally its only redeeming quality. I have lost far too many hours of my life to the Lustre gods.




> What makes the workload somewhat special is

I'll add that latency also doesn't matter that much. You are doing batched data loading for batch n+1 on the CPU while the GPUs are churning through batch n-1 and copying batch n from host memory at the same time.

So as long as your "load next batch" doesn't run for something like >1s, it's fine. But a single "load next batch" on one worker means thousands (if not more) of random reads.
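A minimal sketch of that pipelining with a PyTorch DataLoader (the dataset and model here are placeholders, not anything from the article): CPU workers prefetch the next batches while the GPU is busy, and pinned memory plus non_blocking=True lets the host-to-device copy of one batch overlap with compute on the previous one.

    import torch
    from torch.utils.data import DataLoader

    loader = DataLoader(
        my_dataset,          # placeholder dataset doing the random reads
        batch_size=256,
        num_workers=8,       # CPU workers load batch n+1 in the background
        pin_memory=True,     # needed for async host-to-device copies
        prefetch_factor=4,   # keep a few batches in flight per worker
    )

    for batch in loader:
        batch = batch.cuda(non_blocking=True)  # copy of batch n overlaps compute
        loss = model(batch)                    # placeholder forward/loss
        loss.backward()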


They’re using the FS for caching the KV caches of past requests. It’s why they’re able to charge so little on prompt cache hits.


Ahh, I missed that. Yes, prefix caching and RAG are two cases where you will want something like this at inference time.
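For the prefix-caching case, a hedged sketch of how a shared FS could be used (paths and helper names are invented for illustration, not how it's actually implemented): key the stored KV cache by a hash of the prompt's token prefix and reload it on a hit instead of re-running prefill.

    import hashlib, os, torch

    CACHE_DIR = "/mnt/fs/kv-cache"   # made-up mount on the shared FS

    def prefix_key(token_ids):
        return hashlib.sha256(" ".join(map(str, token_ids)).encode()).hexdigest()

    def load_or_prefill(token_ids, prefill_fn):
        path = os.path.join(CACHE_DIR, prefix_key(token_ids) + ".pt")
        if os.path.exists(path):      # prompt cache hit: skip prefill
            return torch.load(path)
        kv = prefill_fn(token_ids)    # miss: pay for prefill once
        torch.save(kv, path)
        return kv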



