The comparison is a little pears to apple. Similar nutritions but different enough to not draw conclusions. The hardware in the Ceph test is only capable of max 1.7TiB/s traffic (optimally without any overhead whatsoever).
I also assume that the batch size (block size) is different enough that this alone would make a big difference.
That difference is still pronounced, yes. But the workload is so different. Training AI is hardly random read. Still not a comparison which should lead you to any conclusions.
I also assume that the batch size (block size) is different enough that this alone would make a big difference.