Every time I contemplate increasing my use of S3 or similar cloud services, I get annoyed at the extremely sorry state of anything on-prem that replicates to S3.
Heck, even confirming that one has uploaded one's files correctly is difficult. You can extract MD5 (!) signatures from most object storage systems, but only if you don't use whatever that particular system calls a multipart upload. You can often get CRC32 (gee thanks). With AWS, but not most competing systems, you can do a single-part upload and opt into "object integrity" and get a real hash in the inventory. You cannot ask for a hash after the fact.
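To make the multipart pain concrete: for multipart uploads, S3's ETag is widely observed (though not contractually documented) to be the MD5 of the concatenated per-part MD5 digests, suffixed with the part count. A sketch of reproducing that locally, which only works if you know the exact part size used at upload time:

```python
import hashlib

def multipart_etag(part_digests: list[bytes]) -> str:
    """S3-style multipart ETag: MD5 over the concatenated
    per-part MD5 digests, plus "-<number of parts>"."""
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"

def etag_for(data: bytes, part_size: int) -> str:
    """Compute the expected ETag for an object uploaded with
    the given part size (observed behavior, not a documented API)."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    if len(parts) <= 1:
        # Single-part uploads get a plain MD5 ETag.
        return hashlib.md5(data).hexdigest()
    return multipart_etag([hashlib.md5(p).digest() for p in parts])
```

Note that this "hash of hashes" is not the MD5 of the file, so you cannot check it against a normal local checksum; you have to replay the exact part boundaries, which is precisely the problem.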
I understand why computing a conventional cryptographically secure hash is challenging in a multipart upload (though it's not actually all that bad). But would it kill the providers to have first-class support for something like BLAKE3? BLAKE3 is a tree hash: one can separately hash multiple parts (with a priori known offsets, which is fine for most APIs, though maybe not Google's as-is), assemble them into a whole in logarithmic time and memory, and end up with a hash that actually matches what b3sum would have output on the whole file. And one could even store some metadata and thereby allow downloading part of a large file and proving that one got the right data. (And AWS could even charge for that!)
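The tree-hash idea can be illustrated with a toy Merkle tree over SHA-256 (this is not BLAKE3's actual construction, which uses fixed 1 KiB chunks, chunk counters, and its own compression function, but the combining structure is the same): each part is hashed independently, possibly by different upload workers, and the root is assembled from just the per-part digests.

```python
import hashlib

def leaf(part: bytes) -> bytes:
    # Domain-separate leaves from interior nodes so a part's bytes
    # can't be confused with a pair of child digests.
    return hashlib.sha256(b"\x00" + part).digest()

def node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def tree_root(leaf_digests: list[bytes]) -> bytes:
    """Combine per-part digests pairwise, level by level, until one
    root remains. Needs only O(log n) combining steps per level count
    and never touches the part data again."""
    level = leaf_digests
    while len(level) > 1:
        nxt = [node(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd leftover promotes unchanged
        level = nxt
    return level[0]
```

Upload workers would each compute `leaf(part)` for their own part; the final "complete upload" step only needs the list of digests. The same tree also yields inclusion proofs, which is what would let a client verify a ranged download against the published root.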
But no, verifying the contents of one's cloud object storage bucket actually sucks, and it's very hard to be resistant to errors that occur at upload time.