Unfortunately, for a multipart upload it isn't a hash of the whole object; it's a hash of the hashes of each part, which is a lot less useful, especially if you don't know how the file was partitioned during upload.
And even if it were computed over the whole file, it isn't used for the ETag, so it can't be used for conditional PUTs.
I had a use case where this looked really promising, but then I ran into the multipart upload limitations and ended up storing my own sha256sum in custom object metadata.
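For reference, a minimal sketch of that workaround with boto3; the bucket/key arguments and the "sha256sum" metadata key are just placeholders, since S3 treats user metadata as opaque:

    import hashlib
    import boto3

    s3 = boto3.client("s3")

    def file_sha256(path, chunk_size=1024 * 1024):
        # Stream the local file so large objects don't need to fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def upload_with_sha256(path, bucket, key):
        # S3 stores this verbatim as x-amz-meta-sha256sum; it never verifies it.
        s3.upload_file(path, bucket, key,
                       ExtraArgs={"Metadata": {"sha256sum": file_sha256(path)}})

    def matches_remote(path, bucket, key):
        # Later, compare a fresh local hash against the stored metadata value.
        meta = s3.head_object(Bucket=bucket, Key=key)["Metadata"]
        return meta.get("sha256sum") == file_sha256(path)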
If parts are aligned on a 1024-byte boundary and you know each part's start offset, it should be possible to use the internals of a BLAKE3 tree to get the final hash of all the parts together even as they're uploaded separately. https://github.com/C2SP/C2SP/blob/main/BLAKE3.md#13-tree-has...
Edit: This is actually already implemented in the Bao project, which exploits the structure of the BLAKE3 Merkle tree to offer cool features like streaming verification and verifying slices of a file, as I described above: https://github.com/oconnor663/bao#verifying-slices
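To illustrate why aligned parts compose, here's a rough sketch of the tree-hash idea. It uses SHA-256 as a stand-in, so it is not BLAKE3's actual chunk/parent chaining-value construction (which also involves counters and flags) and won't match b3sum output, but it shows how per-part hashes computed independently fold into one root without re-reading any data:

    import hashlib

    def leaf_hash(part_bytes):
        # Each uploader hashes only its own part.
        return hashlib.sha256(b"leaf" + part_bytes).digest()

    def parent_hash(left, right):
        # Parents combine two child digests; no part data is re-read.
        return hashlib.sha256(b"node" + left + right).digest()

    def root_hash(leaf_hashes):
        # Fold the per-part digests up a binary tree to a single root.
        level = list(leaf_hashes)
        while len(level) > 1:
            pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
            level = [parent_hash(*p) if len(p) == 2 else p[0] for p in pairs]
        return level[0].hex()

The alignment requirement above still applies: part boundaries have to land on the tree's node boundaries for the per-part hashes to be valid subtrees.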
Ways to control the ETag/additional checksums without configuring clients:
CopyObject writes a single part object and can read from a multipart object, as long as the parts total less than the 5 gibibyte limit for a single part.
For future writes, an s3:ObjectCreated:CompleteMultipartUpload event can trigger a CopyObject, or else a re-copy that defragments the object into policy-sized parts. Boto3's copy() with multipart_chunksize configured is the most convenient implementation (see the sketch below); other SDKs lack an equivalent.
For past writes, existing multipart objects can be selected from S3 Inventory by filtering for ETag values longer than 32 characters. Dividing the object size by the policy part size and comparing against the part count in the ETag suffix hints at whether the object already uses policy-sized parts.
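A minimal sketch of that boto3 re-copy, assuming a hypothetical 8 MiB policy part size; the managed copy() rewrites the object onto itself with the configured part layout (a same-key copy that falls below the multipart threshold may additionally need MetadataDirective="REPLACE" to be accepted):

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Hypothetical policy: 8 MiB parts; smaller objects become single-part (plain MD5 ETag).
    POLICY = TransferConfig(
        multipart_threshold=8 * 1024 * 1024,
        multipart_chunksize=8 * 1024 * 1024,
    )

    def recopy_with_policy(bucket, key):
        # The managed copy reads the existing (possibly multipart) object and rewrites
        # it in place using the policy part size, normalizing the ETag / composite checksum.
        s3.copy(
            CopySource={"Bucket": bucket, "Key": key},
            Bucket=bucket,
            Key=key,
            Config=POLICY,
        )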
Don't the SDKs take care of computing the multi-part checksum during upload?
> To create a trailing checksum when using an AWS SDK, populate the ChecksumAlgorithm parameter with your preferred algorithm. The SDK uses that algorithm to calculate the checksum for your object (or object parts) and automatically appends it to the end of your upload request. This behavior saves you time because Amazon S3 performs both the verification and upload of your data in a single pass. https://docs.aws.amazon.com/AmazonS3/latest/userguide/checki...
It does, and the default is good. An issue I've come across, though: if you have the file locally and want to check it against the ETag, you have to compute the multipart ETag locally first and then compare it to the value stored on the S3 object.
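A sketch of that local computation, assuming the object was uploaded with a known fixed part size (8 MiB here, a common SDK default, but still an assumption): the multipart ETag is the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count.

    import hashlib

    def multipart_etag(path, part_size=8 * 1024 * 1024):
        # Reproduce S3's multipart ETag: md5(md5(part1) + md5(part2) + ...)-<part count>.
        part_digests = []
        with open(path, "rb") as f:
            while True:
                chunk = f.read(part_size)
                if not chunk:
                    break
                part_digests.append(hashlib.md5(chunk).digest())
        if len(part_digests) == 1:
            # Single-part uploads just get the plain MD5 hex digest as the ETag.
            return part_digests[0].hex()
        combined = hashlib.md5(b"".join(part_digests)).hexdigest()
        return f"{combined}-{len(part_digests)}"

Compare the result against head_object(...)["ETag"] with the surrounding quotes stripped; if the part size guess is wrong, the part count suffix won't match either.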