
I'm not criticizing the programmer's effort, but whenever I see a new project billed as a "drop-in replacement" and/or "50% to 3x faster", the first thing I look for is the test data used to verify that no new bugs were introduced.

I went to the test subfolder:

https://github.com/klauspost/compress/tree/master/testdata

I saw 3 files.

The kind of test data I'd like to see for something like a compression library would be:

0 byte files

1 byte files

2,147,483,647 byte file

2,147,483,648 byte file

2,147,483,649 byte file

(to check for bugs around signed 32-bit integers)

4,294,967,295 byte file

4,294,967,296 ...

4,294,967,297 ...

(to check for bugs around unsigned 32-bit integers and see if 64bit was used when necessary for correctness)

Also add files of sizes that straddle the boundaries of the algorithm's internal block buffers (e.g., 64 KB or whatever). Add permutations of the above files filled with all zeros (0x00) and all ones (0xFF). I'm sure I'm forgetting a bunch of other edge cases.

The programmer may have done a wonderful job and all the code may be 100% correct. Unfortunately, I can't trust a library replacement unless I also trust the test bed of data that was used to check for defects. It's very common for performance optimizations to introduce new bugs, so there has to be an extensive suite of regression tests to help detect them. Test data for bug detection serves a different purpose than benchmark data showing speed improvements.

Those multi-gigabyte files are not git-repository friendly, so perhaps a compromise would be a small utility program to generate the test files as necessary.
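A minimal sketch of such a generator in Go (hypothetical file names; the sizes are the 32-bit boundaries listed above plus a 64 KB block boundary):

    package main

    import (
        "bytes"
        "fmt"
        "log"
        "os"
    )

    // Sizes that straddle the signed/unsigned 32-bit boundaries and a
    // typical 64 KB internal block size.
    var sizes = []int64{
        0, 1, 2,
        64*1024 - 1, 64 * 1024, 64*1024 + 1,
        2147483647, 2147483648, 2147483649, // around int32 max
        4294967295, 4294967296, 4294967297, // around uint32 max
    }

    func write(name string, size int64, fill byte) error {
        f, err := os.Create(name)
        if err != nil {
            return err
        }
        defer f.Close()
        if fill == 0x00 {
            // All-zero files can be created sparsely on most filesystems.
            return f.Truncate(size)
        }
        chunk := bytes.Repeat([]byte{fill}, 1<<20) // write in 1 MiB chunks
        for remaining := size; remaining > 0; {
            n := int64(len(chunk))
            if remaining < n {
                n = remaining
            }
            if _, err := f.Write(chunk[:n]); err != nil {
                return err
            }
            remaining -= n
        }
        return nil
    }

    func main() {
        for _, size := range sizes {
            for name, fill := range map[string]byte{"zeros": 0x00, "ones": 0xFF} {
                if err := write(fmt.Sprintf("test_%s_%d.bin", name, size), size, fill); err != nil {
                    log.Fatal(err)
                }
            }
        }
    }

Since the all-zero files are created via Truncate, even the multi-gigabyte cases are cheap to produce on filesystems with sparse-file support.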




I agree he should use more test cases, but it looks like he used the same ones as Go's compression tests [1]. Note he has other test cases in the subdirectories (also the same as Go's) [2].

However, I believe even better testing would be to fuzz it for a few hours [3]; a minimal harness is sketched after the links.

[1] http://golang.org/src/compress/testdata/

[2] https://github.com/klauspost/compress/tree/master/zip/testda...

[3] https://github.com/dvyukov/go-fuzz
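For a round-trip target, a go-fuzz harness could look like this sketch (assuming the library's flate subpackage mirrors the standard library's compress/flate API, which is the drop-in claim):

    // fuzz.go: a go-fuzz harness that round-trips arbitrary input and
    // panics on any mismatch, which go-fuzz records as a crash.
    package fuzz

    import (
        "bytes"
        "io/ioutil"

        "github.com/klauspost/compress/flate"
    )

    func Fuzz(data []byte) int {
        var buf bytes.Buffer
        w, err := flate.NewWriter(&buf, flate.DefaultCompression)
        if err != nil {
            panic(err)
        }
        if _, err := w.Write(data); err != nil {
            panic(err)
        }
        if err := w.Close(); err != nil {
            panic(err)
        }
        out, err := ioutil.ReadAll(flate.NewReader(&buf))
        if err != nil {
            panic(err)
        }
        if !bytes.Equal(data, out) {
            panic("round-trip mismatch")
        }
        return 1 // input was interesting; keep it in the corpus
    }

Built with go-fuzz-build and driven by go-fuzz, any panic is recorded as a crasher along with the input that triggered it.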


[deleted]


>It's a stream-oriented format. There are no file-global offsets or lengths. There are some relative lengths and offsets, but they're going to be 8 or 16 bit values. Most of the tests you're suggesting don't look particularly useful.

I don't think the streaming-vs-file distinction is relevant. The deflate algorithm (stream) may have no global knowledge of the eventual file size, but the gzip file format does.
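Concretely, RFC 1952 puts the uncompressed length modulo 2^32 (the ISIZE field) in the gzip trailer, so the container does record a size, and that modulo is exactly where 32-bit bugs hide. A small sketch that reads the field (hypothetical helper, standard library only):

    package main

    import (
        "encoding/binary"
        "fmt"
        "io"
        "log"
        "os"
    )

    // readISIZE returns the ISIZE field from a single-member gzip file:
    // the uncompressed length modulo 2^32 (RFC 1952, section 2.3.1).
    func readISIZE(path string) (uint32, error) {
        f, err := os.Open(path)
        if err != nil {
            return 0, err
        }
        defer f.Close()
        // The member trailer is 8 bytes: CRC-32, then ISIZE, little-endian.
        if _, err := f.Seek(-8, io.SeekEnd); err != nil {
            return 0, err
        }
        var trailer [8]byte
        if _, err := io.ReadFull(f, trailer[:]); err != nil {
            return 0, err
        }
        return binary.LittleEndian.Uint32(trailer[4:]), nil
    }

    func main() {
        isize, err := readISIZE(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("ISIZE (uncompressed size mod 2^32): %d\n", isize)
    }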

Anyway, even with a cursory Google search, we can find a compression bug related to file sizes:

https://bugs.openjdk.java.net/browse/JDK-6419239

As for an example of streaming bugs, zlib co-author Mark Adler reported that Microsoft coded its deflate implementation incorrectly.

http://stackoverflow.com/a/11435898

The lesson is that re-implementing even well-known algorithms such as zlib/gzip/pkzip is not trivial.


It was also changed to look only for matches of four bytes or more. "Better compression" is claimed, but I don't see any benchmarks comparing the sizes of compressed files before and after.


> I'm not criticizing the programmer's effort

Right, you are not criticizing the programmer's effort; that's what a pull request is for. A pull request with a failing test in it would be some very concise criticism.
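As a sketch of what that might look like (hypothetical test, assuming the flate subpackage mirrors the standard library's API), round-tripping the edge-case sizes from upthread:

    package flate_test

    import (
        "bytes"
        "io/ioutil"
        "testing"

        "github.com/klauspost/compress/flate"
    )

    // TestRoundTripEdgeSizes round-trips inputs that straddle a typical
    // 64 KB block boundary; a failure pinpoints the size and fill pattern.
    func TestRoundTripEdgeSizes(t *testing.T) {
        for _, size := range []int{0, 1, 64*1024 - 1, 64 * 1024, 64*1024 + 1} {
            for _, fill := range []byte{0x00, 0xFF} {
                in := bytes.Repeat([]byte{fill}, size)
                var buf bytes.Buffer
                w, err := flate.NewWriter(&buf, flate.BestSpeed)
                if err != nil {
                    t.Fatal(err)
                }
                if _, err := w.Write(in); err != nil {
                    t.Fatal(err)
                }
                if err := w.Close(); err != nil {
                    t.Fatal(err)
                }
                out, err := ioutil.ReadAll(flate.NewReader(&buf))
                if err != nil {
                    t.Fatalf("size %d, fill %#x: %v", size, fill, err)
                }
                if !bytes.Equal(in, out) {
                    t.Errorf("size %d, fill %#x: round-trip mismatch", size, fill)
                }
            }
        }
    }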



