Any library that works with file formats needs binary test files.
Many of them are malformed (or deviate slightly from the standard output), because the library has to work even with files generated by other programs. Bug reports like 'I tried to load this file and it failed, but it works in XYZ' are extremely common.
These formats are often very complex, and tricks like 'zeroing out a high bit' don't cut it. You would end up with binary code encoded in the source.
Edit: one simple improvement GitHub and other forges could make is to show the contents of archives in a diff. The payload was hidden in an archive test file, so it would have been displayed in the diff instead of "binary file changed, no idea what is in it".
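To make the idea concrete, here is a rough sketch of what such a diff could look like, assuming the forge can read both versions of the archive as bytes. The helper names (list_members, archive_diff, make_tar) are made up for illustration and are not any forge's real API; it only handles tar archives that Python's tarfile can open.

```python
import difflib
import io
import tarfile


def list_members(archive_bytes):
    """Return one 'name size' line per member of an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        return [f"{m.name} {m.size}" for m in tar.getmembers()]


def archive_diff(old_bytes, new_bytes):
    """Diff the member listings of two archives instead of their raw bytes."""
    return list(
        difflib.unified_diff(
            list_members(old_bytes),
            list_members(new_bytes),
            "old",
            "new",
            lineterm="",
        )
    )


def make_tar(files):
    """Hypothetical helper: build an uncompressed tar from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    old = make_tar({"a.txt": b"hi"})
    new = make_tar({"a.txt": b"hi", "payload.bin": b"\x00" * 10})
    print("\n".join(archive_diff(old, new)))
```

This only lists names and sizes. A real implementation would probably also want to diff the contents of text members.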
> Edit: one simple improvement GitHub and other forges could make is to show the contents of archives in a diff.
That works if the archives are valid as checked in, but not if they’re corrupted in a predictable way such that they can trivially be “un-corrupted” as needed, perhaps by something as simple as tr.
Even if that’s not exactly what happened here, I think it’s pretty obvious how eminently doable that is, given the sophistication of so many aspects of this attack.
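For illustration only, here is a sketch of that kind of reversible corruption using a tr-style byte substitution in Python. The swap table is hypothetical and is not claimed to be the mapping used in the actual attack; gzip simply stands in for whatever archive format the test file uses.

```python
import gzip
import zlib

# Hypothetical substitution: swap two byte values, like `tr` with two sets.
# Swapping is its own inverse, so applying it twice restores the input.
SWAP = bytes.maketrans(b"\x1f\x8b", b"\x8b\x1f")


def scramble(data):
    """Apply the byte swap; calling it again undoes it."""
    return data.translate(SWAP)


def is_valid_gzip(data):
    """Return True if the bytes decompress as gzip, False otherwise."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        return False


if __name__ == "__main__":
    original = gzip.compress(b"hello")
    checked_in = scramble(original)  # what would sit in the repository
    print(is_valid_gzip(checked_in))  # the stored file does not parse
    print(gzip.decompress(scramble(checked_in)))  # one pass restores it
```

A forge that tried to list the contents of the checked-in file would just see an invalid archive, which looks entirely normal for a test of malformed input.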