Hacker News new | past | comments | ask | show | jobs | submit login

Microstates vs. macrostates. Doesn't it depend on the basis of the encoding and how many bits it carries? When we hear "watermark" we usually think of something in the pixel plane, but a watermark can live in the frequency spectrum, or in other clever, resilient steganographic bases.
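As a toy illustration of a frequency-domain watermark, here's a sketch that forces the sign of one mid-frequency DCT coefficient per 8x8 block. The scheme, the [3, 4] slot, and the amplitude are all invented for illustration, not any real product's method:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix (the same transform JPEG applies per 8x8 block)
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2 / n)

def embed_bit(block, bit, strength=8.0):
    # Hypothetical scheme: force the sign of one mid-frequency coefficient
    M = dct_matrix()
    coeffs = M @ block @ M.T
    coeffs[3, 4] = strength if bit else -strength
    return M.T @ coeffs @ M  # back to the pixel basis

def extract_bit(block):
    M = dct_matrix()
    return (M @ block @ M.T)[3, 4] > 0

rng = np.random.default_rng(0)
block = rng.uniform(0, 255, (8, 8))
marked = embed_bit(block, 1)
print(bool(extract_bit(marked)))  # True: the bit is readable in the DCT basis
```

The point is that nothing in the pixel plane looks like a watermark; the payload only appears once you project onto the right basis.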



Simply running one cycle of "ok"-quality JPEG compression can completely devastate information encoded at higher frequencies.

Quantization and chroma subsampling do a hell of a job of discarding "unwanted" information. If a human cannot perceive the watermark, then any process that aggressively approaches that threshold of perception has a good chance of removing it.
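A rough sketch of why: a small mid-frequency DCT coefficient, divided by the corresponding entry of the standard JPEG luminance quantization table and rounded, goes straight to zero. The watermark slot and amplitude here are assumptions for illustration:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis (the transform JPEG applies per 8x8 block)
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2 / n)

# Standard JPEG luminance quantization table (roughly quality 50)
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=float)

M = dct_matrix()
rng = np.random.default_rng(0)
block = rng.uniform(0, 255, (8, 8))

# Embed a watermark bit as a small mid-frequency coefficient (hypothetical scheme)
coeffs = M @ block @ M.T
coeffs[3, 4] = 8.0  # amplitude chosen to stay below the perceptual threshold
marked = M.T @ coeffs @ M

# One JPEG-style quantize/dequantize cycle
c = M @ marked @ M.T
c_jpeg = np.round(c / Q) * Q
print(round(c[3, 4], 6), "->", c_jpeg[3, 4])  # 8.0 -> 0.0: the watermark coefficient is erased
```

Any coefficient smaller than half its quantization step is rounded away, so an imperceptible watermark sits exactly where lossy compression is designed to throw bits out.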


At this point the watermark can be meaningful content like "every time there's a bird there is also a cirrus cloud", or "blades of grass lean slightly further to the left than a natural distribution".

Because our main interest is this meaningful content, it will be harder to scrub from the image.
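A toy sketch of detecting that kind of distributional watermark, assuming you can measure the "lean" of many blades of grass across an image set. The Gaussian model and the 0.05 bias are invented for illustration:

```python
import math
import random

random.seed(42)

def lean_angles(n, bias=0.0):
    # Stand-in measurement: lean angle of a blade of grass, zero-mean in
    # natural images, nudged slightly negative (leftward) by the generator.
    return [random.gauss(-bias, 1.0) for _ in range(n)]

def z_score(samples):
    # One-sample z-test against mean 0, assuming known unit variance
    return sum(samples) / math.sqrt(len(samples))

natural = lean_angles(100_000, bias=0.0)
marked = lean_angles(100_000, bias=0.05)

print(abs(z_score(natural)))  # small |z|: consistent with no watermark
print(abs(z_score(marked)))   # large |z|: the tiny bias is obvious in aggregate
```

No single image reveals anything; the signal only shows up statistically, which is also why scrubbing it means altering the content itself rather than stripping metadata.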


That would be indistinguishable from a model that was also trained on that output, wouldn't it?

It seems much more likely that it's their solution for detecting and filtering AI images out of their training corpus - kind of a latent "robots.txt".


Different tools have different levels of ability to remove invisible data, e.g. jpegli vs. libjpeg-turbo.



