If the site is public, your content is in all the major training sets.

bee_rider · 2024-09-01T17:06:29 1725210389

I wonder how long until something like Copilot+, but also with the ability to leak these images into the training data, is released by somebody (probably not by MS; they appear to be trying to maintain some appearance of not being entirely malware authors).

I wonder if “affirming the consequent” is still ok (not that you’ve done so, your post just brought it to mind).

_heimdall · 2024-09-01T17:35:33 1725212133

Affirming the consequent is always an interesting one to me when it overlaps with a tragedy of the commons situation.

If publicly available content can be used in training sets, any one model willing to use it will have an advantage. Is it reasonable to assume then that all publicly available content is in the most popular training sets, or does that commit the fallacy of affirming the consequent?