the way it's written about in the Image Reconstruction section, like it's just an image compression thing, is kind of interesting. the section and its demonstrated use there are very much about storing images and reconstructing them, when "it doesn't actually store original images" and "it can't actually give out original images" are points that get used so often in arguments as a defense of image generators.
so it is just a multi-image compression file format, just a very efficient one. sure, it's "redrawing"/"rendering" its output and makes things look kind of fuzzy, but so does any other lossy image format.
what was all that 'well it doesn't do those things' nonsense about then? clearly it can do that.
>what was all that 'well it doesn't do those things' nonsense about then? clearly it can do that.
There is a model that is trained to compress (very lossily) and decompress the latent, but it's not the main generative model, and of course it doesn't store images in it. You give the encoder an image, it encodes it, and then you can decode the result with the decoder and get a very similar image back. This encoder/decoder pair is used during training so that stage C can work on a compressed latent instead of directly at the pixel level, which would be expensive. The main generative model (stage C) should be able to generate any of the images that were present in the dataset, or it fails to do its job. Stages C, B, and A do not store any images.
The B and A stages work like an advanced image decoder, so unless you have something wrong with image decoders in general, I don't see how this could be a problem (a JPEG decoder doesn't store images either, of course).
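To make the "a decoder doesn't store images" point concrete, here's a toy lossy codec in Python. It's a crude stand-in, not the actual stage A/B networks: encoding keeps only one value per patch, and the decoder is nothing but a fixed upsampling procedure, the same way a JPEG decoder is a fixed procedure with no pictures inside it.

```python
import numpy as np

def encode(img, block=4):
    """Very lossy 'compression': replace each block x block patch
    of the image with its single mean value (16x fewer numbers
    when block=4)."""
    h, w = img.shape
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def decode(latent, block=4):
    """The 'decoder': blow the latent back up to pixel size.
    It contains no images, only this fixed procedure."""
    return np.repeat(np.repeat(latent, block, axis=0), block, axis=1)

img = np.random.default_rng(1).random((64, 64))
latent = encode(img)    # 16x16 latent
recon = decode(latent)  # 64x64 again, fine detail lost
```

The reconstruction has the right overall content (block averages are preserved) but the detail is gone for good, which is what "very lossy" means here.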
In a way it's just an algorithm that can compress either text or an image. The neat trick is that if you compress the text "brown bear hitting Vladimir Putin" and then decompress it as an image, you get an image of a bear hitting Vladimir Putin.
This principle is the idea behind all Stable Diffusion models; this one "just" achieves a much better compression ratio.
well yeah. but it's not so much about what it actually does as how they talk about it. maybe (probably) i missed them describing it like that before, but it's the open admission, right there in the demonstration, that gets me. i guess they're getting more brazen, given that they're not really getting punished for what they're doing, be it piracy or infringement or whatever.
The model works on compressed data. That's all it is. Sure, it could output a picture from its training set on decompression, but only if you feed that same picture into the compressor.
In which case what are you doing, exactly? Normally you feed it a text prompt instead, which won't compress to the same thing.
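The round-trip point can be sketched with a toy codec (block-average down, repeat back up; purely illustrative, not the real stages): feeding a picture into the compressor gets you a close copy back out, while feeding in anything else (standing in for what a text prompt would map to) decodes to something unrelated to that picture.

```python
import numpy as np

# Toy stand-in for the encode/decode stages; names and codec are
# illustrative assumptions, not Stable Cascade's actual networks.
def encode(img, block=8):
    h, w = img.shape
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def decode(latent, block=8):
    return np.repeat(np.repeat(latent, block, axis=0), block, axis=1)

# A smooth 64x64 "training image" (a brightness gradient).
x = np.linspace(0.0, 1.0, 64)
training_image = np.outer(x, x)

# Feed the picture itself in: a close (lossy) copy comes back out.
roundtrip = decode(encode(training_image))

# Feed a different latent in: the output has nothing to do with
# that training image.
other_latent = np.random.default_rng(0).random((8, 8))
unrelated = decode(other_latent)
```

The round-trip error is small only because the original image went in; the arbitrary latent decodes to something far from it, which is the "what are you doing, exactly?" point above.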