the way it's written about in the Image Reconstruction section, like it's just an image compression thing, is kind of interesting. the section and its demonstrated use there are very much about storing images and reconstructing them, when "it doesn't actually store original images" and "it can't actually give out original images" are points that get used so often in arguments as a defense of image generators.
so it is just a multi-image compression file format, just a very efficient one. sure, it's "redrawing"/"rendering" its output and makes things look kind of fuzzy, but so does any other lossy image format.
what was all that 'well it doesn't do those things' nonsense about then? clearly it can do that.
>what was all that 'well it doesn't do those things' nonsense about then? clearly it can do that.
There is a model that is trained to compress (very lossily) and decompress the latent, but it's not the main generative model, and of course it doesn't store images in it. You give the encoder an image, it encodes it, and then you can decode the result with the decoder and get a very similar image back. This encoder/decoder pair is used during training so that stage C can work on a compressed latent instead of directly at the pixel level, which would be expensive. The main generative model (stage C) should be able to generate any of the images that were present in the dataset, or it fails to do its job. Stages C, B, and A do not store any images.
The B and A stages work like an advanced image decoder, so unless you have something wrong with image decoders in general, I don't see how this could be a problem (a JPEG decoder doesn't store images either, of course).
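To make the "a decoder doesn't store images" point concrete, here's a toy lossy codec in Python. It's a crude stand-in, not the actual stage A/B networks: encoding keeps only one value per patch, and the decoder is nothing but a fixed upsampling procedure, the same way a JPEG decoder is a fixed procedure with no pictures inside it.

```python
import numpy as np

def encode(img, block=4):
    """Very lossy 'compression': replace each block x block patch
    of the image with its single mean value (16x fewer numbers
    when block=4)."""
    h, w = img.shape
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def decode(latent, block=4):
    """The 'decoder': blow the latent back up to pixel size.
    It contains no images, only this fixed procedure."""
    return np.repeat(np.repeat(latent, block, axis=0), block, axis=1)

img = np.random.default_rng(1).random((64, 64))
latent = encode(img)    # 16x16 latent
recon = decode(latent)  # 64x64 again, fine detail lost
```

The reconstruction has the right overall content (block averages are preserved) but the detail is gone for good, which is what "very lossy" means here.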
In a way it's just an algorithm that can compress either text or an image. The neat trick is that if you compress the text "brown bear hitting Vladimir Putin" and then decompress it as an image, you get an image of a bear hitting Vladimir Putin.
This principle is the idea behind all Stable Diffusion models; this one "just" achieves a much better compression ratio.
well yeah. but it's not so much about what it actually does as how they talk about it. maybe (probably) i missed them describing it like that before, but it's the open admission, right there in the demonstration, that gets me. i guess they're getting more brazen, given that they're not really getting punished for what they're doing, be it piracy or infringement or whatever.
The model works on compressed data. That's all it is. Sure, it could output a picture from its training set on decompression, but only if you feed that same picture into the compressor.
In which case what are you doing, exactly? Normally you feed it a text prompt instead, which won't compress to the same thing.
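The round-trip point can be sketched with a toy codec (block-average down, repeat back up; purely illustrative, not the real stages): feeding a picture into the compressor gets you a close copy back out, while feeding in anything else (standing in for what a text prompt would map to) decodes to something unrelated to that picture.

```python
import numpy as np

# Toy stand-in for the encode/decode stages; names and codec are
# illustrative assumptions, not Stable Cascade's actual networks.
def encode(img, block=8):
    h, w = img.shape
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def decode(latent, block=8):
    return np.repeat(np.repeat(latent, block, axis=0), block, axis=1)

# A smooth 64x64 "training image" (a brightness gradient).
x = np.linspace(0.0, 1.0, 64)
training_image = np.outer(x, x)

# Feed the picture itself in: a close (lossy) copy comes back out.
roundtrip = decode(encode(training_image))

# Feed a different latent in: the output has nothing to do with
# that training image.
other_latent = np.random.default_rng(0).random((8, 8))
unrelated = decode(other_latent)
```

The round-trip error is small only because the original image went in; the arbitrary latent decodes to something far from it, which is the "what are you doing, exactly?" point above.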