I heard Stable Diffusion's model is just 4 GB. It's incredible that billions of images could be squeezed into just 4 GB. Sure, it's lossy compression, but still.
I don't think that thinking of it as "compression" is useful, any more than an artist recreating the Mona Lisa from memory is "decompressing" it. The process diffusion models use is fundamentally different from decompression.
For example, if you prompt Stable Diffusion with "Mona Lisa" and watch the iterations, it's clearer what is happening - it's not decompressing so much as drawing something it knows looks like the Mona Lisa and then iterating to make it look clearer and clearer.
It clearly "knows" what the Mona Lisa looks like, but what is is doing isn't copying it - it's more like recreating a thing that looks like it.
(And yes, I realize lots of artists on Twitter are complaining that it is copying their work. I think "forgery" is a better analogy than "stealing" though - it can create art that looks like a Picasso or whatever, but it isn't copying it in a conventional sense.)
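To make the "iterating until it looks clearer" point concrete, here's a toy sketch in NumPy. This is emphatically not Stable Diffusion's actual algorithm - a fixed `target` array stands in for the network's learned idea of the prompt, and each pass removes a fraction of the predicted "noise", so the image sharpens toward the concept rather than being decompressed from storage:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((8, 8))          # stand-in for the model's learned concept
image = rng.standard_normal((8, 8))  # starting point: pure noise

for step in range(50):
    # Predict the "noise" as the gap between the current guess and the
    # concept, then remove a fraction of it. Each pass looks a bit clearer.
    predicted_noise = image - target
    image = image - 0.1 * predicted_noise

# After 50 steps the image is close to the concept, but it was *generated*
# by iterative refinement, not read out of a compressed archive.
```

The real model predicts noise with a neural network conditioned on the text prompt, but the shape of the loop - noise in, gradual refinement out - is the same.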
I think it's easy to explain. If we split all those images into small 8x8 chunks and put all the chunks into a fuzzy, slightly lossy hash table, we'd see that many chunks are very similar and can be merged into one. To address this "space of 8x8 chunks" we can apply PCA to them, just like JPEG does with the DCT, and keep only the top, most significant components of the PCA vectors.
So in essence, this SD model is like an Alexandria library of visual elements, arranged on multidimensional shelves.
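The chunk-plus-PCA idea above can be sketched directly (again, this is the commenter's analogy, not how Stable Diffusion actually stores anything - the image sizes and the number of kept components are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((100, 32, 32))  # 100 toy grayscale "images"

# Cut each 32x32 image into 8x8 chunks, flattening each chunk to 64 numbers.
patches = (images.reshape(100, 4, 8, 4, 8)
                 .transpose(0, 1, 3, 2, 4)
                 .reshape(-1, 64))          # -> (1600, 64)

# PCA via SVD on the mean-centered patches.
mean = patches.mean(axis=0)
centered = patches - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)

k = 8                          # keep only the top 8 of 64 components
codes = centered @ vt[:k].T    # each patch -> 8 "shelf coordinates"

# Reconstruct from the truncated codes: lossy, but the gist survives.
recon = codes @ vt[:k] + mean
```

Each 64-dimensional patch collapses to 8 coordinates - the "multidimensional shelves". JPEG does something closely analogous with a fixed DCT basis instead of a learned PCA basis.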