I heard Stable Diffusion's model is just 4 GB. It's incredible that billions of images could be squeezed into just 4 GB. Sure, it's lossy compression, but still.
I don't think that thinking of it as "compression" is useful, any more than an artist recreating the Mona Lisa from memory is "decompressing" it. The process diffusion models use is fundamentally different from decompression.
For example, if you prompt Stable Diffusion with "Mona Lisa" and watch the iterations, it's clearer what is happening - it's not decompressing so much as drawing something it knows looks like the Mona Lisa and then iterating to make it look clearer and clearer.
It clearly "knows" what the Mona Lisa looks like, but what is is doing isn't copying it - it's more like recreating a thing that looks like it.
(And yes, I realize lots of artists on Twitter are complaining that it is copying their work. I think "forgery" is a better analogy than "stealing" though - it can create art that looks like a Picasso or whatever, but it isn't copying it in a conventional sense.)
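To make the "iterating until it looks clearer" point concrete, here's a toy sketch in NumPy. This is emphatically not Stable Diffusion's actual algorithm - a fixed `target` array stands in for the network's learned idea of the prompt, and each pass removes a fraction of the predicted "noise", so the image sharpens toward the concept rather than being decompressed from storage:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((8, 8))          # stand-in for the model's learned concept
image = rng.standard_normal((8, 8))  # starting point: pure noise

for step in range(50):
    # Predict the "noise" as the gap between the current guess and the
    # concept, then remove a fraction of it. Each pass looks a bit clearer.
    predicted_noise = image - target
    image = image - 0.1 * predicted_noise

# After 50 steps the image is close to the concept, but it was *generated*
# by iterative refinement, not read out of a compressed archive.
```

The real model predicts noise with a neural network conditioned on the text prompt, but the shape of the loop - noise in, gradual refinement out - is the same.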
I think it's easy to explain. If we split all those images into small 8x8 chunks and put all the chunks into a fuzzy, slightly lossy hash table, we'd see that many chunks are very similar and can be merged into one. To address this "space of 8x8 chunks" we can apply PCA to them, just like JPEG does with the DCT, and keep only the top, most significant components of the PCA vectors.
So in essence, this SD model is like an Alexandria library of visual elements, arranged on multidimensional shelves.
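The chunk-plus-PCA idea above can be sketched directly (again, this is the commenter's analogy, not how Stable Diffusion actually stores anything - the image sizes and the number of kept components are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((100, 32, 32))  # 100 toy grayscale "images"

# Cut each 32x32 image into 8x8 chunks, flattening each chunk to 64 numbers.
patches = (images.reshape(100, 4, 8, 4, 8)
                 .transpose(0, 1, 3, 2, 4)
                 .reshape(-1, 64))          # -> (1600, 64)

# PCA via SVD on the mean-centered patches.
mean = patches.mean(axis=0)
centered = patches - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)

k = 8                          # keep only the top 8 of 64 components
codes = centered @ vt[:k].T    # each patch -> 8 "shelf coordinates"

# Reconstruct from the truncated codes: lossy, but the gist survives.
recon = codes @ vt[:k] + mean
```

Each 64-dimensional patch collapses to 8 coordinates - the "multidimensional shelves". JPEG does something closely analogous with a fixed DCT basis instead of a learned PCA basis.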