Generate images fast with SD 1.5 while typing on Gradio (twitter.com/abidlabs)
175 points by smusamashah on Nov 12, 2023 | 29 comments



The fact that LCM LoRAs turn regular SD models into pseudo-LCM models is insane.

Most people in the AI world don't understand that ML is like actual alchemy. You can merge models like they are chemicals. A friend of mine called it "a new chemistry of ideas" upon seeing many features in Automatic1111 (including model and token merges) used simultaneously to generate unique images.

Also, LoRAs exist on a spectrum based on their dimensionality. Tiny LoRAs should only be capable of relatively tiny changes. My guess is that this is a big LoRA, nearly the same size as the base checkpoint.


To me, the crazy thing about LoRAs is that they work perfectly well adapting model checkpoints that were themselves derived from the base model on which the LoRA was trained. So you can take the LCM LoRA for SD1.5 and it works perfectly well on, say, RealisticVision 5.1, a fine-tuned derivative of SD1.5.

You’d think that the fine-tuning would make the LCM LoRA not work, but it does. Apparently the changes in weights introduced through even pretty heavy fine-tuning do not wreck the transformations the LoRA needs to make in order for LCM or other LoRA adaptations to work.

To me this is alchemy.


Finetuning and LoRAs both involve additive modifications to the model weights. Addition is commutative, so the order in which you apply them doesn't matter for the resulting weights. Moreover, neural networks are designed to be differentiable, i.e. they behave approximately linearly with respect to small additive modifications of the weights. So as long as your finetuning and LoRA change the weights only a little, you can finetune with or without the LoRA (or, respectively, train the LoRA on the finetuned model or on its base) and get mostly the same result.
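
As a concrete illustration (a toy sketch with made-up shapes, not any library's API): a LoRA is just a low-rank delta B·A added onto a weight matrix, so "applying" it to a finetuned checkpoint is the same addition as applying it to the base.

    import numpy as np

    def merge_lora(W, A, B, alpha=1.0):
        # The merged weight is W + alpha * (B @ A); pure addition, so
        # base-then-LoRA and finetune-then-LoRA compose the same way.
        return W + alpha * (B @ A)

    d_out, d_in, rank = 768, 768, 64
    W_finetuned = np.random.randn(d_out, d_in) * 0.02  # stand-in for a finetuned layer
    A = np.random.randn(rank, d_in) * 0.01             # LoRA down-projection
    B = np.random.randn(d_out, rank) * 0.01            # LoRA up-projection

    W_merged = merge_lora(W_finetuned, A, B)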

So this is something that can be somewhat explained using not terribly handwavy mathematics. Picking hyperparameters on the other hand...


This is trivially not true.

Pick, e.g., x -> sin(1/x) around zero and its derivatives.

The small modifications that you’re talking about are to the argument. These can lead to huge changes in the values.

The stability is more likely due to the diffusive nature of the models and well-executed training.


I don't recall SD or variants using discontinuous terms like 1/x. Sigmoid, softmax, and SiLU are going to be what you're looking for.


Indeed, they don’t use them. I was replying to the general idea about additions.

OTOH, Gaussian kernels smooth out almost everything. Maybe it would be stable even with sin(1/x) as an “activation”.


If you want to use a counterexample to refute the general idea about additions, you need to pick one that fulfills the preconditions, like being differentiable. x → sin(1/x) is not differentiable at 0. For any other value where it is differentiable, there's a small ɛ and a linear function L such that for all |a|, |b| < ɛ, sin(1/(x + a + b)) = sin(1/x) + L(a + b) + O(ɛ²), and because L is linear, L(a + b) = L(a) + L(b). The wrinkle is that ɛ might have to be extremely small indeed.
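
A quick numerical check of that expansion at a point away from zero (values picked for illustration only; the required ɛ gets much smaller as x approaches 0):

    import math

    x, a, b = 0.1, 1e-7, 2e-7
    f = lambda t: math.sin(1 / t)
    # The derivative of sin(1/x) is -cos(1/x)/x^2, giving the linear map L(h) = f'(x) * h.
    L = lambda h: (-math.cos(1 / x) / x**2) * h

    exact = f(x + a + b)
    linear = f(x) + L(a) + L(b)
    print(abs(exact - linear))  # tiny (~1e-10): the linearisation and additivity hold at this scale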


Around zero, not at zero.

Recalling the definition of exact differentiability is irrelevant.

Instead, take for example the smallest interval that you can represent in fp32 not too far away from zero. Take a few values in that interval and check the behaviour of said monstrous function.

This is a “trivial” example when studying e.g. distribution theory.

Said differently, you need to assess how smooth the differential operator itself is.


Wait until you hear about frankenmodels. You rip parts out of one model (often attention heads) and transplant them into another, and somehow that produces coherent results! Witchcraft.

https://huggingface.co/chargoddard
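
A rough sketch of the layer-splicing idea (hypothetical file names and key layout, not chargoddard's actual tooling):

    import torch

    # Two finetunes of the same base architecture.
    donor = torch.load("model_a.bin", map_location="cpu")
    target = torch.load("model_b.bin", map_location="cpu")

    # Overwrite a slice of the target's transformer blocks with the donor's,
    # keeping everything else (embeddings, output head) from the target.
    spliced = dict(target)
    for name, tensor in donor.items():
        if any(f"layers.{i}." in name for i in range(10, 20)):
            spliced[name] = tensor

    torch.save(spliced, "frankenmodel.bin")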


>somehow that produces coherent results

with or without finetuning? Also is there a practical motivation for creating them?


> with or without finetuning?

With, but it's still bonkers that it works so well

>Also is there a practical motivation for creating them?

You could get in-between model sizes (like 20b instead of 13b or 34b). Before better quantization it was useful for inference (if you were unlucky with VRAM size), but now I see this being useful only for training, because you can't train on quants.


> With, but it's still bonkers that it works so well

Ehhhh…


Ok. I have seen the term LCM LoRA a number of times. I have used both Stable Diffusion and LoRAs for fun for quite a while, but I always thought this LCM LoRA was a new thing. It's simply not possible using current samplers to return an image in under 4 steps. What you are saying is that just by adding a LoRA we can get existing models and samplers to generate a good-enough image in 4 steps?


Yes, check out this blog post: https://huggingface.co/blog/lcm_lora

I’ve used it with my home GPU. Really fast, which makes it more interactive and real-time.
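
If it helps, this is roughly the recipe from that post (model IDs from memory, so double-check them against the blog):

    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Swap in the LCM scheduler and load the LCM LoRA on top of the base model.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

    # 4 steps and low guidance instead of the usual 25-50.
    image = pipe("a photo of a cat", num_inference_steps=4, guidance_scale=1.0).images[0]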


It's a different sampler too.


This is nuts. I did a double take at this comment: I thought you must have been talking about LoRAing an LCM distilled from Stable Diffusion.

LCMs are spooky black magic, I have no intuitions about them.


When I was taking Jeremy Howard’s course last fall, the breakthrough in SD was going from 1000 steps to 50 steps via classifier-free guidance, which is this neat hack where you run inference with your conditioning and without and then mix the result. To this day I still don’t get it. But it works.
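
For what it's worth, the "mix the result" step is just a weighted extrapolation from the unconditional prediction toward the conditional one. A minimal sketch (made-up function name, random tensors standing in for the two UNet outputs):

    import torch

    def cfg_mix(noise_uncond: torch.Tensor, noise_cond: torch.Tensor,
                guidance_scale: float) -> torch.Tensor:
        # Extrapolate from the unconditional prediction toward (and past)
        # the conditional one; guidance_scale=1.0 recovers the conditional prediction.
        return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

    # In a sampling loop these would come from two denoiser passes per step,
    # one with the prompt embedding and one with an empty/unconditional embedding.
    noise_pred = cfg_mix(torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64), 7.5)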

Now we find this way to skip to the end by building a model that learns the high-dimensional curvature of the path that a diffusion process takes through space on its way to an acceptable image, and we just basically move the model along that path. That’s my naive understanding of LCM. Seems too good to be true, but it does work, and it has a good theoretical basis too. Makes you wonder what’s next. Will there be a single-step network that can train on LCM to predict the final destination? Lol, that would be pushing things too far...


Sounds like we've invented the kind of psychic time travel they use in Minority Report. Let me show you right over to the Future Crimes division. We're arresting this guy making cat memes today because the curve of his online history traces that of a radicalized blah blah blah


This is what happens when praxis runs ahead of theory.


lcm-lora-sdv1-5 is 67.5M and lcm-lora-sdxl is 197M, so they are much smaller than the entire model. Would be cool to check the rank used with these LoRAs though.
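
One way to check, assuming the weights ship as a safetensors file with the usual 2-D LoRA up/down matrices (key names vary between LoRA formats, so treat this as a sketch):

    from safetensors.torch import load_file

    state = load_file("pytorch_lora_weights.safetensors")
    for name, tensor in state.items():
        if "lora" in name and tensor.ndim == 2:
            # The smaller of the two dimensions is the LoRA rank.
            print(name, tuple(tensor.shape), "rank =", min(tensor.shape))
            break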


64.


And here is a demo mashed up using LeapMotion free-space hand tracking and a projector to manipulate a "bigGAN's high-dimensional space of pseudo-real images", making it more like modern dance meets sculpting meets spatial computing, with a hat tip to the 2008 work of Johnny Chung Lee while at Carnegie Mellon.

https://x.com/graycrawford/status/1100935327374626818


This is incredible.

I dream of the day we can sculpt entire 3D worlds with quick, tactile gestures. Or pure thought. This feels tangibly close.

Fantastic demo. Very much in the magic style of Johnny Chung Lee.


Pure thought is the ultimate step!


Now combine this with an optimized SD implementation, like:

https://github.com/chengzeyi/stable-fast

Or AITemplate, and you are at 15 FPS on a larger consumer GPU, or 10 FPS with a ControlNet you can use for some motion consistency.


Here is a collection of demos with fast LCM on Hugging Face:

https://huggingface.co/collections/latent-consistency/latent...



The Hugging Face space seems broken.


It could only handle so much traffic.



