Flux better than Stable Diffusion (github.com/black-forest-labs)
12 points by somethingsome 45 days ago | hide | past | favorite | 5 comments



Not just a little better, a lot better.

iirc, a lot of the devs who left Stable Diffusion went on to found/join Black Forest Labs.


I was sure that this was self-promo, but after scrolling through some of OP’s history it looks like good faith to me. I’m kinda burned though: whoever runs this site has been engaging in "organic marketing" on Reddit’s /r/AiArt, which makes me sad.

FWIW I haven’t tried it for that reason alone. Curious to hear from people plugged into leaderboard competitions: how does this rank objectively? I feel like image models are super hard to evaluate, though, to be fair. All I can find is instructions, but no centralized results.

E.g. https://huggingface.co/docs/diffusers/en/conceptual/evaluati...


>whoever runs this site has been engaging in

You are suggesting that BFL is using "organic marketing" to push their product??

It may be worth mentioning that the BFL team actually consists of the people who invented Latent Diffusion (at CompVis, a university lab), then developed Stable Diffusion at Stability AI, and are now pushing the state of the art with Flux.1.

The attention is well deserved and Flux.1 is definitely the top model right now.

edit: had a look at r/AiArt. It seems to be a place where people post their "Bing Image Creator" output. Maybe that's not where the enthusiasts are. Try r/StableDiffusion.


Hm, I actually submitted it so I could keep track of it (it's saved in my submitted record on HN). It seemed promising, and I hope to run some tests soon.

Didn't see the organic marketing; I'll check more carefully next time ;)

I just discovered it from https://youtu.be/stOiAuyVnyQ?si=eOVVVcUxJMClYZFa


On another note, after posting I went to the main website, where they display several user examples. They promise 'text' in images, but everything I saw failed miserably, so I guess we are not there yet.

As for a metric to evaluate this kind of network: to me it's impossible to define one correctly, because it's too subjective. If you put a metric on it, it will influence the results and therefore limit the images you can create (not even speaking about the kinds of images we want it to create).

So maybe the best we can have is a very generic metric for training, and then fine-tuning to some style afterward depending on your usage.

So I don't really trust 'objective rankings'. But we could take all these networks, fine-tune them on some well-defined style, and evaluate them against ground-truth images when we ask them to recreate the original piece. That would at least be some kind of proxy for the fine-tunability of the network.

I'm skipping a lot of details, such as how to generate the prompt corresponding to the desired output. But without thinking too much about it, it could be like finding some fixed point, and there we can create a metric. (This fixed point would depend on the model, though, so we probably can't use it to evaluate other models, and using one fixed function for all models would not compare them fairly.)
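The ground-truth comparison sketched above could look roughly like this, as a minimal proof of concept: score a generated image against its reference with PSNR (a crude pixel-level proxy; a perceptual metric like LPIPS would be a better fit in practice). The arrays here are random stand-ins for real model output and ground truth, and the whole thing assumes same-sized uint8 RGB images.

```python
import numpy as np

def reconstruction_score(generated: np.ndarray, reference: np.ndarray) -> float:
    """PSNR in dB between a generated image and its ground-truth reference.

    Both arrays are expected as uint8 RGB of the same shape; higher is better.
    """
    if generated.shape != reference.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((generated.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(255.0**2 / mse)

# Toy usage: a random "reference" and a slightly perturbed "generation".
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
gen = np.clip(
    ref.astype(np.int16) + rng.integers(-10, 11, size=ref.shape), 0, 255
).astype(np.uint8)

print(f"PSNR: {reconstruction_score(gen, ref):.1f} dB")
```

Averaging this over a test set per fine-tuned model would give the "proxy for fine-tunability" the comment describes, with all the caveats about pixel metrics not capturing perceptual quality.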

Note: thinking out loud



