The only truly successful commercial use of SDXL I know of is by NovelAI. Said company appears to have used a 256xH100 cluster to finetune it to produce anime art.

Open source efforts to produce a similar model seem to have failed due to the extreme compute requirements for finetuning. For example, the Waifu Diffusion team, using 8xA40s[0], has not managed to bend SDXL to its will after potentially months of training.

If you need 256xH100 to even finetune the model for your use case, what's stopping you from just training your own base model? Not much, as it turns out. A few weeks ago, NovelAI's developers stated that they'll train the next version of their model from scratch.

So I agree; even with the licensing changes, things might be looking somewhat dire for SAI.

https://gist.github.com/harubaru/f727cedacae336d1f7877c4bbe2...




What do you mean by extreme requirements? There are lots of SDXL finetunes available on Civitai, like https://civitai.com/models/119012/bluepencil-xl for anime. The relevant Discords for models/apps are full of people doing this at home.

Or are you looking at some very specific definition / threshold for fine tuning here?


There is something I find rather hard to communicate about the gap between these models on Civitai and what I think a competent model should be able to do.

I'd describe them like "a bike with no handlebars" because they are incredibly difficult to steer to where you want.

For example if you look at the preview images like this one: https://civitai.com/images/3615715

The model seems to have completely ignored a good 35% of the text input. Most egregious, I find, is the (flat chest:2.0), the parentheses denoting a strengthening of that specific part of the prompt. The values I see people use with good general models range from 1.05~1.15; 2.0 is, in comparison, an extremely large value, and it _still did not work at all_, if you take a look at the actual image.
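For reference, the weighting syntax works roughly like this (a minimal sketch, assuming the A1111-style convention; parse_weights is a made-up helper, and the real parser also handles nesting, escapes, and renormalizing the embedding mean after scaling):

    # Sketch of A1111-style "(text:weight)" parsing (assumed convention).
    import re

    def parse_weights(prompt: str):
        """Split "(flat chest:2.0), 1girl" into (text, weight) pairs."""
        pairs = []
        for part in prompt.split(","):
            m = re.fullmatch(r"\s*\((.+):([\d.]+)\)\s*", part)
            if m:
                pairs.append((m.group(1), float(m.group(2))))
            else:
                pairs.append((part.strip(), 1.0))
        return pairs

    # Each token's embedding is then multiplied by its weight, so 2.0
    # doubles that fragment's pull on cross-attention -- far outside
    # the usual 1.05~1.15 range.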


> The model seems to have completely ignored a good 35% of the text input

Well, when you blindly cargo-cult, in SDXL, a prompting style designed to work around issues with SD1.x, and in the process spam several hundred tokens, mostly slight variants, into the 75-ish-token window (which, yes, the UIs use a merging strategy to try to accommodate), you have that problem.
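That merging strategy looks roughly like this (a sketch modeled on what A1111-style UIs do; encode_long_prompt is an illustrative name, and the real code handles padding tokens and per-token weights differently):

    # Chunk a long prompt into 75-token windows and concatenate the
    # CLIP encodings along the sequence axis (assumed approach).
    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    def encode_long_prompt(prompt: str) -> torch.Tensor:
        ids = tokenizer(prompt, truncation=False).input_ids[1:-1]  # strip BOS/EOS
        chunks = [ids[i:i + 75] for i in range(0, len(ids), 75)]
        embeddings = []
        for chunk in chunks:
            # Re-wrap each chunk in BOS/EOS and pad to CLIP's 77-token window.
            padded = [tokenizer.bos_token_id] + chunk + [tokenizer.eos_token_id]
            padded += [tokenizer.eos_token_id] * (77 - len(padded))
            with torch.no_grad():
                out = text_encoder(torch.tensor([padded])).last_hidden_state
            embeddings.append(out)
        # "Merge" by concatenation; cross-attention just sees a longer
        # conditioning sequence.
        return torch.cat(embeddings, dim=1)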

> most egregiously I find the (flat chest:2.0)

The flat chest is fighting to compensate for the also heavily weighted (hands on breasts:1.5), which not only affects hand placement but also the concept of "breasts", and the biases trained into many of the community models around that term mean that having that concept in the prompt, heavily weighted, takes a lot to counteract. So, no, I don't think it's ignoring that.


I'm just going out on a limb here, but a paid service needs to be good with limited input. I've used SD locally quite a lot, and it takes quite a bit of work with x/y plots to find combinations of settings that produce good images somewhat consistently, even when using decent finetunes from Civitai.
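A crude version of such a sweep with diffusers looks something like this (a sketch; the model ID, prompt, and grid values are just examples):

    # x/y sweep over CFG scale and step count using diffusers
    # (illustrative values; swap in your own checkpoint and prompt).
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "1girl, watercolor, cherry blossoms"
    for cfg in (5.0, 7.0, 9.0):
        for steps in (20, 30, 40):
            image = pipe(prompt, guidance_scale=cfg,
                         num_inference_steps=steps).images[0]
            image.save(f"xy_cfg{cfg}_steps{steps}.png")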

When I use a decent paid service, pretty much every prompt gives me a good response out of the box. Which is good, because otherwise I'd have no use for paid services, since I can run it all locally. This causes me to go to a paid service whenever I want something quick, but don't need full control. When I do want full control, I stick to my local solution, but that takes a lot more time.


They rented 8xA40 for $3.1k. That is actually kinda peanuts; I spent more on my gaming PC. I think there were Kickstarter projects for AI finetunes that raised $200k before Kickstarter banned them?


Is that the correct link? I've never heard of A40s; the link is to release notes from a year and two months ago, and SDXL just came out a month or two ago. Hard for me to get to "SDXL cannot [be finetuned effectively]" from there.


A40: https://www.techpowerup.com/gpu-specs/a40-pcie.c3700

I have not heard about the team upgrading or downgrading from the hardware mentioned there, so I assume they still use the same hardware.

> SDXL just came out a month or two ago

About 4.5 months ago, actually.

As for the claim that SDXL cannot be finetuned effectively, an attempt at a finetune was released here: https://huggingface.co/hakurei/waifu-diffusion-xl

The team was given early access by StabilityAI to SDXL0.9 for this. You'll have to test it out for yourself if you're interested in comparing. From my experience, there is a world of difference between the NovelAI and WaifuDiffusion models in both quality and prompt understanding.

Note, the bare-minimum baseline I set for the WaifuDiffusion SDXL model was to beat their SD2.1-based model[0], which, in my opinion, it did not.

[0] https://huggingface.co/hakurei/waifu-diffusion-v1-4


> The team was given early access by StabilityAI to SDXL0.9 for this.

SDXL0.9 may have been a worse starting point than SDXL1.0, and the people behind the huge pile of finetunes released since have certainly had more time and accumulated more experience finetuning SDXL than WD had when it trained against SDXL0.9.


>Open source efforts to produce a similar model seem to have failed due to the extreme compute requirements for finetuning.

Wouldn't a distributed computing project similar to SETI@home help with training?


Not really, with our current techniques. The increased latency and low bandwidth between nodes make it absurdly slow.
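A back-of-the-envelope comparison (assumed figures; SDXL's UNet is on the order of 2.6B parameters):

    # Naive data parallelism syncs gradients every step; compare sync
    # times over NVLink vs. a home internet uplink (assumed numbers).
    params = 2.6e9                    # ~SDXL UNet parameter count
    grad_bytes = params * 2           # fp16 gradients: ~5.2 GB per step
    nvlink_bw = 600e9                 # ~600 GB/s intra-node
    home_bw = 100e6 / 8               # 100 Mbit/s uplink = 12.5 MB/s

    print(grad_bytes / nvlink_bw)     # ~0.009 s per sync
    print(grad_bytes / home_bw / 60)  # ~7 minutes per sync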


Do you have any info or links with more details on this? I've wondered the same thing.



