SDXL Turbo: A Real-Time Text-to-Image Generation Model (stability.ai)
283 points by minimaxir on Nov 28, 2023 | 152 comments



Noncommercial use - aside from being one of my licensing pet peeves - seems to indicate that the money is drying up. My guess is that the investors over at Stability are tired of subsidizing the part of the generative AI market that OpenAI refuses to touch[0].

The thing is, I'm not entirely sure there's a paying portion of the market? Yes, I've heard of people paying for ChatGPT because it answers programming questions really well, but that's individual users, who are cost sensitive. The real money was supposed to be selling these things as worker replacement, but Hollywood unions have (rightfully) shut AI companies out of the markets where a machine that can write endless slop might have made lots of money.

OpenAI can remain in Microsoft's orbit for as long as the stupid altruist / accelerationist doomer debate doesn't tear them apart[1]. Google has at least a few more years of monopoly money before either the US breaks them up or the EU bankrupts them with fees. I don't know who the hell is pumping more money into either Anthropic or Stability.

[0] Porn. It's always porn. OpenAI doesn't want to touch it for very obvious reasons.

[1] For what it's worth, Microsoft has shown that all the AI safety guardrails can be ripped out by the money people at a moment's notice, given how quickly they were able to make the OpenAI board blink.


The only truly successful commercial use of SDXL I know of is by NovelAI. Said company appears to have used a 256xH100 cluster to finetune it to produce anime art.

Open source efforts to produce a similar model seem to have failed due to the extreme compute requirements for finetuning. For example, Waifu Diffusion, using 8xA40s[0], has not managed to bend SDXL to their will after potentially months of training.

If you need 256xH100 to even finetune the model for your use case, what's stopping you from just training your own base model? Not much, as it turns out. A few weeks ago, NovelAI's developers stated they'll train from scratch for the next version of their model.

So I agree, even with the licensing changes things might be looking somewhat dire for SAI.

[0] https://gist.github.com/harubaru/f727cedacae336d1f7877c4bbe2...


What do you mean by extreme requirements? There are lots of SDXL fine-tunes available on Civitai, like https://civitai.com/models/119012/bluepencil-xl for anime. The relevant discords for models/apps are full of people doing this at home.

Or are you looking at some very specific definition / threshold for fine tuning here?


There is something I find rather hard to communicate about the difference between these models on civitai and what I think a competent model should be able to do.

I'd describe them as "a bike with no handlebars" because they are incredibly difficult to steer to where you want.

For example if you look at the preview images like this one: https://civitai.com/images/3615715

The model seems to have completely ignored a good 35% of the text input. The most egregious part I find is the (flat chest:2.0), the parentheses denoting a strengthening of that specific part of the prompt. The values I see people use with good general models range from 1.05~1.15; 2.0 in comparison is an extremely large value, and it _still didn't work at all_ if you take a look at the actual image.


> the model seems to have completely ignored a good 35% of the text input

Well, when you blindly cargo cult a prompting style designed to work around issues with SD1.x in SDXL, and in the process spam several hundred tokens, mostly slight variants, into the 75-ish-token window (which, yes, the UIs use a merging strategy to try to accommodate), you have that problem.

> most egregiously I find the (flat chest:2.0)

The flat chest is fighting to compensate for the also heavily weighted (hands on breasts:1.5), which not only affects hand placement but also the concept of "breasts", and the biases trained into many of the community models with that term mean that having that concept in the prompt, heavily weighted, takes a lot to counteract. So, no, I don't think it's ignoring that.


I'm just going out on a limb here, but a paid service needs to be good with limited input. I've used SD locally quite a lot and it takes quite a bit of work through x/y plots to find combinations of settings that produce good images somewhat consistently, even when using decent fine-tunes from CivitAI.

When I use a decent paid service, pretty much every prompt gives me a good response out of the box. Which is good, because otherwise I'd have no use for paid services, since I can run it all locally. This causes me to go to a paid service whenever I want something quick, but don't need full control. When I do want full control, I stick to my local solution, but that takes a lot more time.


They rented 8xA40 for $3.1k. That is actually kinda peanuts; I spent more on my gaming PC. I think there were kickstarter projects for AI finetunes that raised $200k before Kickstarter banned them?


Is that the correct link? I've never heard of A40s, the link is to release notes from a year and two months ago, and SD XL just came out a month or two ago. Hard for me to get to "SD XL cannot [be finetuned effectively]" from there.


A40: https://www.techpowerup.com/gpu-specs/a40-pcie.c3700

I have not heard about the team upgrading or downgrading from the hardware mentioned there, so I assumed it's still the same hardware they use.

>SD XL just came out a month or two ago

About 4.5 months actually.

For the claim that SDXL cannot be finetuned effectively, an attempt at a finetune was released here: https://huggingface.co/hakurei/waifu-diffusion-xl

The team was given early access by StabilityAI to SDXL 0.9 for this. You'll have to test it out for yourself if you're interested in comparing. From my experience, there is a world of difference between the NovelAI and WaifuDiffusion models in both quality and prompt understanding.

Note, the very baseline I set for the WaifuDiffusion SDXL model was to beat their SD2.1-based model[0], which it did not, in my opinion.

[0] https://huggingface.co/hakurei/waifu-diffusion-v1-4


> The team was given early access by StabilityAI to SDXL0.9 for this.

SDXL 0.9 may have been a worse starting point than SDXL 1.0, and the people who have released the huge pile of finetunes since have certainly put in more time and developed more experience finetuning SDXL than WD had when it trained against SDXL 0.9.


>Open source efforts to produce a similar model seem to have failed due to the extreme compute requirements for finetuning.

A distributed computing project similar to SETI@home wouldn't help with training?


Not really with our current techniques. The increased latency and low bandwidth between nodes make it absurdly slow.


Do you have any info or links on more details about this? I’ve wondered the same thing.


> Porn. It's always porn.

I posted about my AI porn site pornpen.ai here last year and it reached the top of the front page. And yes, it's still going strong :D (and we've integrated SDXL and videos recently)


oh wow you just got another user sir/madam GYAAAT

Edit: okay, a lot of these look just bizarre as hell and seem to favour massive, grotesque breast sizes

I wonder what damage this does to women


Emad just tweeted about the future monetization of their core models. Seems they want to use the Unity model - the original one, not the recent trick Unity pulled. AKA free to use until you make lots of money with it.

https://x.com/EMostaque/status/1729609312601887109


Are you suggesting the only use for locally run free SD derived models is porn?

Creating illustrations for articles/presentations and stock photo alternatives are huge!

The ability to run for free on your local machine allows for far more iterations than using SaaS, and the checkpoint/finetune ecosystem this openness has spawned has created models that perform way better for these use cases than standard SD.


No, I'm suggesting that the only models you can use for porn are locally-run.

In particular, the people offering hosted models do not want to touch porn, because the first thing people do with these things is try to make nonconsensual porn of real people, which is absolutely fucking disgusting. Hence the several layers of filtering.

SD also has a safety filter, but it's trivially removable, and the people who make nonconsensual porn do not care about trivial things like licensing terms. My assumption is that switching to a noncommercial license would mean that Stability could later add further restrictions to the commercial use terms, i.e. "if you're licensing the model for a generator app like Draw Things, you have to package it up in such a way that removing the safety filter is difficult or impossible".


We build the best video, image and other models with more downloads and usage than anyone.

It is quite revolutionary for the creative industry, which is a few hundred billion dollars in size, a reasonable market globally.


> Porn. It's always porn.

I've been surprised at the explosion of porn. Well, not actually. Automatic1111 made that easy, and anyone who has browsed CivitAI knows all too well what those models are being used for. I mean, when you give teenagers the ability to undress their crushes[0], what do you think is going to happen (do laws adequately protect people (kids)? Can they? Will this force a shift towards actually chasing producers, distributors, and diddlers?)?

Porn is clearly an in-demand market. What does surprise me is that, given all the work on depth maps and 3D rendering from 2D images in the past few years, and given how popular VR headsets apparently are (according to today's LTT episode, half as many Meta Quest 2s as PS5s have been sold?!), nobody seems to have jumped on this. If VR headsets are actually that prolific, it seems like there'd be a good market for even just turning a bunch of videos into VR videos, not to mention porn (I don't have a VR headset, but I hear a lot of porn is watched on them. No Linus, I'm not going to buy second hand...). I think all it takes is for some group to optimize these models like they have for LLaMA and SD (because as a researcher I can sure tell you, we're not the optimizers. Use our work as ballpark figures (e.g. GANs 10x Diffusion), but there's a lot of performance on the table). You could definitely convert video frames to 3D on prosumer grade hardware (say a 90 minute movie in <8hrs? Aka: while you sleep).

There are a lot of wild things that I think AI is going to change that I'm not sure people are really considering (average people anyway, or at least stuff that's not making it into popular conversation). ResNet-50 is still probably the most used model, btw. Not sure why, but just about every project I see that's not diffusion or an LLM is using it as a backbone, despite research models that are smaller, faster, and better (at least on ImageNet-22k and COCO).

[0] https://www.bbc.co.uk/news/world-europe-66877718


SDXL and ControlNet are already optimized, if that's what you mean: https://github.com/chengzeyi/stable-fast

(Note the links to various SD compilers).
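As a rough illustration of what those compiler-style optimizations look like from user code, here's a minimal sketch, not of stable-fast itself, but of the stock torch.compile route from the diffusers optimization docs; the model ID and prompt are just placeholders:

    # Generic PyTorch/diffusers compilation sketch (not the stable-fast API).
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # The UNet dominates per-step compute; compiling it trades a slow first call
    # for faster steady-state denoising steps.
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

    # First call triggers compilation; later calls reuse the optimized graph.
    image = pipe("an astronaut riding a horse", num_inference_steps=30).images[0]
    image.save("compiled.png")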

But the whole field is moving so fast that people aren't even adopting the compilers and optimized implementations at large.


Not really what I mean. I mean TensorRT is faster than that, according to their README. By optimized I'm specifically pointing to llama.cpp, because 1) it's in C, 2) it uses quantized models, 3) there's a hell of a lot of optimizations in there. The thing runs on a Raspberry Pi! I mean, not well, but damn. SD is still pushing my 3080Ti, for comparison.

But I wasn't thinking diffusion. Those models are big and slow. GANs still reign in terms of speed and model sizes. I mean, the StyleGAN-T model is 75M params (lightweight) or 1bn (full) (with 123M for text). That paper notes that the 56 images they use in Fig. 2 take 6 seconds on a 3090 at 512 resolution. I have a 3080Ti and I can tell you that's about how long it takes for me to generate a batch of 4 with an optimized TensorRT model. That's a big difference, especially considering those are done with interpolations. I mean, the GAN vs Diffusion debate is often a little silly, as realistically it is more a matter of application. I'll take diffusion in my Photoshop, but I'll take StyleGAN for my real-time video upscaling.

But yes, I do understand how fast the field is moving. You can check my comment history to verify, if my register isn't sufficient indication.


>do laws adequately protect people (kids)? Can they? Will this force a shift towards actually chasing producers, distributors, and diddlers?

It's extremely complicated. Actual CSAM is very illegal, and for good reason. However, artistic depictions of such are... protected 1st Amendment expression[0]. So there's an argument - and I really hate that I'm even saying this - that AI generated CSAM is not prosecutable, as if the law works on SCP-096 rules or something. Furthermore, that's just a subset of all revenge porn, itself a subset of nonconsensual porn. In the US, there's no specific law banning this behavior unless children are involved. The EU doesn't have one either. A specific law targeted at nonconsensual porn is drastically needed, but people keep failing to draft one that isn't either a generalized censorship device or a damp squib.

You can cobble together other laws to target specific behavior - for example, there was a wave of women in the US copyrighting their nudes so they could file DMCA 512 takedown requests at Facebook. But that's got problems - first off, you have to put your nudes in the Library of Congress, which is an own goal; and it only works for revenge porn that the (adult) victim originally made, not all nonconsensual porn. I imagine EU GDPR might be usable for getting nonconsensual porn removed from online platforms, but I haven't seen this tried yet.

I'm disgusted, but not surprised, that teenage kids are generating CSAM like this. Even before we had diffusion models, we had GANs and deepfakes, which were almost immediately used for generating shittons of nonconsensual porn[1].

[0] https://en.wikipedia.org/wiki/Ashcroft_v._Free_Speech_Coalit... and the later https://en.wikipedia.org/wiki/United_States_v._Handley

[1] https://www.youtube.com/watch?v=OCLaeBAkFAY


> AI generated CSAM is not prosecutable

This is true, though "AI CSAM" is an oxymoron. There is no abuse in the creation of such works, and as such it is not abuse material, unless of course real children are involved.


The post I was replying to specifically cited a Spanish AI CSAM ring that did, in fact, involve real child victims: https://www.bbc.co.uk/news/world-europe-66877718


Damn, that sucks.


I get your argument, but there are definitely laws about cartoon underage characters. Agree or disagree, the difference is that today you don't need to be a highly skilled artist to make something that people are going to fap to. (I definitely agree priority should be focused on physical abuse and the people making the content, but this whole subject is touchy.)


Does non-consensual porn not qualify as defamation? That, plus obscenity laws where they exist, should be able to handle most hyperrealistic porn, so that only speech remains.


Good question. US defamation law is fairly weak[0], but all the usual exceptions that make it weak wouldn't apply. e.g. "truth is an absolute defense against defamation" doesn't apply because AI generated or photoshopped nonconsensual porn is fake. I'm not a lawyer, but I think a defamation case would at least survive a motion to dismiss.

[0] Which, to be clear, is a good thing. Strong defamation law is a generalized censorship primitive.


Could this perhaps fall under something like trademark, like an unauthorized use of one's likeness? I'm sure I've heard of some celebrity cases along similar lines.


You're probably thinking of the Right of Publicity laws some US states have.


> I'm disgusted, but not surprised, that teenage kids are generating CSAM like this. Even before we had diffusion models, we had GANs and deepfakes, which were almost immediately used for generating shittons of nonconsensual porn

I think the big difference now is that 1) it's much easier to do, and 2) the computational requirements and (more importantly) the technical skills required have dramatically dropped.

We should also be explicitly aware that deep fakes are still new. GANs in 2014 were not creating high definition images. They were doing fuzzy black and white 28x28 faces, poorly, and 32x32 color images in which, if you squinted hard enough, you could see a dog (https://arxiv.org/abs/1406.2661). MNIST was a hard problem at that time, and that was only 10 years ago. It took another 4 years to get realistic faces and objects (https://arxiv.org/abs/1710.10196) (mind you, those images are not random samples), another year to get to high resolution, another 2 to get to diffusion, and another 2 before those exploded. Deep fakes have really only been a thing within the last 5 years, and certainly not on consumer hardware. I don't think the legal system moves much in 10 years, let alone 5 or 2. I think a lot of us have not accurately encoded how quickly this whole space has changed. (Image synthesis is my research area, btw.)

I'm not surprised that these teenagers in a small town did this, but the fact that it's already happening among ordinary teenagers in a small town is striking. Discussions of deep fakes like that Tom Scott video were barely a warning (5 years is not a long time). It quickly went from researchers thinking it could happen in the next decade and starting discussions, to real world examples making the news in less than their predicted time (I don't think anyone expected how much money and man hours would be dumped into AI).


OpenAI has a pretty robust and profitable business without Microsoft. In every enterprise I've been involved with over the last few years, we have had some incredibly material and important use cases for OpenAI LLMs (as well as Claude). They aren't spewing slop or whatever; they're genuinely achieving valuable and foundational business outcomes. I've been a bit stunned at how fast we've achieved these things, and it tells me that the AI hype isn't hype, and that if we have done these things in a year, it's hard to estimate how much impact the technologies will have in five, but I think it's substantial. So is our spend with OpenAI. Or rather, with Azure on OpenAI products. The only value Microsoft itself offers, in our experience, is IAM - which is sufficient, frankly.

Stability and Midjourney are also making money, but it's largely from amateurs and people prototyping content for their own creations. A lot of single-person indie game developers are using these tools to generate assets, or at minimum first-pass assets. I think a lot of media companies are producing the art for their articles or newsletters etc. using these tools. Whether this is enough, I don't know.


Sigh. I'm really tired of seeing people assume OpenAI is profitable. We have no idea if they are or not, and we have some indication that they're incinerating money on ChatGPT, to the point that they're turning off sign-ups because they're out of compute.


I'm not sure that's a useful argument, insomuch as A) it's unfalsifiable until they IPO (reporting indicates they are very, very profitable) and B) running out of GPUs seems like an odd thing to name as an indicator you're _losing_ money.

People are genuinely way, way, way underestimating how intense the GPU shortage is.


I haven't seen any reporting about their costs. I'm sure they're making bonkers numbers in revenue, but that doesn't mean they're profitable if they're losing money on every gpt call. You're saying my claim that they're unprofitable is unfalsifiable. The same is true for claims that they're wildly profitable. We just don't know the financial state of the company because they're not public.


My understanding, which I can’t prove other than to say it comes from folks affiliated with OpenAI, is that chatgpt doesn’t make money but also doesn’t lose money (in aggregate, some accounts use way more than others but many accounts are fairly idle), and their API business is profitable and accounts for most of their GPU utilization. I have no insight into why they would turn off signups for chatgpt other than they may need the capacity for their enterprise customers, where they make a decent margin.


Isn't literally every imagegen AI that's not DALL-E or Midjourney based on Stable Diffusion?


There are exceptions, e.g. https://generated.photos/human-generator uses a GAN based model.

Edit: Also, Adobe uses its own model for Photoshop integration (inpainting via cloud). That model seems to be the same as this one: https://www.adobe.com/sensei/generative-ai/firefly.html


Are we sure that those aren't based on Stable Diffusion?

They're black boxes with no code available, so we can't tell, and we get to tease the closed source companies for wrapping FOSS stuff.

Midjourney I'm most convinced is just SD with a fine-tuned model. That would explain why everything looks like Pixar and it can't follow the prompt.


Given that Midjourney predates Stable Diffusion, that seems unlikely, though it is possible they threw away all the hard work of creating their own model to use one that's available to other people for free, and then charge money for it.


The EU views Google as a tobacco company. The last thing they want to do is bankrupt them with fees. They want to milk Google - big tech generally - for tax revenue. And besides, it'd take $100 billion per year in fees, which is never going to happen. Meanwhile Google keeps getting bigger year after year (they have nearly doubled in size in four years, up to $300b in sales now) and Bing has made zero headway despite the AI-angled efforts (BingChat etc). Maybe the mainstream adoption of GPT (or similar) would severely damage Google, but there's still a lot of time left for Google to take their shot at getting out in front of that outcome.

The US breaking Google up also won't end the monopoly money. More likely one of the children will spin out with an even better margin business.


> The US breaking Google up also won't end the monopoly money.

It's also weird because modern tech economics has created a space that produces a lot of natural monopolies. Momentum is very powerful, and it's the reason Silicon Valley companies will run at a loss for years while building a userbase. The trick is to keep them (or sell before the buyer starts charging). There are what, 2 map companies, and only one of them is in high usage because it's a default? Same with browsers. You can't compete, because making a better product isn't enough in a system where network effects dominate the economics. Hardware companies are seeing that (along with patent issues, especially across borders). I have no idea how to think about this, tbh, because it is weird. In some cases breaking these companies up ultimately destroys them, while in other cases, exactly what you say.


Seems like a great place to mention the MapQuest API. I tested it against 4 other services (inc. Google's) on 200 manually verified geocoding/reverse-geocoding tasks. Google managed 92%, whereas MapQuest scored 99%.

If you are doing geocoding or reverse-geocoding, MapQuest outperforms Google Maps quite magnificently (YMMV). The cost is also lower, and there are plans where you can keep the data.
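For anyone who wants to reproduce that kind of comparison, here's a hedged sketch of a single call against MapQuest's geocoding endpoint; the endpoint and response field names are from memory of its docs, so verify them before relying on this, and the API key is a placeholder:

    # Minimal geocoding sketch against the MapQuest Geocoding API (v1 address endpoint).
    import requests

    MAPQUEST_KEY = "YOUR_API_KEY"  # placeholder

    def geocode(address):
        resp = requests.get(
            "https://www.mapquestapi.com/geocoding/v1/address",
            params={"key": MAPQUEST_KEY, "location": address},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()
        # Assumed response shape: results[0]["locations"][0]["latLng"] = {"lat": ..., "lng": ...}
        loc = data["results"][0]["locations"][0]["latLng"]
        return loc["lat"], loc["lng"]

    print(geocode("1600 Pennsylvania Ave NW, Washington, DC"))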

The funny thing being that the C levels didn't give a shit about the results and went ahead with Google Maps anyway. So your point remains correct.


What were the others? Google Maps, MapBox, OSM are the three that come to my mind.


Waze? But my hyperbole aside ("only one"), how many people do you know who use these other platforms? Colloquially, people call things monopolies if they have sufficient market share (not absolute share, which nearly never exists) or if market collusion exists (e.g. ISPs, airlines, oil). People even do it for 2 dominating companies (e.g. Coke + Pepsi only control 71% of market share (46.3+24.7)). Because the truth is that monopolies aren't always bad; they are just dangerous because they have so much weight that they can perform abusive tactics, and that's the thing we actually care about. My whole point is that this gets to be a very sticky situation when the product is the market share (the more people that use Google Maps, the better Google Maps gets. But maybe social networks are a clearer example).


Natural monopoly means that the market only has enough demand for one supplier. Canonical examples would include highways, railways, local telephone networks, and residential Internet access providers.

Most of the big tech companies we love to hate aren't natural monopolies. Google's anticompetitive moat is primarily made of inertia: they were the first to market with a halfway functional search index. Defaults are very sticky: Microsoft pushes Bing in Windows a lot because a lot of people don't know how to change the default search engine to anything else.

Browsers are monopolized because we let Apple do to iOS what we refused to let Microsoft do to Windows: make the platform browser the only option. In the heyday of Firefox, switching from IE was just a matter of downloading an installer and migrating your bookmarks and history. You can still do this (and I recommend that you do), but if you have an iPhone, it's more complicated. You can technically use an app called "Firefox", but it's using Safari under the hood. Apple won't let you port your own browser engine, which limits how you can improve the experience. So switching to Firefox on iOS is pointless.

Sometimes these monopolies interact. Google built Chrome specifically so they'd get a seat at the web standards table. They got people to use it by giving free advertising space to it on their homepage, and then they used their market position to introduce ridiculous numbers of new web APIs that every other browser vendor now has to reimplement to make Google properties work[0]. Microsoft and Opera found this to be such a hurdle that they switched their browsers over to Chromium. Firefox had to split engineering time up between implementing Chrome features, chasing down security bugs, and adopting multiprocess support, which slowed them down.

At no point is any of this 'natural'. The market can support five technologically independent browsers, but not when Google is trying to actively sabotage them.

[0] The most egregious example being Shadow DOM v0, which was never actually adopted as a standard, but used in YouTube's version of Polymer for years after Shadow DOM v1 was standardized. Firefox never implemented v0, so they were stuck running lots of polyfill code that Chrome wasn't.


> Natural monopoly means that the market only has enough demand for one supplier.

That's not true. Natural monopolies also form due to network effects. You can follow the story of the Bell System for an early example, which I suspect you're aware of. The reason for that monopoly wasn't lack of demand; it was about tragedy of the commons.

In our modern tech economics we have a similar tragedy of the commons, but a bit more abstracted. The thing is, most products are a result of their userbase, not the product itself. Look at HN. Or look at Reddit or YouTube. It may look like circular logic (because it is a feedback loop), but the fact that everyone is publishing on YouTube makes YouTube bigger and more useful to the consumer, which causes more people to publish on that platform because there are more users. It then becomes very difficult to compete, because what can you do? You can make a platform that is 100x better than YouTube with respect to serving videos, search, pay to creators, and so on, but you won't win, because you have no users, and you won't have users without creators, who aren't going to produce on your platform because there are no users. In other words, there's a first mover disadvantage. You're an early user and the site sucks because it has no content, but you're a true believer. You're an early creator and your pay sucks because there are so few viewers (even if your pay per viewer is 100x, your pay per video/time spent is going to be 10,000x less because there are 1,000,000x fewer users). Look at PeerTube, Nebula, or Floatplane. They are all doing fine, but nowhere near as successful as YouTube, which everyone hates on (and for good reason). Hell, when YouTube started trying to compete with Twitch, they had to REALLY incentivize early creators to move with very lucrative deals, because they were not buying the creator, they were buying their userbase. It should be a clear signal that there's an issue if a giant like Google has a hard time competing with Amazon.

For a highly competitive market you need a low barrier to entry so that you can disrupt. There are thousands of examples of a technology/product that is superior in every way (e.g. price and utility) but is not the market winner because network effects exist. Even something like Betamax vs VHS is a story of network effects (I wouldn't say Betamax dominated VHS, but that's not important), because what mattered was what you could get at a store or share (via your neighbor or local rental).

And I'm glad you mention Firefox, because it's a good example of stickiness. I've tried to convert many friends who groan and moan about how hard it is and make up excuses like bookmarks; even after literally showing them that on startup it'll import them for you, they just create a new excuse, or say the UI/UX is trash because the settings button is 3 horizontal lines instead of 3 vertical dots so they can't find it despite it being in the same place, or because the tabs are not as curved, so it's "unusable." You might even see these comments on HN, a place full of tech experts.

What I'm getting at here is that the efficient market hypothesis is clearly false, and market participants are clearly not rational (at least not by the conventional economic definitions).


There are quite a few hosted SDXL platforms (mage.space, leonardo.ai, novel.ai, tensor.art, invoke.ai, to name a few), and most consumers do not have the GPUs needed to run these models; only enthusiasts do.

It's always baffled me that Stability didn't offer a competitive UI platform to use their models with; Clipdrop is just bad quality and very bare-bones, and DreamStudio is pricey and still lacks most features. So this move to a new licensing strategy doesn't surprise me. It actually is somewhat comforting, as I expected them to just stop releasing further trained models (e.g. SDXL 1.1 and up) and only offer those on their services (of course, that can still happen), because how else were they going to monetize consumers (I know they (planned to) offer custom trained/finetuned models to big corps, but that doesn't monetize consumers).

However, as with most releases by Stability these days, it has this feeling of close-but-no-cigar. The recent LCM LoRAs might be a little slower, but they actually offer 1024^2 resolution, work with any existing LoRAs and finetunes (so they are usable for iterative development, unlike this Turbo model, because, well, it's a different model: you can't iterate on it and then expect SDXL (with LoRAs, and to a lesser extent also without) to generate a similar image), and support CFG scale (and therefore negative prompts / prompt weighting). I suppose there's some niche market where you need all the speed you can get, but unless there's a giant leap in (temporal) consistency, that will remain niche; I don't see either the mentioned real-time 3D "skinning" or the video img-to-img (frame-to-frame) gimmicks taking off with the current quality and lack of flexibility. It's good research, and optimizations have lots of value, but it needs quality as well.

Their recent video model is quite bad as well, especially compared to Pika and Runway Gen-2, but as with the DALL-E 3 comparison, one can say those are closed source and Stability's offering is open.

Then we have the 3D model: closed source, and unfortunately worse than Luma's Genie.

The music model is nothing like Suno's Chirp (which might be multiple models, Bark and a music model, used together), and the less said about their LLM offerings the better.

Bottom line, Stability needs a killer model again. They started strong with Stable Diffusion 1.5, took a wrong turn with 2.0 (kind of recovered by 2.1, but the damage was done), and while SDXL isn't bad in a vacuum, neither was it the leap ahead that put it in front of competition like Midjourney at the time, and DALL-E 3 a little later. Now even a relatively small model like PixArt-alpha, also open source, can offer similar quality to what SDXL offers (with a lot of caveats, as it has been trained on so few images that it just doesn't have info on many concepts). More worrying, there's no hint of something better in Stability's pipeline. But maybe image-gen is as good as Stability can get it, and they think they can make an impact pivoting in another direction, or multiple directions; currently, though, it feels like a master-of-none situation.


It's not just porn, there is also copyright infringement, fraud, etc.


Hugging Face released a Colab Notebook for generation from SDXL Turbo using the diffusers library: https://colab.research.google.com/drive/1yRC3Z2bWQOeM4z0FeJ0...

Playing around with the generation params a bit, Colab's T4 GPU can batch-generate up to 6 images at a time at roughly the same speed as one.
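For reference, here's a minimal sketch of what that notebook does with diffusers (based on the published SDXL Turbo example; the prompt and batch size are placeholders):

    # SDXL Turbo via diffusers: single-step sampling with guidance disabled.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")

    prompt = "a photo of a raccoon wearing a tiny wizard hat"

    # Turbo is distilled for 1-step generation with guidance_scale=0.0 (CFG off).
    # Batching several prompts amortizes fixed overhead, which is why ~6 images
    # cost roughly the same wall-clock time as 1 on a T4.
    images = pipe(
        prompt=[prompt] * 6,
        num_inference_steps=1,
        guidance_scale=0.0,
    ).images

    for i, img in enumerate(images):
        img.save(f"turbo_{i}.png")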


Is Euler Ancestral recommended for this? I thought the Ancestral samplers added noise to every step, preventing convergence.


thanks for sharing

I got to experience the power of current models with just 5 lines of code

the pace of change is stressing me out :)


Does Turbo (the model, not this particular notebook) support negative prompts?


So the new SD model requires higher end hardware compared to the rest?


No, it's a smaller model than normal SDXL so it requires less hardware compared to SDXL.


I've been mucking with this stuff again, and SDXL + LCM sampling & LoRA makes 1280x800 images in like 2 seconds, so about a ~5x speed increase for me (so this would be roughly 2x faster than LCM (??, napkin math)). I've found that the method isn't as good at complex prompts. They claim here that this can outperform SDXL 1.0 WRT prompt alignment, but I'm curious what their test methodology is. I searched the paper and couldn't immediately find how it was evaluated. I think these sorts of subjective measurements are fiendishly hard to quantify given the infinity of possible prompts. Still, exciting stuff always happening here, what a time to be alive.
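If anyone wants to try the same combo, here's a minimal sketch of SDXL with the LCM LoRA in diffusers (assuming the published latent-consistency/lcm-lora-sdxl weights; the prompt, resolution, and step count are illustrative):

    # SDXL + LCM LoRA: swap in the LCM scheduler and run a handful of steps at low CFG.
    import torch
    from diffusers import StableDiffusionXLPipeline, LCMScheduler

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

    image = pipe(
        prompt="a cozy cabin in a snowy forest, golden hour",
        width=1280,
        height=800,
        num_inference_steps=4,   # LCM typically wants ~4-8 steps
        guidance_scale=1.5,      # keep CFG low; high values break LCM sampling
    ).images[0]
    image.save("lcm_lora.png")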


the use case here is really for segmented inpainting.

don't like a part of an image? replace it instantly


There are SO many use cases! I maintain we're not even scratching the surface here. You could programmatically reskin video based on crowd participation, you could re-texture VR spaces on the fly. The space of cool shit that you can do with this stuff is growing far faster than we're able to explore it right now.


[flagged]


?


I can't reply to the dead comment, but I'll reply to yours RE the above:

> You can rip off other people's work faster than ever!

I have views on IP that mean I reject the ripping-off premise on its face, BUT IF I DID ACCEPT IT:

Who am I ripping off? What artist is being denied work if I reskin a VR world on the fly with AI? Nobody was going to be painting a scene's worth of textures in a few seconds. The AI is enabling new cool things that haven't been seen before; y'all are absolutely out to lunch replying with this garbage to a comment specifically calling out brand new things that weren't possible before.


It's so interesting to me that Hacker News will flag any comment that dissents from the use of this technology. Supposedly it's irrelevant to the conversation.

The value of these models is derived from the training set, not the ml model. Take away the training data and the model does nothing. So who cares if you reskin your vr world on the fly with AI? I'd argue many artists whose work was ingested into these models without consent care very much about that. Many of them have voiced their concerns publicly. So yeah, go ahead and use this bullshit. But you don't get to just ignore the ethics around it. If you want people like me to shut up and let you enjoy your automated slop, use licensed training sets. Until then, this technology is built on exploitation and alienation.

Also, you don't just get to dismiss intellectual property because you don't like it. It would be awesome if we lived in some utopia where people didn't need to leverage their skills to eat and pay rent. We don't live in that economy, and I don't think corporate AI is going to get us there unless Microsoft shareholders suddenly become bleeding hearts. Nearly every single dev on this site makes their living off of proprietary code, so it's really rich for y'all to just dismiss the concerns of people whose work is being used for this without permission.


I have the opposite view of you on the IP front (former 'pirate'), but your critique of HN flagging is spot on.

How do you interpret modern web design where everything looks like everything else? Is the next boring SPA also problematic IP usage?

These are genuine questions and not meant as an irritant. My view on IP is fairly extreme and I appreciate views that discourage such. Nothing you said about artists needing to eat moved the needle, though I can understand why it would for another person.


It has been a while since I last tried, but I never had very good results when I tried inpainting with SDXL in comparison to the SD1.5 inpainting models.


They seem to be comparing against SDXL 1.0 at 512x512 which makes no sense to me as SDXL 1.0 is horrible at 512x512.

> Using four sampling steps, ADD-XL outperforms its teacher model SDXL-Base at a resolution of 512^2 px


The license is non-commercial, but:

> For clarity, Derivative Works do not include the output of any Model.

https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICE...

Doesn't that mean that generated images from it should be fine for commercial use?


I'm not a lawyer, but I believe there was a court ruling that says AI generated works cannot be copyrighted. So you could use them, but couldn't stop anyone else from doing what they want with them


> there was a court ruling

What jurisdiction? USA?



Actually, no. This is just another example of a headline leaving out important details of the actual case. In this case the plaintiff actually named the AI as the producer, not themselves. From the case:

"the sole issue of whether a work generated entirely by an artificial system absent human involvement"

This leaves a lot of wiggle room for AI-created art with some form of human involvement, e.g. importing the generated image into Photoshop and editing it, or performing some inpainting to improve certain parts. Possibly even prompting and configuring things in Automatic1111 might be regarded as human involvement.

This is going to bring many legal procedures before there's a clear answer to whether AI generated art can be copyrighted.


It doesn't touch trademarks either. If I generate a recognizable picture of a trademarked icon, I still can't use it for my own commercial purposes, even if it's not covered under copyright.


No, it just means that if they sue you, they're pre-committing to not try and foreclose on your own generated outputs by claiming they're derivatives that they would then own.

Of course, this is a water sandwich. If model outputs are derivatives of the model, it'd be difficult to argue that the model itself isn't a derivative of all the training data, most of which isn't licensed. So if anything, this covers Stability's ass, not yours. There's also the related question of if AI models - not their outputs, just the models themselves - have any copyright at all. The logic behind the non-copyrightability of AI art would also apply to the AI training process, so the only way you could get copyright would be a particularly creative way of organizing and compiling the training dataset.

Remember: while the "AI art isn't art" argument reeks to high heavens of artistic snobbery, it's not entirely wrong. There isn't a lot of creative control in the process. Furthermore, we don't give copyright to monkeys[2], so why should we give it to AI models?

"Noncommercial" isn't actually a thing in copyright law. Copyrighted works are inherently commercial artifacts[0], so if you just say "noncommercial use is fine", you've said nothing - and you've invited the legal equivalent of nasal goblins[1] into the courtroom. Creative Commons gets around this by defining their own concept of NonCommercial use. So what did Stability's lawyers cook up?

> “Non-Commercial Uses” means exercising any of the rights granted herein for the purpose of research or non-commercial purposes. Non-Commercial Uses does not include any production use of the Software Products or any Derivative Works.

Uh... yeah. That's replacing a meaningless phrase with a tautology. Fun. The only concrete grant of rights is research use, and they categorically reject "any production use", which is awfully close to all uses. Even using this to generate funny fanart mashups for your own personal enjoyment could be construed as a 'production use'. Stability could actually sue you for that (however unlikely that would be).

[0] In the eyes of the law. I actually hate this opinion, but it's the opinion the law takes.

[1] Under ISO 9899, it is entirely legal for C programs with undefined behavior to make goblins fly out of your nose.

[2] https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...


I think the artists get to say whatever they want about this bullshit considering it's entirely dependent on their work to even function. I don't think the AI community gets to call the people who make the models possible at all snobs or anything else. The AI people didn't make shit. They should have some respect for the people that do.


The so-called "AI people" built the entire architecture, something people didn't think was possible at this scale and quality a year ago, and the matter of "artists should get whatever they want" because it trained on their works isn't the point. Diffusion models don't rip parts of pictures together; they happen to be trained to make art out of noise, finding patterns in art. The same thing is happening with LLMs in court, with the LLaMA model and book authors claiming it just reproduces their books.

I still remember that piece of art that was submitted to an art contest and won, only for it to be announced later that it was an SD prompt.


These models can't exist without the training sets. Their value is entirely derived from existing data. The ml architecture does not matter at all. Sure, throw enough compute and data at a problem, do a little parallelization, and you can extract plenty of patterns. Does that mean the ml engineers understand art? Or are they just using glorified brute force to alienate people who actually make things from their labor? No, I have very little respect for the AI people. Once you get over the novelty, their creations inspire little else beside disgust. They seem to take pride in how little they understand about the models they create.


Do humans understand art? Does it matter? How do humans learn to create art? By looking at other people's art, for the most part. Those other artists are not compensated for this either, nor do they need to be credited. If you exactly copy another person's art style, that may be frowned upon by some, but otherwise it is of no consequence, unless you claim the work is actually made by that artist. If you believe that humans should be afforded a privilege that machines (or rather their operators) should not be afforded, make your case on why.


You're the millionth person to make the argument that we should treat AI learning the same as human learning. It's still a bad argument, but I'm getting very tired of explaining why treating a human the same as a computer program sucks ass.


You're the millionth person to make the argument that AI art sucks. That didn't stop me from typing up a reply for you. Why don't you link one of your previous replies?


If these creations inspire such violent disgust, then it's likely that you perceive them as authentic art. If AI images were devoid of meaning or value, they wouldn't have sparked such passion.

You cannot claim ownership over culture, nor can AI. Culture is a collaborative process, and no one can barricade themselves from the input of others. Artists using AI are simply exercising their right to contribute to the collective creative pool. Art flourishes in an open environment where it can stimulate other artistic endeavors. The only art off-limits to AI is the art that remains unpublished.


This is, imo, an extremely naive take. You claim culture is a collaborative process, yet AI only takes from the communities that produce art. It gives nothing back. You claim AI produces culture, but all it does is atomize our society, promising personal yet meaningless experiences for everyone. There's no shared culture if everyone is just consuming individualized streams of content. It's simultaneously homogenizing too, producing uninteresting torrents of homogenous images from the same model. This also harms culture, stamping out uniqueness under the weight of thousands of meaningless images flooding online art spaces. You claim the only art that's off limits to AI is that which remains unpublished, yet you continue to use the labor of others without permission, discouraging people from publishing their work in the absence of any protection for that work. You claim AI will make art more open, yet most of these models are built and operated by massive corporations with closed source code. They steal from the public and cry out fair use while trying to build walled gardens they can monopolize.

So I'm sorry, but there's a counterargument to every point you're making. I trust the artists I speak to far more than the proponents of this technology. At least they're striving for something genuine, instead of making disingenuous claims about "democratization".


I don't try to swim against the current. But you're welcome to do it.

What do you mean AI doesn't give back? It serves everyone and gives back everything it can create. Artists are the number one users here, and they will unlock the AI skills better than regular people playing around.

What individualized streams? You mean like imagination, where every one of us has their own "individualized stream"? AI art is augmented imagination. There's no obstacle to sharing; in fact it's easier now. You don't need to be an artist to create depictions of your imagination, and sharing a generated JPG is much easier than drawing it by hand.

> yet you continue to use the labor of others without permission

That's how culture works. The artist who never took inspiration from the cultural environment should throw the first stone. Pablo Picasso is widely quoted as having said that “good artists borrow, great artists steal.”


I think you (and possibly __loam) are talking past one another.

You're assuming that artists are just being elitist. While that's not entirely untrue, artistic skill is not merely a gatekeeping exercise. "Just prompt what you're thinking" is great until you need at least a little bit of control over what the AI generates, upon which the whole process disintegrates into banging your head at the model until you get something you want. Furthermore, art skills don't transfer to prompt engineering very well - that's more the realm of SEO keyword stuffers.

What __loam is imagining is that the best use of generative AI right now would be to create content slurry. If you've ever used TikTok or YouTube shorts you know what I mean - the vast majority of videos there are very cheaply made dopamine traps. And while AI can sorta kinda do art if you ask it politely and fight it a bit, it's really good at generating statistically plausible imitations of existing images[0]. Being able to generate lots of normal looking images for little effort is a grifter's best friend, and there's loads of people on YouTube bragging about how they make lots of money by spamming up art marketplaces with artistically meaningless pablum.

It doesn't matter how you make your art. You will be competing with the people who are shitting out spam art, and losing.

[0] To be clear, this is not the same thing as a photo mashup. I would actually be impressed by an AI that could take images and mash them up at inference time.


Yes, the vast majority of AI images so far are surprising but not better than human made art, and same for LLM generated text. Maybe they are a new kind of AI slurry, but that's just a phase. If you compare generative art one year ago vs today, or even six months ago vs today you know I am right. It won't be slurry forever. In fact I believe the internet will become an AI feedback system, and much of it will be built with AI.


That sounds completely horrible.


IANAL but it sounds more like SAI is not responsible for any outputs generated using their models and how those are used.


Works with Automatic1111. Generated 20 512x512 images on a lowly RTX 2070S with 8GB of VRAM.

Prompt: a man

Steps: 1, Sampler: Euler a, CFG scale: 1, Seed: -1, Size: 512x512, Model hash: e869ac7d69, Model: sd_xl_turbo_1.0_fp16, Clip skip: 2, RNG: NV, Version: v1.6.0

Examples: https://imgur.com/a/UuuT9qu


I've done a bit of fiddling around with it and am definitely holding back judgement for now. It seems like the 1- and 2-step images are WAY more coherent than LCM, but the images are kinda trash for any kind of prompt complexity, so you start having to use more steps. Since the individual steps take the same amount of time (I think there's a specific sampler for this which may be faster & better?), by the time you start prompting details you end up using 4 steps and the perf is about the same as LCM, and that breaks down the same way as you start going for more complexity (text, coherent bg details etc), because you end up needing 10-15 steps, and at that point you're going to get a much better result from full-fat SDXL x dpmpp3msdee (lol)

Curious to see the bigbrain people tackle this over the next few days and wring all the perf out of it, maybe samplers tailored to this model will give a notable boost.


Are you finding dpm++ 3M SDE better than dpm++ 2M SDE in sdxl?

Afaik the second order (2M) version is the recommended one to use for guided sampling vs the 3rd order one.

From here: https://huggingface.co/docs/diffusers/v0.23.1/en/api/schedul...

> It is recommended to set solver_order to 2 for guide sampling, and solver_order=3 for unconditional sampling.


It might be placebo, but I find 3M better for upscaling, when I usually set CFG quite low and use a generic prompt that doesn't describe any localised element of the picture.

Which is what it's meant for, I suppose.


In my tests it's basically been 50/50. I probably did ~40 or so comparisons when I was testing samplers, and I felt like there were a couple that seemed really good on the 3rd order one, but idfk, it was very, very close. I don't know if I saw a single gen where one of the two was bad but the other wasn't.


I just want to point out that I’ve noticed sdxl isn’t good at producing images that are 512x512 for some reason. It works much better with at least 768x768 resolution.


Normal SDXL requires 1024x1024 output or the quality degrades significantly.


Hi Max! Thanks for all the tuts! Small correction - SDXL wants ~1 megapixel resolutions at a variety of aspect ratios.

https://github.com/lllyasviel/Fooocus/issues/24


Fair point, although in my experience SDXL still isn't great at non-square ratios. I end up just cropping them to the ratio I want.


What have you seen to be the issue? Composition? Realism? Prompt adherence? I’m just finishing a project having generated tons of images at a mix of ~6 aspect ratios and I haven’t noticed any difference.


Indeed; however, one of the listed limitations is: "The generated images are of a fixed resolution (512x512 pix), and the model does not achieve perfect photorealism."


I did with Automatic1111 as well. With an RTX 4090 with 24G VRAM, steps 1, cfg scale 1, size 512x512, model sd_xl_turbo_1.0. I was generating more than four images a second.


Instruction here for how to use it on iPhone / iPad / Mac today: https://twitter.com/drawthingsapp/status/1729633231526404400 (note that you need to convert 8-bit version yourself if you want to use it on < 8GiB devices).


I remember when you shared the first version on HN, I was totally impressed with what my little phone could do. It was my first step of many into the incredible world of SD. I never expected this app to be maintained this long and especially this actively. And all of this for free.

So thank you very much for your work!

PS, will you be adding Turbo directly to the app in the near future?


> PS, will you be adding Turbo directly to the app in the near future?

I need to get some clarity from Stability given the new Membership thing. I personally see no reason why I cannot as long as I tell people these weights are non-commercial / academic only.


Does anyone have any idea of when/if there will be a simple way to get commercial access in one step? Like an API or something? Or if they want to charge for it, then maybe a web check out?

It's interesting that they finally decided to try to make something commercially restricted.

Do they have a watermark or anything that they can use to track down people who use without a license?

Also, is there anything like an open source (commercial allowed) effort at reproducing this?


Open version predates this one: https://huggingface.co/blog/lcm_lora


Yes but Turbo is 4 times faster.


Around my 8th prompt, it told me I had used up my usage and offered an option to sign up for "pro".


Me too, but I need an API.


It's... fast, yeah, but the quality is low. Not sure what I'd be using this for.


Based on the demo, that's... incredibly fast. Literally generating images faster than I can type a prompt. They've clearly got a set seed, so they're probably caching requests, but even with prompts that they couldn't possibly have cached it's within a second or so.


I hope playgroundai.com adopts this asap, but not sure they can with that non-commercial bit...


This isn't cached. I'm running it locally on a 4060 Ti 16 GB and it's just as fast. Image gens take 0.6-0.8 seconds. Each word or character I type is a new image gen and it's INSTANT.


The Clipdrop demo doesn't inspire much confidence with how bad the generations are (we're talking 2021 levels), not to mention everything comes out NSFW somehow; they should probably work on those filters.


I tested a bit and the quality for photorealistic images is surprisingly bad, and definitely worse than LCM and of course normal SDXL. For more artistic images, SDXL Turbo fares better.

Unlike normal SDXL, you're required here to use the old-fashioned syntactic sugar like "8k hd" and "hyperrealistic" to align things.


In my own testing (using ComfyUI), the best of the "fast gen" techniques for SDXL is using the Turbo model [0], but with the LCM sampler and the sgm_uniform scheduler (which is normal for LCM), and running it for 4-10 iterations instead of just one. I think the StabilityAI demos are using Euler A with the normal base scheduler and running a single iteration (which is cool for a max-speed demo, and it's awesome for that speed, but it's leaving a lot of quality on the table that you can get with a few more iterations, especially with the LCM/sgm_uniform sampler/scheduler combo). Bumping CFG up slightly helps, too (but I think that adds another performance hit, because I think the demos are running at CFG 1, which AIUI disables CFG and reduces computation per iteration).

> Unlike normal SDXL, you're required here to use the old-fashioned syntatic sugar like "8k hd" and "hyperrealistic" to align things.

That's not "syntactic sugar", and its not particular my experience that it is needed with sdxl turbo.

[0] actually, differencing the base sdxl model from the turbo model to get a "turbo modifier", and then combining that with a good SDXL-based checkpoint, because StabilityAI's base models are pretty ho-hum compared to decent community checkpoints derived from them, but that is kind of a peripheral issue.
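To make that sampler/scheduler combination concrete, here's a rough diffusers-side approximation (a sketch only, not the exact ComfyUI graph; quality and speed will differ, and the prompt and step count are placeholders):

    # SDXL Turbo weights + an LCM scheduler, a few steps, and CFG nudged slightly above 1.
    import torch
    from diffusers import AutoPipelineForText2Image, LCMScheduler

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        prompt="portrait photo of an old fisherman, dramatic lighting",
        num_inference_steps=6,   # 4-10 steps instead of Turbo's single step
        guidance_scale=1.5,      # slightly above 1; 1.0 effectively disables CFG
    ).images[0]
    image.save("turbo_lcm.png")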


Are there any good resources for learning a lot of "syntactic sugar" terms? This is new to me, but I'd love to know more.


It is completely dependent on the model. Civit dot ai has model showcases as well as fine-tune showcases, and you can click any image or press the (i) to see the generation info.

Some models like natural language prompts - "draw me a pterodactyl tanning at a beach", some prefer shorthand (danbooru style clip) - "1man, professor, classroom, chalkboard, white_hair, suit", and some work with a mixture of the above as well as the syntactical sugar -"masterpiece, 8k, trending on artstation, space image, a man floating next to a spaceship in space, bokeh, rim lighting, cinematic lighting, Nikon D60, f / 2"

Fine-tuning methods - LoRA, etc. - allow one to convert prompts from one style to another if they wish, but usually it's to compress an idea, style, person, object, etc. into a single "token", so you can work on other aspects of the image.

Check out civit AI and you can sort of get an idea of the cargo cultism as well as what sort of keywords actually make a difference.


> Civit dot ai

The site you are thinking of is https://civitai.com/ not "civit dot ai".



#LearningInPublic strikes again! I love you swyx!


haha thank you ser. best way to thank me is to start your own LIP practice and pass it on :)


Yeah, it looks like the enshittification of StabilityAI is in full force by now, especially considering the continually worsening licensing.

I expect that if they ever manage to release an image-gen model that's an objective improvement, let's say 80% as good as DALL-E 3, it will be subscription-API only.


Are you serious?

I'm using Stability in production: they kept their SDXL beta model, which was capable of SDXL 1.0-level prompt adherence at a fraction of the cost, up for months longer than was reasonable to expect for a one-off undocumented beta, and it was a huge boon to my product.

Then a few weeks back they went and quietly cut prices to about 1/5th of what they were for SDXL and released a model (SD 1.6) that produced similar-quality outputs to SDXL for my specific use case in a fraction of the time.

They're on fire as far as I'm concerned, just quietly making their product cheaper and faster.

Also, DALL-E 3 is in a very awkward place for programmatic access, so awkward I wouldn't call it competitive with SD for many use cases: it's got a layer of prompt interference baked in, it's expensive, and latency is not very consistent. Text is a cool trick, but it's still not reliable enough to expose as a core part of generation for an end user.


Sounds like they're doing the same thing OpenAI is doing: claiming to favor open models while the reality is they're pumping growth by reducing costs and thus lowering prices. They want a massive chunk of this new market, all of it if they can get it. Their perceived valuation then becomes a matter of how many eyeballs they have looking at segments of their website to advertise to, or how many data points they can collect on their users to sell to advertisers. It's unlikely they can capture the whole market and still make a chunky enough profit to satisfy investors if they also intend to keep prices high without resorting to enshittification.


This would be a lot more pithy if it weren't in the comment section of a post that showcases exactly how they were likely able to make 1.6 cheaper, and open sources the underlying tech.

There couldn't be a more perfect rebuttal to this theory than the post you decided to leave it under.


> the reality is they're pumping growth by reducing costs and thus lowering prices

At what point did I indicate they weren't making it cheaper? Also their licensing isn't really in the spirit of what open source originally described, which is what I meant.

Sorry if that didn't come across, I guess. I was being intentionally pithy but largely related to standard practices for VC funded startups.


That's really great, we can generate porn faster and better. :)


From testing the 1024px outputs, it seems to work okay for text2img tasks only; other tasks get deep-fried results. I also got it working using SDXL-LCM, with similar results.


SDXL is already very, very slow compared to SD 1.5. They're claiming ~200ms for a 512x512 image with SDXL Turbo on an A100. We need an SD 1.5 Turbo for even faster generation.


I haven't found SDXL to be inherently much slower than 1.5, besides the obvious 4x slowdown from having twice the linear resolution.


Yeah, in my experience it's actually FASTER because I was already doing high res gens, and that required 2 or 3 passes previously.


Well, it's 2.1 instead of 1.5, but:

https://huggingface.co/stabilityai/sd-turbo

I assume this was ready for release because Stable Video Diffusion (which is also an SD2.1-based model) is essentially this plus a motion model.

I wouldn't be surprised if their hosted-only SD1.6 beta is, or has, a turbo version, and if that gets released publicly, that's where we'll see an SD1.x turbo.


It _would_ be nice if they offered 512/768/1024 px variants of the models. I frequently don't actually need the full 1024 px, since it just needs to look good enough for a chat thumbnail; I could upscale it manually later. There are other models like Kandinsky, but it's not super convenient to use multiple models with different code and whatnot.


SDXL Turbo also uses a distilled version of SDXL so it gets a speed bonus from that too.


Is it known how much larger SDXL is compared to SD1 or SD2?


SD1.x is around 1.1B parameters including VAE, SD2.x is slightly more (uses the same UNet and VAE, but a bigger text encoder; not finding stats as quickly as I'd like), and SDXL is 3.5B parameters (single model only, but StabilityAI's preferred base + refiner model setup is effectively 6.6B parameters -- some bits are shared.)
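If you want to check numbers like these yourself, here's a quick (unverified) sketch with diffusers; the repo ids are the usual public checkpoints, and the totals will differ a bit depending on which components you count.

```python
# Sketch: count parameters per component for SD 1.5 and SDXL base.
import torch
from diffusers import DiffusionPipeline

def count_params(repo_id):
    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
    total = 0
    for name in ("unet", "vae", "text_encoder", "text_encoder_2"):
        module = getattr(pipe, name, None)  # SD1.x has no text_encoder_2
        if module is not None:
            n = sum(p.numel() for p in module.parameters())
            total += n
            print(f"{repo_id} {name}: {n/1e9:.2f}B")
    print(f"{repo_id} total: {total/1e9:.2f}B")

count_params("runwayml/stable-diffusion-v1-5")              # repo id as of this writing
count_params("stabilityai/stable-diffusion-xl-base-1.0")
```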


Very impressive, I'm generating some really nice quality images at 1024x1024 in under 1 second on a 3090 in InvokeAI!


Is this available in Automatic1111?


I'm gonna wait until this is released as a model. This is really cool though!


From TFA:

> Download the model weights and code on Hugging Face[0], currently being released under a non-commercial research license that permits personal, non-commercial use.

[0] https://huggingface.co/stabilityai/sdxl-turbo
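For reference, the typical diffusers usage for that checkpoint looks roughly like this (treat it as a sketch rather than gospel; the key bits are the single denoising step and guidance_scale=0.0):

```python
# Sketch: minimal single-step text-to-image generation with SDXL Turbo.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipe(
    prompt="a cinematic photo of an astronaut riding a horse",
    num_inference_steps=1,  # Turbo is built for 1-4 steps
    guidance_scale=0.0,     # CFG is effectively disabled for Turbo
).images[0]
image.save("astronaut.png")
```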


I am fucking blind, thanks!


Awesome! So where is the API I can try?



...which requires you to sign in. That nice little text box invites you in, until you actually click to enter some text and get a registration box thrust at you.

People who design a UX where the user is tricked into a registration "ambush" need to be punched in the face.


Luckily this one accepts burner emails just fine, without any other intrusive data collection (name, etc.).

http://grr.la


That's a useful website, but holy shit it is utterly unusable without an adblocker.


I mean, you should have an adblocker on every device anyway. On my phone I use https://adguard-dns.io/en/public-dns.html (no need to download the app, just select "Configure AdGuard DNS manually"). As a bonus, on Android it blocks ads in shitty mobile games too.


A totally reasonable response to a minor inconvenience.


The level of entitlement HN posters have these days is insane. They're giving you free compute resources on the order of an entire $800 GPU. If you don't like it, run the damn thing yourself!

UX designers doing their job and programmers protecting their free product from being abused are not the people deserving abuse here.


me reading your comment: yeah, also hated that, yes, yes, yes

your last four words: nope, hard no, no, we are not friends


What are the use cases for this?


> A Real-Time Text-to-Image Generation Model

> On an A100, SDXL Turbo generates a 512x512 image in 207ms (prompt encoding + a single denoising step + decoding, fp16), where 67ms are accounted for by a single UNet forward evaluation.

Okay... so what part of this is real time? 207ms is 4.8Hz and 67ms is 14.9Hz. Isn't "real time" in graphics considered to be at least 30Hz (33ms), and by today's standards at minimum 60Hz (16ms), if not 144Hz (7ms)? I'm not sure it would even get there with an H100. Maybe an H100 with everything in TensorRT?


It's "real-time" as in "I finished typing and the image is already there"


I think it is still a bit deceptive to say this, considering that a well-known and widely used alternative definition exists. Plus, that's on datacenter-grade hardware.


Considering the use-case of just interacting with a computer and typing prompts, I'd call it real-time.



