Noncommercial use - aside from being one of my licensing pet peeves - seems to indicate that the money is drying up. My guess is that the investors over at Stability are tired of subsidizing the part of the generative AI market that OpenAI refuses to touch[0].
The thing is, I'm not entirely sure there's a paying portion of the market? Yes, I've heard of people paying for ChatGPT because it answers programming questions really well, but that's individual users, who are cost sensitive. The real money was supposed to be selling these things as worker replacement, but Hollywood unions have (rightfully) shut AI companies out of the markets where a machine that can write endless slop might have made lots of money.
OpenAI can remain in Microsoft's orbit for as long as the stupid altruist / accelerationist doomer debate doesn't tear them apart[1]. Google has at least a few more years of monopoly money before either the US breaks them up or the EU bankrupts them with fees. I don't know who the hell is pumping more money into either Anthropic or Stability.
[0] Porn. It's always porn. OpenAI doesn't want to touch it for very obvious reasons.
[1] For what it's worth, Microsoft has shown that all the AI safety guardrails can be ripped out by the money people at a moment's notice, given how quickly they were able to make the OpenAI board blink.
The only truly successful commercial use of SDXL I know of is by NovelAI. Said company appears to have used a 256xH100 cluster to finetune it to produce anime art.
Open source efforts to produce a similar model seem to have failed due to the extreme compute requirements for finetuning. For example, the Waifu Diffusion team, using 8xA40s[0], has not managed to bend SDXL to their will after potentially months of training.
If you need 256xH100 to even finetune the model for your use case, what's stopping you from just training your own base model? Not much, as it turns out. A few weeks ago, NovelAI's developers stated they'll train from scratch for the next version of their model.
So I agree, even with the licensing changes things might be looking somewhat dire for SAI.
What do you mean by extreme requirements? There's lots of SDXL fine tunings available at civit, like https://civitai.com/models/119012/bluepencil-xl for anime. The relevant discords for models/apps are full of people doing this at home.
Or are you looking at some very specific definition / threshold for fine tuning here?
There is something I find rather hard to communicate about the difference between these models on civitai and what I think a competent model should be able to do.
I'd describe them like "a bike with no handlebars" because they are incredibly difficult to steer to where you want.
The model seems to have completely ignored a good 35% of the text input; most egregiously I find the (flat chest:2.0), the parentheses denoting a strengthening of that specific part of the prompt. The values I see people use with good general models range from 1.05~1.15. 2.0 in comparison is an extremely large value, and it _still did not work at all_, if you take a look at the actual image.
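For those unfamiliar with the syntax: here's a minimal sketch of how that (text:weight) emphasis is parsed, assuming A1111-style conventions. The UI then scales the matching token embeddings by the weight; the parser below is illustrative, not the actual implementation:

```python
# Sketch of A1111-style "(text:weight)" emphasis parsing. The real UI
# goes on to multiply the text encoder output for those tokens by the
# weight (renormalizing afterwards); this only shows the parse step.
import re

def parse_weighted_prompt(prompt: str):
    """Split '(flat chest:2.0)' style spans into (text, weight) pairs."""
    pattern = re.compile(r"\(([^:()]+):([0-9.]+)\)")
    pieces, last = [], 0
    for m in pattern.finditer(prompt):
        if m.start() > last:
            pieces.append((prompt[last:m.start()], 1.0))
        pieces.append((m.group(1), float(m.group(2))))
        last = m.end()
    if last < len(prompt):
        pieces.append((prompt[last:], 1.0))
    return pieces

print(parse_weighted_prompt("1girl, (flat chest:2.0), beach"))
# [('1girl, ', 1.0), ('flat chest', 2.0), (', beach', 1.0)]
```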
> the model seems to have completely ignored a good 35% of the text input,
Well, when you blindly cargo-cult a prompting style designed to work around issues with SD1.x, apply it to SDXL, and in the process spam several hundred tokens (mostly slight variants) into the 75-ish-token window (which, yes, the UIs use a merging strategy to try to accommodate), you get that problem.
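To make the window concrete: CLIP's text encoder has 77 positions (about 75 usable after the special tokens), so the UIs split an over-long prompt into chunks, encode each separately, and merge the results. A rough sketch of just the splitting step, with made-up token counts:

```python
# Sketch of the ~75-token limit workaround: split a long token list into
# encoder-sized windows; each window gets its own text-encoder pass and
# the embeddings are then merged/concatenated. Counts are illustrative.
def chunk_tokens(token_ids, chunk_size=75):
    """Split a long token list into encoder-sized windows."""
    return [token_ids[i:i + chunk_size]
            for i in range(0, len(token_ids), chunk_size)]

tokens = list(range(230))          # stand-in for several hundred prompt tokens
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])    # [75, 75, 75, 5] -> 4 separate encoder passes
```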
> most egregiously I find the (flat chest:2.0)
The flat chest is fighting to compensate for the also heavily weighted (hands on breasts:1.5), which not only affects hand placement but also the concept of "breasts", and the biases trained into many of the community models around that term mean that having the concept in the prompt, heavily weighted, takes a lot to counteract. So, no, I don't think it's ignoring that.
I'm just going out on a limb here, but a paid service needs to be good with limited input. I've used SD locally quite a lot and it takes quite a bit of work through x/y plots to find combinations of settings that produce good images somewhat consistently. Even when using decent fine tunings from CivitAI.
When I use a decent paid service, pretty much every prompt gives me a good response out of the box. Which is good, because otherwise I'd have no use for paid services, since I can run it all locally. This causes me to go to a paid service whenever I want something quick, but don't need full control. When I do want full control, I stick to my local solution, but that takes a lot more time.
They rented 8xA40 for $3.1k. That is actually kinda peanuts; I spent more on my gaming PC. I think there were kickstarter projects for AI finetunes that raised $200k before Kickstarter banned them?
Is that the correct link? I've never heard of A40s, the link is to release notes from a year and two months ago, and SD XL just came out a month or two ago. Hard for me to get to "SD XL cannot [be finetuned effectively]" from there.
The team was given early access by StabilityAI to SDXL0.9 for this. You'll have to test it out for yourself if you're interested in comparing. From my experience, there is a world of difference between the NovelAI and WaifuDiffusion models in both quality and prompt understanding.
Note, the baseline I set for the WaifuDiffusionSDXL model was simply to beat their SD2.1-based model[0], which, in my opinion, it did not.
> The team was given early access by StabilityAI to SDXL0.9 for this.
SDXL0.9 may have been a worse starting point than SDXL1.0, and the people who have since released the huge pile of SDXL finetunes have certainly had more time and accumulated more finetuning experience than WD had when it trained against SDXL0.9.
I posted about my AI porn site pornpen.ai here last year and it reached the top of the front page. And yes, it's still going strong :D (and we've integrated SDXL and videos recently)
Emad just tweeted about the future monetization of their core models. Seems they want to use the Unity model - the original one, not the recent trick Unity pulled. AKA free to use until you make lots of money with it.
Are you suggesting the only use for locally run free SD derived models is porn?
Creating illustrations for articles/presentations and stock photo alternatives are huge!
The ability to run for free on your local machine allows for far more iterations than using SaaS, and the checkpoint/finetune ecosystem that openness sprouted has produced models performing way better for these use cases than standard SD.
No, I'm suggesting that the only models you can use for porn are locally-run.
In particular the people offering hosted models do not want to touch porn, because the first thing people do with these things is try to make nonconsensual porn of real people, which is absolutely fucking disgusting. Hence why they have several layers of filtering.
SD also has a safety filter, but it's trivially removable, and the people who make nonconsensual porn do not care about trivial things like licensing terms. My assumption is that switching to a noncommercial license would mean that Stability could later add further restrictions to the commercial use terms, i.e. "if you're licensing the model for a generator app like Draw Things, you have to package it up in such a way that removing the safety filter is difficult or impossible".
I've been surprised at the explosion of porn. Well, not actually. Automatic1111 made that easy, and anyone who's browsed CivitAI knows all too well what those models are being used for. I mean, when you give teenagers the ability to undress their crushes[0], what do you think is going to happen (do laws adequately protect people (kids)? Can they? Will this force a shift towards actually chasing producers, distributors, and diddlers?)?
Porn is clearly an in-demand market. What does surprise me: there's been a lot of work on depth maps and 3D rendering from 2D images in the past few years, and VR headsets are apparently quite popular (according to today's LTT episode, half as many Meta Quest 2s as PS5s have been sold?!). If VR headsets are actually that prolific, it seems like there'd be a good market for even just turning a bunch of videos into VR videos, not to mention porn (I don't have a VR headset, but I hear a lot of porn is watched on them. No Linus, I'm not going to buy second hand...). I think all it takes is for some group to optimize these models like they have for LLaMA and SD (because as a researcher I can sure tell you, we're not the optimizers. Use our work as ballpark figures (e.g. GANs ~10x diffusion), but there's a lot of performance on the table). You could definitely convert video frames to 3D on prosumer-grade hardware (say a 90-minute movie in <8 hrs, i.e. while you sleep).
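On the 2D-to-3D point: the per-frame depth step is already a download away. A sketch using the publicly released MiDaS model via torch.hub; the stereo reprojection and video I/O such a pipeline would need are omitted, so treat this as an outline, not a tuned implementation:

```python
# Per-frame monocular depth with MiDaS (Intel's public release). A
# video-to-VR pipeline would run this over every frame, then reproject
# into a stereo pair; those steps are omitted here.
import torch

midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

def frame_to_depth(frame_rgb):
    """frame_rgb: HxWx3 uint8 RGB array -> relative depth map (numpy)."""
    batch = transform(frame_rgb)       # resize + normalize + add batch dim
    with torch.no_grad():
        prediction = midas(batch)      # relative depth at model resolution
    return prediction.squeeze().cpu().numpy()
```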
There are a lot of wild things that I think AI is going to change that I'm not sure people are really considering (average people anyway, or at least it's not making it into popular conversation). ResNet-50 is still probably the most used model, btw. Not sure why, but just about every project I see that's not diffusion or an LLM is using it as a backbone, despite research models that are smaller, faster, and better (at least on ImageNet-22k and COCO).
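To illustrate why ResNet-50 keeps winning by default: loading it pretrained and chopping off the classifier head is a couple of lines in torchvision, which is hard for newer architectures to compete with. A sketch using torchvision's standard API (the weights enum needs torchvision >= 0.13):

```python
# Load a pretrained ResNet-50 and strip the classification head, leaving
# the convolutional feature extractor that projects use as a backbone.
import torch
import torchvision.models as models

resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc
backbone.eval()

with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 2048, 7, 7]): feature map for a downstream head
```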
Not really what I mean. I mean, TensorRT is faster than that according to their README. By "optimized" I'm specifically pointing to llama.cpp, because 1) it's in C, 2) it uses quantized models, and 3) there's a hell of a lot of optimization in there. The thing runs on a Raspberry Pi! I mean, not well, but damn. SD is still pushing my 3080Ti, for comparison.
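A toy sketch of the blockwise quantization idea llama.cpp leans on; its real Q4-style formats add offsets, bit-packing, and careful block layouts, so this is just the principle:

```python
# Toy blockwise quantization: store small integer codes plus one float
# scale per block, shrinking weights ~4-8x at a small accuracy cost.
import numpy as np

def quantize_block(w, bits=4):
    """Symmetric quantization of one weight block: int codes + one fp scale."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for signed 4-bit
    scale = float(np.abs(w).max()) / qmax or 1.0     # avoid zero scale
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize_block(codes, scale):
    return codes.astype(np.float32) * scale

w = np.random.randn(32).astype(np.float32)           # one 32-weight block
codes, scale = quantize_block(w)
err = np.abs(w - dequantize_block(codes, scale)).max()
print(f"max reconstruction error: {err:.3f}")        # small relative to weight scale
```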
But I wasn't thinking diffusion. Diffusion models are big and slow. GANs still reign in terms of speed and model size. The StyleGAN-T model is 75M params (lightweight) or 1B (full), with 123M for text. That paper notes that the 56 images in its Fig. 2 take 6 seconds on a 3090 at 512 resolution. I have a 3080Ti, and I can tell you that's about how long it takes me to generate a batch of 4 with an optimized TensorRT model. That's a big difference, especially considering those 56 are done with interpolations. The GAN vs diffusion debate is often a little silly, as it's really a matter of application: I'll take diffusion in my Photoshop, but I'll take StyleGAN for my real-time video upscaling.
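Back-of-the-envelope on those numbers (different GPUs and pipelines, so order-of-magnitude only):

```python
# Rough throughput from the figures above: StyleGAN-T's Fig. 2 grid on a
# 3090 vs a TensorRT-optimized SD batch on a 3080 Ti, both at 512px.
stylegan_t_rate = 56 / 6.0    # ~9.3 images/s
diffusion_rate = 4 / 6.0      # ~0.67 images/s
print(f"~{stylegan_t_rate / diffusion_rate:.0f}x")  # ~14x throughput gap
```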
But yes, I do understand how fast the field is moving. You can check my comment history to verify if register isn't sufficient indication.
>do laws adequately protect people (kids)? Can they? Will this force a shift towards actually chasing producers, distributors, and diddlers?
It's extremely complicated. Actual CSAM is very illegal, and for good reason. However, artistic depictions of such are... protected 1st Amendment expression[0]. So there's an argument - and I really hate that I'm even saying this - that AI generated CSAM is not prosecutable, as if the law works on SCP-096 rules or something. Furthermore, that's just a subset of all revenge porn, itself a subset of nonconsensual porn. In the US, there's no specific law banning this behavior unless children are involved. The EU doesn't have one either. A specific law targeted at nonconsensual porn is drastically needed, but people keep failing to draft one that isn't either a generalized censorship device or a damp squib.
You can cobble together other laws to target specific behavior - for example, there was a wave of women in the US copyrighting their nudes so they could file DMCA 512 takedown requests at Facebook. But that's got problems - first off, you have to put your nudes in the Library of Congress, which is an own goal; and it only works for revenge porn that the (adult) victim originally made, not all nonconsensual porn. I imagine EU GDPR might be usable for getting nonconsensual porn removed from online platforms, but I haven't seen this tried yet.
I'm disgusted, but not surprised, that teenage kids are generating CSAM like this. Even before we had diffusion models, we had GANs and deepfakes, which were almost immediately used for generating shittons of nonconsensual porn[1].
This is true, though "AI CSAM" is an oxymoron. There is no abuse in the creation of such works, and as such it is not abuse material, unless of course real children are involved.
I get your argument, but there are definitely laws about cartoon underage characters. Agree or disagree, the difference is that today you don't need to be a highly skilled artist to make something that people are going to fap to. (I definitely agree priority should be focused on physical abuse and the people making the content, but this whole subject is touchy.)
Does non-consensual porn not qualify as defamation? That, plus obscenity laws where they exist, should be able to handle most hyperrealistic porn, so that only pure speech remains.
Good question. US defamation law is fairly weak[0], but all the usual exceptions that make it weak wouldn't apply, e.g. "truth is an absolute defense against defamation" doesn't apply because AI-generated or photoshopped nonconsensual porn is fake. I'm not a lawyer, but I think a defamation case would at least survive a motion to dismiss.
[0] Which, to be clear, is a good thing. Strong defamation law is a generalized censorship primitive.
Could this perhaps fall under something like trademark, i.e. an unauthorized use of one's likeness? I'm sure I've heard of some celebrity cases along similar lines.
> I'm disgusted, but not surprised, that teenage kids are generating CSAM like this. Even before we had diffusion models, we had GANs and deepfakes, which were almost immediately used for generating shittons of nonconsensual porn
I think the big difference now is that 1) it's much easier to do, and 2) the computational requirements and (more importantly) the technical skill required have dropped dramatically.
We should also be explicitly aware that deepfakes are still new. GANs in 2014 were not creating high-definition images. They were doing fuzzy black-and-white 28x28 faces, poorly, and 32x32 color images where, if you squint hard enough, you could see a dog (https://arxiv.org/abs/1406.2661). MNIST was a hard problem at that time, and that was only 10 years ago. It took another 4 years to get realistic faces and objects (https://arxiv.org/abs/1710.10196) (mind you, those images are not random samples), another year to get to high resolution, another 2 to get to diffusion, and another 2 before those exploded. Deepfakes were really only a thing within the last 5 years, and certainly not on consumer hardware. I don't think the legal system moves much in 10 years, let alone 5 or 2. I think a lot of us have not accurately internalized how quickly this whole space has changed. (Image synthesis is my research area, btw.)
I'm not surprised that these teenagers in a small town did this, but the fact that all those qualifiers now appear in the same headline is notable. Discussions of deepfakes like that Tom Scott video were barely a warning (5 years is not a long time). It quickly went from researchers thinking this could happen within the next decade and starting discussions, to real-world examples making the news in less time than they predicted (I don't think anyone expected how much money and how many man-hours would be dumped into AI).
OpenAI has a pretty robust and profitable business without Microsoft. In every enterprise I've been involved with over the last few years, we have had some incredibly material and important use cases for OpenAI LLMs (as well as Claude). They aren't spewing slop or whatever; they're genuinely achieving valuable and foundational business outcomes. I've been a bit stunned at how fast we've achieved these things, and it tells me that the AI hype isn't hype; if we have done these things in a year, it's hard to estimate how much impact the technologies will have in five, but I think it's substantial. So is our spend with OpenAI. Or rather, with Azure on OpenAI products. In our experience, the only value Microsoft adds is IAM - which, frankly, is sufficient.
Stability and Midjourney are also making money, but it's largely from amateurs and people prototyping content for their own creations. A lot of single-person indie game developers are using these tools to generate assets, or at minimum first-pass assets. I think a lot of media companies are producing the art for their articles or newsletters etc. using these tools. Whether this is enough, I don't know.
Sigh, I'm really tired of seeing people assume OpenAI is profitable. We have no idea if they are or not, and we have some indication that they're incinerating money on ChatGPT, to the point that they're turning off sign-ups because they're out of compute.
I'm not sure that's a useful argument, inasmuch as A) it's unfalsifiable until they IPO (reporting indicates they are very, very profitable) and B) running out of GPUs seems like an odd thing to name as an indicator you're _losing_ money.
People are genuinely way, way, way underestimating how intense the GPU shortage is.
I haven't seen any reporting about their costs. I'm sure they're making bonkers numbers in revenue, but that doesn't mean they're profitable if they're losing money on every gpt call. You're saying my claim that they're unprofitable is unfalsifiable. The same is true for claims that they're wildly profitable. We just don't know the financial state of the company because they're not public.
My understanding, which I can’t prove other than to say it comes from folks affiliated with OpenAI, is that chatgpt doesn’t make money but also doesn’t lose money (in aggregate, some accounts use way more than others but many accounts are fairly idle), and their API business is profitable and accounts for most of their GPU utilization. I have no insight into why they would turn off signups for chatgpt other than they may need the capacity for their enterprise customers, where they make a decent margin.
Given that Midjourney predates StableDiffusion, that seems unlikely, though it is possible they threw away all their hard work to create their model to use one that's available to other people for free and then charge money for it.
The EU views Google as a tobacco company. The last thing they want to do is bankrupt them with fees. They want to milk Google - big tech generally - for tax revenue. And besides, it'd take $100 billion per year in fees, which is never going to happen. Meanwhile Google keeps getting bigger year after year (they have nearly doubled in size in four years, up to $300b in sales now) and Bing has made zero headway despite the AI-angled efforts (BingChat etc). Maybe the mainstream adoption of GPT (or similar) would severely damage Google, but there's still a lot of time left for Google to take their shot at getting out in front of that outcome.
The US breaking Google up also won't end the monopoly money. More likely one of the children will spin out with an even better margin business.
> The US breaking Google up also won't end the monopoly money.
It's also weird because modern tech economics has created a space with a lot of natural monopolies. Momentum is very powerful, and it's the reason Silicon Valley companies will run at a loss for years building a userbase. The trick is to keep them (or sell before the buyer starts charging). There's what, 2 map companies, and only one of them is in high usage because it's a default? Same with browsers. You can't compete, because making a better product isn't enough in a system where network effects dominate the economics. Hardware companies are seeing that too (along with patent issues, especially across borders). I have no idea how to think about this, tbh, because it is weird: in some cases breaking these companies up ultimately destroys them, while in other cases, exactly what you say.
Seems like a great place to mention the MapQuest API. I tested it against 4 other services (inc. Google's) on 200 manually verified geocoding/reverse-geocoding tasks. Google managed 92% whereas MapQuest scored 99%.
If you are doing geocoding or reverse-geocoding, MapQuest outperforms Google Maps quite magnificently (YMMV). The cost is also lower, and there are plans where you can keep the data.
The funny thing being that the C levels didn't give a shit about the results and went ahead with Google Maps anyway. So your point remains correct.
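For anyone wanting to run a comparison like the one above, a hedged sketch of a single geocoding call; the endpoint and response fields are from my memory of MapQuest's Geocoding API v1, so check the docs before relying on it:

```python
# One forward-geocoding request against MapQuest's v1 API (endpoint and
# JSON layout per my recollection; the API key is a placeholder).
import requests

def mapquest_geocode(address: str, api_key: str):
    """Return (lat, lng) for a single address."""
    resp = requests.get(
        "https://www.mapquestapi.com/geocoding/v1/address",
        params={"key": api_key, "location": address},
        timeout=10,
    )
    resp.raise_for_status()
    loc = resp.json()["results"][0]["locations"][0]
    return loc["latLng"]["lat"], loc["latLng"]["lng"]

# print(mapquest_geocode("1600 Pennsylvania Ave NW, Washington, DC", "YOUR_KEY"))
```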
Waze? But my hyperbole aside ("only one"), how many people do you know who use these other platforms? Colloquially, people call things monopolies if they have sufficient market share, not absolute share (which almost never exists), or if market collusion exists (e.g. ISPs, airlines, oil). People even do it for two dominating companies (e.g. Coke + Pepsi together control only 71% of market share (46.3 + 24.7)). The truth is that monopolies aren't always bad; they're dangerous because they have so much weight that they can engage in abusive tactics, and that's the thing we actually care about. My whole point is that this gets to be a very sticky situation when the product is the market share (the more people use Google Maps, the better Google Maps gets; maybe social networks are a clearer example).
Natural monopoly means that the market only has enough demand for one supplier. Canonical examples would include highways, railways, local telephone networks, and residential Internet access providers.
Most of the big tech companies we love to hate aren't natural monopolies. Google's anticompetitive moat is primarily made of inertia: they were the first to market with a halfway functional search index. Defaults are very sticky: Microsoft pushes Bing in Windows a lot because a lot of people don't know how to change the default search engine to anything else.
Browsers are monopolized because we let Apple do to iOS what we refused to let Microsoft do to Windows: make the platform browser the only option. In the heyday of Firefox, switching from IE was just a matter of downloading an installer and migrating your bookmarks and history. You can still do this (and I recommend that you do), but if you have an iPhone, it's more complicated. You can technically use an app called "Firefox", but it's using Safari under the hood. Apple won't let you port your own browser engine, which limits how you can improve the experience. So switching to Firefox on iOS is pointless.
Sometimes these monopolies interact. Google built Chrome specifically so they'd get a seat at the web standards table. They got people to use it by giving free advertising space to it on their homepage, and then they used their market position to introduce ridiculous numbers of new web APIs that every other browser vendor now has to reimplement to make Google properties work[0]. Microsoft and Opera found this to be such a hurdle that they switched their browsers over to Chromium. Firefox had to split engineering time up between implementing Chrome features, chasing down security bugs, and adopting multiprocess support, which slowed them down.
At no point is any of this 'natural'. The market can support five technologically independent browsers, but not when Google is trying to actively sabotage them.
[0] The most egregious example being Shadow DOM v0, which was never actually adopted as a standard, but used in YouTube's version of Polymer for years after Shadow DOM v1 was standardized. Firefox never implemented v0, so they were stuck running lots of polyfill code that Chrome wasn't.
> Natural monopoly means that the market only has enough demand for one supplier.
That's not true. Natural monopolies also form due to network effects. You can follow the story of the Bell System for an early example, which I suspect you're aware of. The reason for the monopoly wasn't a lack of demand; it was the tragedy of the commons.
In modern tech economics we have a similar tragedy of the commons, a bit more abstracted. Most products are a result of their userbase, not the product itself. Look at HN. Or look at Reddit or YouTube. It may look like circular logic (because it is a feedback loop), but the fact that everyone publishes on YouTube makes YouTube bigger and more useful to the consumer, which causes more people to publish on that platform, because there are more users. It then becomes very difficult to compete, because what can you do? You can make a platform that is 100x better than YouTube at serving videos, search, pay to creators, and so on, but you won't win, because you have no users, and you won't get users without creators, who aren't going to produce on your platform because there are no users.
In other words, there's a first mover disadvantage: you're an early user and the site sucks because it has no content, but you're a true believer; you're an early creator and your pay sucks because there are so few viewers (even if your pay per viewer is 100x, your pay per video/time spent is going to be 10,000x less because there are 1,000,000x fewer users). Look at PeerTube, Nebula, or Floatplane. They are all doing fine, but nowhere near as successful as YouTube, which everyone hates on (and for good reason). Hell, when YouTube started trying to compete with Twitch, they had to REALLY incentivize early creators to move with very lucrative deals, because they were not buying the creator, they were buying their userbase. It should be a clear signal that there's an issue when even a giant like Google has a hard time competing with Amazon.
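To make that feedback loop concrete, a toy simulation (all numbers made up) where each new user picks the platform whose pull grows superlinearly with its existing userbase:

```python
# Superlinear preferential attachment: if a platform's appeal grows
# faster than its user count, a small early lead snowballs into
# near-total dominance. Parameters here are purely illustrative.
import random

random.seed(0)
users = {"incumbent": 110, "challenger": 100}   # small arbitrary early lead

for _ in range(100_000):                         # one new user joins per step
    weights = {k: v ** 2 for k, v in users.items()}
    r = random.uniform(0, sum(weights.values()))
    pick = "incumbent" if r < weights["incumbent"] else "challenger"
    users[pick] += 1

print(users)  # the 10% head start compounds into a lopsided market
```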
For a highly competitive market you need a low barrier to entry, so that you can disrupt. There are countless examples where a technology/product was superior in every way (e.g. price and utility) but did not win the market, because network effects exist. Even Betamax vs VHS is a story of network effects (I wouldn't say Betamax dominated VHS, but that's not important), because what mattered was what you could get at a store or share (via your neighbor or local rental).
And I'm glad you mention Firefox, because it's a good example of stickiness. I've tried to convert many friends. They groan and moan about how hard it is and make up excuses like bookmarks; even when I literally show them that on startup it'll import everything for you, they just create a new excuse, or say the UI/UX is trash because the settings button is 3 horizontal lines instead of 3 vertical dots so they can't find it, despite it being in the same place, or because the tabs are not as curved, so it's "unusable." You might even see these comments on HN, a place full of tech experts.
What I'm getting at here is that the efficient market hypothesis is clearly false, and market participants are clearly not rational (at least not by the conventional economic definitions).
There's quite a few hosted SDXL platforms (mage.space, leonardo.ai, novel.ai, tensor.art, invoke.ai to name a few) and most consumers do not have the GPUs needed to run those models, only enthusiasts do.
It's always baffled me that Stability didn't offer a competitive UI platform to use their models with; Clipdrop is just bad quality and very bare-bones, and DreamStudio is pricey and still lacks most features. So this move to a new licensing strategy doesn't surprise me. It's actually somewhat comforting, as I expected them to just stop releasing further trained models (e.g. SDXL 1.1 and up) and only offer those on their own services (of course, that can still happen), because how else were they going to monetize consumers (I know they offer, or planned to offer, custom trained/finetuned models to big corps, but that doesn't monetize consumers).
However, like most releases by Stability these days, it has this close-but-no-cigar feeling. The recent LCM LoRAs might be a little slower, but they actually offer 1024^2 resolution, work with any existing LoRAs and finetunes (so they are usable for iterative development, unlike this Turbo model: it's a different model, so you can't iterate on it and then expect SDXL (with LoRAs, and to a lesser extent without) to generate a similar image), and support CFG scale (and therefore negative prompts / prompt weighting). I suppose there's some niche market where you need all the speed you can get, but unless there's a giant leap in (temporal) consistency, that will remain niche; I don't see the mentioned real-time 3D "skinning" or the video img-to-img (frame-to-frame) gimmicks taking off with the current quality and lack of flexibility. It's good research, and optimizations have lots of value, but it needs quality as well.
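For reference, the LCM-LoRA route I mean looks roughly like this with diffusers (model IDs are the public Hugging Face ones; exact arguments may differ between diffusers versions):

```python
# LCM-LoRA on top of SDXL: swap in the LCM scheduler, load the LoRA over
# any SDXL checkpoint (base or finetune), and sample in a few steps.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")  # stacks on finetunes too

image = pipe(
    "a watercolor painting of a fox in a forest",
    num_inference_steps=4,   # ~4-8 steps instead of 25-50
    guidance_scale=1.0,      # low CFG; the distillation bakes guidance in
).images[0]
image.save("fox.png")
```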
Their recent video model is quite bad as well, especially compared to Pika and Runway Gen-2, but, as with the DALL-E 3 comparison, one can say those are closed source while Stability's offering is open.
Then we have the 3D model, closed source, and unfortunately worse than Luma's Genie.
The music model is nothing like Suno's Chirp (which might be multiple models, Bark plus a music model, used together), and the less said about their LLM offerings the better.
Bottom line, Stability needs a killer model again. They started strong with Stable Diffusion 1.5, took a wrong turn with 2.0 (kind of recovered by 2.1, but the damage was done), and while SDXL isn't bad in a vacuum, neither was it the leap ahead that put it in front of competition like Midjourney at the time, and DALL-E 3 a little later. Now even a relatively small model like PixArt-alpha, also open source, can offer similar quality to SDXL (with a lot of caveats, as it has been trained on so few images that it simply doesn't know many concepts). More worrying, there's no hint of something better in Stability's pipeline. Maybe image-gen is as good as Stability can get it, and they think they can make an impact pivoting in another direction (or multiple directions), but currently it feels like a master-of-none situation.