Ideogram 2.0 was also released today, and it's nerfed (anatomy is a lot worse than in 1.0), just like the Stable Diffusion versions after 1.5, which is very disappointing.
Well, good thing we have Flux out in the open now. Neither Midjourney releasing a web version nor Ideogram releasing their 2.0 on the same day, two weeks after Flux, will redeem them much. Flux Dev is amazing; check what the SD community is doing with it on https://www.reddit.com/r/StableDiffusion/ . It can be fine-tuned, there are LoRAs now, even ControlNet. It can gen casual photos like no other tool out there; you won't be able to tell they are AI without looking way too deep.
https://fastflux.ai/ for instant image gen using Schnell (but it's fixed at 4 steps and is mainly a tech showoff of the inference engine by runware.ai).
https://www.segmind.com/ has API support with lots of options; I am using it to generate and set my wallpaper via an AHK script. It's very, very slow though.
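For anyone wanting to replicate the generate-and-set-wallpaper loop without AHK, here is a rough Python equivalent. The endpoint URL and request body schema are assumptions for illustration (check Segmind's API docs for the real model slug and parameters), and the wallpaper call is Windows-only:

```python
import ctypes
import json
import os
import sys
import urllib.request

# Hypothetical endpoint and body schema -- consult Segmind's API docs
# for the real model slug and accepted parameters.
API_URL = "https://api.segmind.com/v1/flux-schnell"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request for the image-generation API."""
    body = json.dumps({"prompt": prompt, "width": 1920, "height": 1080}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )

def set_wallpaper(path: str) -> None:
    """Windows only: SPI_SETDESKWALLPAPER = 20, update-ini | send-change = 3."""
    ctypes.windll.user32.SystemParametersInfoW(20, 0, os.path.abspath(path), 3)

if __name__ == "__main__":
    req = build_request("a misty forest at dawn, 35mm photo",
                        os.environ["SEGMIND_API_KEY"])
    with urllib.request.urlopen(req) as resp, open("wallpaper.jpg", "wb") as f:
        f.write(resp.read())
    if sys.platform == "win32":
        set_wallpaper("wallpaper.jpg")
```

Run it from a scheduled task (or cron equivalent) to rotate the wallpaper automatically.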
Just to clarify for other readers, Draw Things has support and provides download links to quants but no iOS device can run the full precision model which means you will get slightly different and usually lower quality output than stuff you may see elsewhere, even if you use the same settings. It's still damn impressive though.
The quality issue should mainly be due to using FP16 accumulators for GEMM on M1-M2, A14-A16 devices (it is not a problem for SD v1 / SDXL models due to their smaller channel count). This was changed to FP32 accumulators for GEMM on these devices in the 1.20240820.1 release. q8p should have quality comparable to the non-quantized model (in Draw Things, it is called FLUX.1 [dev] (Exact)).
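For readers wondering why accumulator width matters: a GEMM output element is one long dot product, and with an FP16 running sum the partial total eventually grows past the point where small increments can still be represented. A toy sketch, with plain repeated adds standing in for the dot product:

```python
import numpy as np

# Emulate a long dot product: one accumulator, updated once per channel.
true_sum = 100_000

acc16 = np.float16(0.0)
for _ in range(true_sum):
    acc16 = np.float16(acc16 + np.float16(1.0))  # rounded to FP16 each step

acc32 = np.float32(0.0)
for _ in range(true_sum):
    acc32 += np.float32(1.0)

# FP16 stalls at 2048: its representable spacing above 2048 is 2.0,
# so adding 1.0 rounds back to 2048. FP32 gets the exact answer.
print(acc16, acc32)  # prints 2048.0 100000.0
```

The wider the channel dimension (Flux's is large), the longer the dot product and the worse the FP16-accumulator error, which matches the comment above about SD v1 / SDXL being unaffected.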
Claims that quantization doesn't hurt models are made all the time but rely on the fact that almost all of today's LLM evaluations hardly scratch the surface. If we evaluated LLMs properly, even large quants would be detectably worse, and by a significant amount.
A model trained in BF16 whose weights fall within the FP16 range has an effective bit rate of at most 13 bits (e5m7). Reasonable quantization (at 8-bit) gets weight error (i.e. L2 distance on weights) down to < 1e-3, and a full diffusion run (30 steps) produces a result with L2 distance < 5e-2 compared to the unquantized model.
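The weight-level L2 distance described here is easy to measure yourself. A minimal sketch using a naive per-channel absmax int8 quantizer on synthetic weights (the < 1e-3 figure above presumably comes from a more careful group-wise scheme, so this naive version lands somewhat higher):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4096, 64)).astype(np.float32)  # stand-in weights

# Naive per-row absmax int8 quantization: one FP scale per output channel.
scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale  # dequantize

# Relative L2 distance between original and round-tripped weights.
rel_l2 = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative L2 weight error: {rel_l2:.1e}")
```

Swapping in smaller quantization groups (or keeping outlier channels in full precision) is how production quantizers push this error down further.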
I think there is a line somewhere between 4-bit and 8-bit below which performance is hurt (for both diffusion models and LLMs). But I doubt the line is between 8-bit and 13-bit.
(Another case in point: you can use generic lossless compression to get model weights from 13 bits down to 11 bits just by zipping the exponents and mantissas separately, which suggests the effective bit rate of the full-precision model is lower than 13 bits.)
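The split-and-zip observation is easy to reproduce. A sketch, assuming BF16 weights approximated as the top 16 bits of FP32, splitting each value into a high byte (sign plus most of the exponent) and a low byte (mostly mantissa): the high-byte stream is highly redundant and compresses well, while the mantissa stream is near-random.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# Fake "weights": BF16 is the top 16 bits of the FP32 bit pattern.
w = rng.normal(0.0, 0.02, size=1 << 16).astype(np.float32)
bf16 = (w.view(np.uint32) >> 16).astype(np.uint16)

# High byte: sign + top 7 exponent bits. Low byte: last exponent bit + mantissa.
hi = (bf16 >> 8).astype(np.uint8).tobytes()
lo = (bf16 & 0xFF).astype(np.uint8).tobytes()

raw = bf16.tobytes()
together = len(zlib.compress(raw, 9))
separate = len(zlib.compress(hi, 9)) + len(zlib.compress(lo, 9))
print(f"raw: {len(raw)}  zipped whole: {together}  zipped split: {separate}")
```

Because weights cluster in a narrow magnitude band, the exponent stream carries only a few bits of entropy per value; the savings come almost entirely from that stream.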
I buy these kinds of arguments but the moment a company or NeurIPS researcher claims even a "tiny" bit of loss happens, I become suspicious. I don't buy most claims of "we get 99%+ of the performance" made in practice.
But yes, I do believe that we will find proper lossless quants, and eventually (for real this time) get "only a little bit of loss" quants, but I don't think that the current 8 bits are there yet.
Also, quantized models often have worse GPU utilization, which hurts tokens/s if you have hardware capable of running the unquantized types. It seems to depend on the quant: SD models seem to get faster when quantized, but LLMs are often slower. Very weird.
I wonder if there is a case for thinking about it in terms of the technical definitions:
If we start by peering into quantization, we can show it is by definition lossy, unless every term has no significant bits past the quantization cutoff.
So our lower bound must be at least that 0.03% error mentioned above.
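The "lossy by definition" point can be stated as a one-liner: a uniform quantizer is exact only on inputs that are already multiples of its step. A minimal sketch:

```python
import numpy as np

def quantize(x, step):
    """Uniform quantizer: round to the nearest multiple of `step`."""
    return np.round(x / step) * step

step = 1 / 256  # an 8-bit-style grid

on_grid = np.array([0.25, -0.5, 3 / 256])   # exact multiples of step
off_grid = np.array([0.1, 1 / 3, 0.0051])   # not representable on the grid

assert np.array_equal(quantize(on_grid, step), on_grid)        # lossless
assert not np.array_equal(quantize(off_grid, step), off_grid)  # lossy
```

Real weights essentially never land exactly on the grid, so some per-weight error is unavoidable; the debate above is only about whether that error is large enough to matter downstream.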
I don't think this is true; llama.cpp hobbyists think about this a lot, and there have been multiple independent community experiments, including blind tests by a crowd. I doubt it holds across models and modalities, but in llama.cpp's context, Q5 is, inarguably, not noticeably different from F32.
However, this seems to be model-size dependent; e.g. Llama 3.1 405B is reported to degrade much more quickly under quantization.
I made a simple one: https://dreamgenerator.ai (if you pick the "High" image setting), but you have to subscribe to use it; otherwise it gives you Stable Diffusion.
While I won't say that realism is a solved problem, SD has been able to produce unbelievably realistic photo-level images using "Realism Engine"/NightVisionXL/etc for a while now.
Flux's power isn't necessarily in its ability to produce realistic images, so much as its increased size gives it a FAR superior ability to more closely follow the prompt.
I hadn't seen Realism Engine, looks good. Check Boreal FD https://civitai.com/models/639937?modelVersionId=715729 for the kind of casual photography realism I was talking about. I don't remember seeing that quality with SD.
Also to 'understand' complicated prompts (however far that goes when talking image models). For example, describing two subjects in detail with a relationship between them ("a cat and a man and the cat is standing on the man's head and the cat has red hair and the man is wearing purple glasses") reliably works with Flux, even the 'light' models, but is a crapshoot with SD.
Which honestly is rather silly for the companies making the models. The cat's already out of the bag, and if a competitor manages to make a similar/good-enough (uncensored) model there's nothing midjourney etc will have going for them.
Or as someone once said, if they took porn off the internet, there'd only be one website left, and it'd be called "Bring Back the Porn!".
SD 1.5 was not censored. Later versions were trained on censored data. A side effect of that was bad anatomy. Search for "stable diffusion woman lying on grass". I think this was SD3; it generates monstrosities when you ask for a woman on grass.
I liked Ideogram's approach. It looked like their training data was not censored as much (it still didn't render privates). The generated images are checked for nudity before being presented to the user; if any is found, the image is replaced by a placeholder.
If you check SD reddit, community seems to have jumped ship to flux. It has great prompt adherence. You don't have to employ silly tricks to get what you want. Ideogram also has amazing prompt adherence.
Yep: they self-censor according to China's whims just so they can have China access now, only to be banned by the Great Firewall a year from now, after Chinese startups scrape all their image outputs for training.
Try generating with the prompt "a spoony bard and a licentious howler", if you're willing to catch a ban for using Ted Woolsey's horrible, offensive language that was acceptable to 1990s Nintendo of America.
Maybe they got so far thanks to the Discord approach.
When you went to the Discord, you immediately saw the endless stream of great-looking stuff other people were generating. This was quite a powerful way to show what is possible.
I bet one challenge for getting new users engaged with these tools is that you try to generate something, get bad results, get disappointed and never come back.
That's exactly true, and having built a similar bot for friends and acquaintances to use, the effect in question is huge. It makes no sense to go for a webapp first, second or even third.
Google, Facebook, Microsoft etc all have to shoehorn these things into their products to stay on top, but it's not their core business and they don't want it to take away from their ads or licenses they sell. Midjourney as a company is much freer to innovate without the burden of the restrictions of an established company.
That and you get user observability for free, and support injection in a way that to this day there’s no good way to do in an “app” experience.
Presuming your bot requires people to interact with it in public channels, your CSRs can then just sit in those channels watching people using the bot, and step in if they’re struggling. It’s a far more impactful way to leverage support staff than sticking a support interface on your website and hoping people will reach out through it.
It’s actually akin to the benefit of a physically-situated product demo experience, e.g. the product tables at an Apple Store.
And, also like an Apple Store, customers can also watch what one-another are doing/attempting, and so can both learn from one another, and act as informal support for one-another.
What was the confusing Discord experience? Was it that Discord was the main way to access Midjourney, and it was chaotic? I vaguely remember this, but didn't spend much time there.
Yep. "Web experience" made me think there was an actual website experience. There still isn't a web experience and it's still not open to everyone. lol
Can you explain the difference between creating an anonymous google account with fake information compared to an anonymous midjourney account with fake information?
Touché. But it's certainly leagues more "everyone" today than it was yesterday. It used to require arcane knowledge of deep Discord server bot navigation. I gave up after 20 minutes and never figured it out. Today I tried it in seconds.
I have 40+ Google accounts for reasons, so I don't begin to understand the aversion some have to registering a burner google/discord/facebook/etc account under their hamster's name, but many of my closest friends are just like you, so I respect it anyway, whatever principle it is.
> I don't begin to understand the aversion some have to registering a burner google/discord/Facebook/etc
Maybe because several of those tend to ask for phone verification nowadays, and phone-burner services tend to either not work or look so shady that giving them any payment information seems like a major gamble?
Do you use all 20 regularly? Because if you don’t, there’s a chance that next time you try to login to one of them on the web they’ll ask you to add a phone number and not let you continue until you do. But if you have them setup in an email client, it should still work.
It's just astonishing to me how difficult it still seems to be for the Midjourney team to develop a web site that amounts to little more than a textbox, a button, and an img tag.
I tried their new web experience, and... it's just broken. It doesn't work. There's a showcase of other people's work, and that's it. I can't click the text box, it's greyed out. It says "subscribe to start creating", but there is no "subscribe" button!
That must have been a LOOONNG time ago; I've been a user of MJ since ~version 3 a couple of years back, and I remember custom-scraping my images way back in the day, and they were coming from Google Cloud Storage, e.g.
Why not just make another google account that you solely use for registration in services like these? Use a virtual machine with VPN if you really do not want it to be linked with your real account.
It is a bit of extra work but that's just how it is nowadays.
Doesn't Google require a telephone number to create accounts nowadays? In some regions there are no anonymous SIM cards, by law. (Temporary-number services may not work well.)
There do exist anonymous credit cards, paid for with cash, for fixed, relatively small amounts (e.g. ~100 units; in Europe there are restrictions on this kind of payment method: cards must be below, I believe, 150€).
It is very much possible to create temporary credit cards linked to your real bank account for one-off purchases. Apple Card provides that as a service (US only) and other countries have similar systems that every bank adheres to.
Plus, you may not mind not being anonymous to Midjourney but mind not being anonymous to some other service (like Google).
Because the alternative is pretty provably worse for you, and for them.
* You have to save your (hopefully unique!) email/password in a password manager which is effectively contradictory to your "I won't use a cloud service" argument.
* The company needs to build out a whole email/password authentication flow, including forgetting your password, resetting your password, hints, rate limiting, etc etc, all things that Google/Apple have entire dedicated engineering teams tackling; alternatively, there are solid drop-in OAuth libraries for every major language out there.
* Most people do not want to manage passwords and so take the absolute lazy route of reusing their passwords across all kinds of services. This winds up with breached accounts because Joe Smith decided to use his LinkedIn email/password on Midjourney.
* People have multiple email addresses and as a result wind up forgetting which email address/password they used for a given site.
Auth is the number one customer service problem on almost any service out there. When you look at the sheer number of tickets, auth failures and handholding always dominate time spent helping customers, and it isn't close. If Midjourney alienates 1 potential customer out of 100, but the other 99 have an easier sign-in experience and don't have to worry about any of the above, that is an absolute win.
All very thoughtful arguments, but I think this solution to these problems is flawed. I don't believe we should be solving authentication management problems by handing over all authentication capabilities and responsibilities to one or two mega-companies.
Especially since those companies can wield this enormous power by removing my access to this service because I may or may not have violated a policy unrelated to this service.
While we are all waiting for the world to sort these problems out, companies that are not interested in solving them for the world will continue to use SSO techniques.
I’m very not impressed by this deep, extended critique of machine learning researchers using common security best practices on the grounds that those practices involve an imperfect user experience for those requiring perfect anonymity…
There's web3, where users have a private key and the public key is on a cryptocurrency chain, but adoption there has been slow. There are also a number of problems with that approach, but that's the other option on the table.
I want to believe, but sadly there's no market for it. Unless someone wants to start a privacy-minded alternative to Auth0 and figure out a business model that works. Which is to say: are you willing to pay for this better way? Are there enough other people willing to pay a company for privacy to make it a lucrative, worthwhile business? Users are trained to think that software should be free-as-in-beer, but unfortunately developing software is expensive and those costs have to be recouped somehow. People say they want to pay, but their revealed preference is that they don't.
You make it sound untenable to support email/password auth, but given that the vast majority of low tech and high tech online services manage it just fine, I think you might be exaggerating a bit. Since Midjourney is already outsourcing their auth flow, they could just as easily use a third party that supports the most common form of account creation.
While I understand that they might need some account or token to stop abuse, having to sign in for anything I just want to try out is a big no. After the whole social-media trend more or less collapsed into a privacy-invading data-collection nightmare, I've been extremely reluctant to sign up for anything.
No no, I understand that it's not a viable financial option, I just don't understand that anyone would trust any of these companies with any sort of information.
Most of the current AI companies are going to fail or get bought up, so you have no idea where that account information, all your prompts, and all the answers will eventually go. After social media, I don't really see any reason why anyone would trust any tech company with any sort of information, unless you really, really have to. If I want to run an LLM, then I'll get one that can run on my own computer, even if that means forgoing certain features.
Replicate, although not fully free, has the bonus that outputs from Flux generated via their endpoints can be used for commercial purposes, whereas that normally applies only to Flux Schnell:
From the license:
"We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model."
This license seems to indicate that the images from the dev model CAN be used for commercial purposes outside of using those images to train derivative models. It would be a little weird to me that they'd allow you to use FluxDev images for commercial purposes IF AND ONLY IF the model host was Replicate.
Here's a project I did over a year ago in Midjourney, where I took all the art for the card game Spectromancer and re-themed each piece in 26 different styles.
For a period I checked its best-outputs page daily: the "Pixar" style was frequent, but far from the only one. "Typical" in the sense that 10 is the type (mode) of 10+9+9+8+8+7+7+6+6+5+5+4+4+3+3+2+2+1+1: still only 10% of the whole, and just one possibility among all.
It's a rather arbitrary metric. All it proves is that the token "spork" wasn't in midjourney's training data. A FAR better test for adherence is how well a model does when a detailed description for an untrained concept is provided as the prompt.
Yup, it's an arbitrary metric, but I tried cajoling various image models into generating spork pictures with highly detailed descriptions (I have ComfyUI & AUTOMATIC1111 locally, and many models), which led to me creating the site.
I'd say a better test for adherence is how well a model does when the detailed description falls between two very well-known concepts. It's kind of like those 1500s pictures of exotic animals seen by explorers, drawn by people using only the field notes after a long voyage back.
The combination of T5 / clip coupled with a much larger model means there's less need to rely on custom LoRAs for unfamiliar concepts which is awesome.
EDIT: If you've got the GPU for it, I'd recommend downloading the latest version of the SD-WebUI Forge repo along with the dev checkpoint of Flux (not Schnell). It's super impressive, and I get a generation speed of roughly 15 seconds per 1024x1024 image.
Well... there are a hundred ways we could measure prompt adherence, everything from:
- Descriptive -> describing a difficult concept that is most certainly NOT in the training data
- Hybrids -> fusions of familiar concepts
- Platonic overrides -> this is my phrase for attempting to see how well you can OVERRIDE very emphasized training data. For example, a zebra with horizontal stripes.
I have to say I'm very impressed. I've previously used free generators for generating scenery for my D&D campaign, and running a prompt here that previously took me dozens of tweaks to get something reasonable, instantly returned me a set of high quality images, at least one of which was much closer to my mind's eye than anything in those dozens of previous attempts.
( Prompting: An iron-gated door, set into a light stone arch, all deep set into the side of a gentle hill, as if the entrance to a forgotten crypt. The hill is lightly wooded, there is foliage in season. It is early evening. --v 6.1 )
Access blocked: Midjourney’s request does not comply with Google’s policies
Midjourney’s request does not comply with Google’s ‘Use secure browsers’ policy. If this app has a website, you can open a web browser and try signing in from there. If you are attempting to access a wireless network, please follow these instructions.
You can also contact the developer to let them know that their app must comply with Google’s ‘Use secure browsers’ policy.
Learn more about this error
If you are a developer of Midjourney, see error details.
Unable to process request due to missing initial state. This may happen if browser sessionStorage is inaccessible or accidentally cleared. Some specific scenarios are - 1) Using IDP-Initiated SAML SSO. 2) Using signInWithRedirect in a storage-partitioned browser environment.
EDIT:
I tried again from scratch in a new tab and this time it worked. So, temporary hiccup.
EDIT2: I have all the images I created on Discord in the web app - very nice!
Ever since DALL-E 3 completely eclipsed Midjourney in terms of prompt ADHERENCE (albeit not quality), I've had very little reason to use it. However, in my testing of Flux Dev, I can gen images in roughly 15 seconds in Forge, throw those at an SDXL model such as DreamShaper via a ControlNet, and get the best of both worlds: high detail and adherence.
DALL-E 3 (intentionally) leans away from realism, but in doing so it leans into a very tacky, aesthetically naive, though competently executed, type of image. It gives every image the feeling that you're seeing a bootleg version of a genuine thing, and thereby makes everything it touches feel tacky.
Same feeling you get looking at the airbrushed art on a state fairground ride.
In 15 seconds? Really? On my machine, with good specs and a 4090, flux-dev takes around 400 seconds for 512x512, and flux-schnell at least half that. Can you recommend a tutorial for optimization?
For a company workshop I wanted to pay for Midjourney to invite a bot into a private Discord with the workshop participants. We couldn't find a way of using it as a company account, and ultimately every participant had to get a sub, which was less than ideal. If it was today I would have them use Flux in some hosted version.
Yes, I'm finding the "meh" here extends to most of the companies, and that's a great thing (e.g. I just got a $500 card and OpenHermes feels comparable to anything, running fully locally). I know there's often not a lot to be optimistic about, but the fact that so-called "AI" is de facto free/open source is perhaps just about the best way this stuff could roll out.
I'm so glad that Midjourney and Flux is making creativity accessible to everyone, I don't need to have a subscription to MJ anymore now that Flux is getting better.
Everybody can now be an artist and have creativity for free now.
Creativity has always been accessible to everyone. Creativity is a concept which only requires imagination. Those ideas can take many forms, including writing, cooking, sewing, doing skateboard tricks… It doesn’t mean rearranging pixels on screen by vaguely describing something.
Flux, the latest Midjourney, or even DALL-E... I'm still disappointed that three of my sci-fi prompts never work (interior of an O'Neill cylinder, space elevator, rotating space station). I hope someone makes LoRAs of those one day.
I am also still struggling with this. Tried various prompts along the lines of "draw me the inside of a cylindrical space habitat rotating along its axis to create artificial gravity", like a more detailed:
"Create a highly detailed image of the interior of a massive cylindrical space habitat, rotating along its axis to generate artificial gravity. The habitat's inner surface is divided into multiple sections featuring lush forests, serene lakes, and small, picturesque villages. The central axis of the cylinder emits a soft, ambient light, mimicking natural sunlight, casting gentle shadows and creating a sense of day and night. The habitat's curvature should be evident, with the landscape bending upwards on both sides. Include a sense of depth and scale, highlighting the vastness of this self-contained world."
Flux kind of got the idea, but the gravity is still off.
"Halo, Interior of a cylindrical space station habitat, upside down, fields, lakes, rivers, housing, rivers running around cylinder roof, Rick Guidice artwork, in space, stars visible through windows"
They definitely struggle with getting the land curving up the wall, and often treat the cylinder like a window onto Earth at ground level, but I think with tweaks to the weighting or order of terms, and enough rolls of the dice, you could get it.