So I've done a bit of comparative testing between Janus 7B and Flux Dev - strictly considering PROMPT ADHERENCE, since Janus is limited to 384x384. As mentioned elsewhere, upscaling is a FAR simpler problem to solve than adherence.
Prior to Flux, 90% of my SD images had one dimension smaller than 480-512px. I prefer the smaller images both for speed and for bulk/batch work: I can "explore the latent space," which to me means running truly random images until one catches my eye, then exploring the nearby seeds and subseeds. There's the model seed, and then there's a smaller latent-space seed that kind of mutates your image slightly. All images in a batch might share the first seed, but the second seeds are all different. That's just what I call exploring the latent space. I can make a video, because I doubt what I typed makes perfect sense.
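As a rough illustration of what's being described (one shared main seed per batch, a different subseed per image), here's a minimal PyTorch sketch. The seed values, tensor shape, and strength are made up, and plain linear interpolation stands in for the spherical interpolation that A1111-style UIs reportedly use for their variation seeds:

```python
import torch

def batch_variation_noise(main_seed, subseeds, strength=0.1,
                          shape=(4, 64, 64)):
    """Build a batch of initial latents that share one 'main' seed but
    each mix in a little noise from a per-image subseed."""
    base = torch.randn(shape, generator=torch.Generator().manual_seed(main_seed))
    latents = []
    for subseed in subseeds:
        variation = torch.randn(
            shape, generator=torch.Generator().manual_seed(subseed))
        # Small strength = each image stays close to the main seed's
        # composition. (Real UIs typically use spherical interpolation here
        # to keep the noise statistics intact; lerp is enough for a sketch.)
        latents.append(torch.lerp(base, variation, strength))
    return torch.stack(latents)

# One batch: same main seed, four different subseeds.
noise = batch_variation_noise(main_seed=1234, subseeds=[10, 11, 12, 13])
print(noise.shape)  # torch.Size([4, 4, 64, 64])
```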
Nice. A couple of Discord users back in the early days of SD were doing something similar by generating random alphanumeric positive/negative prompts and then pushing the seed/subseed values up and down.
In my experience, changing the seed even by a single digit can drastically alter the image, so I'd be curious to know how truly "adjacent" these images actually are.
It doesn't drastically alter the images, in my experience. It's more like changing the trim on a dress or the shape of the drapes: the structure and composition of the nearby images are similar.
Could you send me the video if you ever end up making it? I don't understand how jumping between nearby seeds means anything in the latent space. As far as I know, it's closer to a hash function, where the output is drastically different for small changes in the input.
I posted two replies to your sibling comments: one with a demo of what I mean, using two batches of four with completely random seeds and then latent-space "random" seeds; and a second comment with a single imgur link that shows the only setting I touched and an explanation of how I use it.
I apologize if this isn't what "exploring latent space" means, but /shrug, that's how I use it, and I'm the only one I know who knows anything about any of this.
Edit to add: I get frustrated pretty easily on HN because it's hard to tell who's blowing smoke and who is actually earnest or knows what they're talking about (either is fine). I end up typing a whole lot into this box about how these tools work, how I use them, the limitations, unexpected features...
https://imgur.com/a/PpYGnOz
Unsure about other UIs, but:
You can usually set a seed, and also see the seed of an image you've generated. So generate or load an image you like so you have its seed, then lock the seed. Find the variation seed setting and lock that too (in AUTOMATIC1111's webUI it automatically locks to the main seed). Now adjust the variation strength. If you're doing small images you can keep this small, because the variations will be very minor. I set 0.1, which I use with larger images if I'm looking for a specific color smear or something, but once I narrow it down I reduce this to 0.05 or below. When you click an image in a UI it ought to load all the details into the configurable parts, including the variation seed / subseed, which means you can keep exploring around individual variations' spaces too: expanding the strength a bit if you get stuck in a local minimum (or boring images), and reducing the strength to home in on the image you want to rescale and publish or whatever.
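For anyone who'd rather read code than settings screenshots, here's a minimal sketch of the same workflow using Hugging Face's diffusers StableDiffusionPipeline. The model ID, prompt, seeds, and strength values are placeholders, and the spherical-interpolation mixing is my approximation of what variation seeds do under the hood, not something pulled from A1111's source:

```python
import torch
from diffusers import StableDiffusionPipeline

def slerp(t, a, b):
    """Spherical interpolation between two noise tensors, so the mixed
    noise keeps roughly unit variance (what variation seeds rely on)."""
    a_flat, b_flat = a.flatten(1), b.flatten(1)
    a_n = a_flat / a_flat.norm(dim=1, keepdim=True)
    b_n = b_flat / b_flat.norm(dim=1, keepdim=True)
    omega = torch.acos((a_n * b_n).sum(dim=1).clamp(-1.0, 1.0))
    so = torch.sin(omega)
    out = (torch.sin((1.0 - t) * omega) / so).unsqueeze(1) * a_flat \
        + (torch.sin(t * omega) / so).unsqueeze(1) * b_flat
    return out.view_as(a)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse at dusk, oil painting"       # placeholder prompt
main_seed = 1234                                    # the locked seed you liked
shape = (1, pipe.unet.config.in_channels, 64, 64)   # 512x512 -> 64x64 latents

base_noise = torch.randn(
    shape, generator=torch.Generator().manual_seed(main_seed))

# Try a few variation seeds at strength 0.1, then 0.05 to tighten in.
for subseed in (10, 11, 12):
    for strength in (0.1, 0.05):
        var_noise = torch.randn(
            shape, generator=torch.Generator().manual_seed(subseed))
        latents = slerp(strength, base_noise, var_noise).to("cuda", torch.float16)
        image = pipe(prompt, latents=latents).images[0]
        image.save(f"seed{main_seed}_sub{subseed}_s{strength}.png")
```

The loop mirrors the narrowing-down step described above: 0.1 to scout the neighborhood of the locked seed, 0.05 once a variation looks close.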
Good idea - I've updated the comparisons with Imagen3 and DALL-E 3. I also cherry-picked the best result from each GenAI system out of a max of 12 generations.
Results testing star symmetry, spatial positioning, unusual imagery:
https://imgur.com/a/nn9c0hB