
I believe (but not 100% sure) you can achieve this with Stable Diffusion also, which is something you can run locally.



For sure, but Stable Diffusion still can't produce results as realistic as DALL-E 2's. Stable Diffusion also has fairly weak inpainting, which is what's used for the first step. I'm confident it will surpass DALL-E 2 at some point, though; it has only been open sourced for ~2 weeks, after all.
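(If you want to try that step locally, here's a minimal inpainting sketch using Hugging Face's diffusers library. The checkpoint name, file names and prompt are illustrative assumptions, not anything from the parent comments.)

    # Minimal Stable Diffusion inpainting sketch with the diffusers library.
    # Assumes a CUDA GPU and an inpainting checkpoint such as
    # runwayml/stable-diffusion-inpainting (an example choice, not the only one).
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("photo.png").convert("RGB").resize((512, 512))
    mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = area to repaint

    result = pipe(
        prompt="a red silk evening dress",
        image=init_image,
        mask_image=mask_image,
    ).images[0]
    result.save("inpainted.png")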


> For sure, but Stable Diffusion still can't produce results as realistic as DALL-E 2's.

I don't think that's true. Just like Gimp can be used to make better results than Photoshop (or even mspaint), the quality of the results is up to the artist/user, not the tool itself.

Some SD outputs are truly amazing, but so are some DALL-E 2 outputs. Maybe DALL-E 2 is easier to use for beginners, but people who are good with prompts will get output from SD that is as good as DALL-E's.


I’ve tried 500+ prompts each on Midjourney, Stable Diffusion, and DALL-E 2. It’s just not there yet, though it’s really good for creative results.

I agree it might do the job decently for a dress in this instance.


> Just like Gimp can be used to make better results than Photoshop (or even mspaint), the quality of the results is up to the artist/user, not the tool itself.

There's a huge difference between Gimp/Photoshop and an image generation model.

If a particular model can't generate faces properly then the "artist/user" can't get around that unless they develop a new model or find one that can fix the output of the first.


I agree. People who will carry out this activity at a high professional level in creative agencies will have to deepen their observation and study of language.

In addition to linguistic precision, the parameters involved in prompt composition require technical knowledge, a sense of style and historical knowledge if the artistic result is to be perfectly controlled. The more related keywords involved in the composition, the greater the artist's control over the final result. For example, the prompt

_A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of an arid desert full of big dunes, Sunbeams, Artstation, Dark sky full of stars with a bright sun, Massive scale, Fog, Very Detailed, Cinematic, Colorful_

is more sophisticated than just

_A city full of tall buildings inside a huge transparent glass dome_

Note that the conceptual density, and hence the quality, of the prompt depends heavily on the cultural and linguistic background of the person composing it. In fact, a quality prompt is very similar to a movie scene described in a script/storyboard [by the way, there go the Production Designers, along with the concept artists, graphic designers, set designers, costume designers, lighting designers…].

In an attempt to monetize the fruits of the new technology, Internet entrepreneurs will be forced by the invisible hand of the job market to delve deeper into language skills. It will be a benign side effect, I think, considering the current state of the Internet. Perhaps this will lead to a better articulation of ideas in the network environment.

Just as YouTube influencers have a knack for dealing with the visual aspects of human interactions, aspiring prompt engineers will have to excel at sniffing out the nuances of human expression. They have great potential to be the new cool professionals of the digital economy, as were web designers, and later influencers — who, with the end of social networks, now tend to lose relevance.

To differentiate themselves, prompt engineers will have to be avid readers and practitioners of semiotics/semiology.

Umberto Eco and the structuralists may return to fashion.

(*) I used a prompt by Simon Willison

https://simonwillison.net/2022/Aug/29/stable-diffusion/


> In addition to linguistic precision, the parameters involved in prompt composition require technical knowledge, a sense of style and historical knowledge if the artistic result is to be perfectly controlled.

You are assuming that the models themselves respond accurately to linguistic clues. Actually, they embody the cloud of random noise, prejudices, inaccuracies and misconceptions in the training data, and then pile on a big layer of extra noise by virtue of their stochastic nature.

So this isn't a case of the learned academic with extensive domain knowledge steering a precision machine. It's more like someone poking a huge chaotic furnace with a stick and seeing what comes out.


> I agree. People who will carry out this activity at a high professional level in creative agencies will have to deepen their observation and study of language.

So they have to be a character out of a William Gibson novel? Do they rip the labels off their clothes too?

PS: This is awesome.


That defeats the point. If you need specialized artists and know-how to make this work, it's useless. Might as well just use Photoshop then. The whole point is that your grandma should be able to do this herself.


???

Imagine someone saying that about Photoshop when it was first introduced...

"Why would you even use Photoshop if so you still have learn a tool in order to use it? Might as well just use a canvas with colors... The whole point of Photoshop is that your grandma should be able to use it!"

No, every tool in the world is not meant to make it zero effort to do something. Some tools are meant to incrementally make it easier, or even just make it less effort for people who already are good at something. And this is OK.


It’s not okay. The whole appeal of AI-generated art is that you don’t need someone with skills. Photoshop is not even close to an analog.


Inpainting comparison between Stable Diffusion and Dalle 2: https://twitter.com/nicolaymausz/status/1565290282907848704

Have to confess I haven't bothered to log into D2 since the SD beta started. Which is crazy, because my mind was blown when it first launched back in April, but for now it seems like closed AI simply can't keep up with the open-source community swarm.


For my usage, DALL-E 2 is better at understanding prompts and creating pictures that are a good start. But Stable Diffusion provides better resolution and more settings. I sometimes use them together.
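(To give a rough idea of what "more settings" looks like when driving SD locally through Hugging Face's diffusers library: a sketch where the model id, resolution, step count, guidance scale and seed are illustrative assumptions, not values from the comment above.)

    # Sketch of a local Stable Diffusion text-to-image run with explicit knobs.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",        # example checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe(
        prompt="a futuristic city inside a huge transparent glass dome, cinematic, very detailed",
        height=512, width=768,                   # output resolution
        num_inference_steps=50,                  # more steps: slower, often cleaner
        guidance_scale=7.5,                      # how strongly to follow the prompt
        generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for reproducible output
    ).images[0]
    image.save("dome_city.png")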


Does the inpainting code actually work, or is it just a vestigial placeholder from the last version of Stable Diffusion's release?



