Hacker News

There does seem to be a lot of editing work required to get this working.

I would like to understand if there's a more automated way of doing this.




There isn’t yet but my company’s goal is precisely to make this kind of thing accessible to everyone. I don’t want to spam and we haven’t launched yet anyway, but there’s a link in my profile if interested.

I suspect there will be a lot of companies that build businesses out of chaining models together for specific kinds of users. The more you can focus on a niche, the more you can paper over some of the limitations in the models.


The most important work is done by DALL-E, which has no open API right now. Once they open it up, there could be a way to run a program on all three. You would still have to cherry-pick which frames from DALL-E look the nicest.


I believe (but am not 100% sure) you can achieve this with Stable Diffusion as well, which is something you can run locally.


For sure, but Stable Diffusion still cannot produce results as realistic as DALL-E 2's. Stable Diffusion also has fairly weak inpainting, which is what the first step relies on. I am confident it will surpass DALL-E 2 at some point, though; it has only been open-sourced for ~2 weeks, after all.
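To make the inpainting step concrete: an inpainting model regenerates only the masked region and composites it back over the untouched pixels. Here is a minimal sketch of just that compositing step in plain Python (a toy illustration with 1-D pixel lists, not how Stable Diffusion or DALL-E 2 actually represent images):

```python
def inpaint_composite(original, generated, mask):
    # Masked merge: where mask is 1, take the freshly generated pixel;
    # where mask is 0, keep the original pixel untouched.
    return [g if m else o for o, g, m in zip(original, generated, mask)]

# Toy 1-D "image": only the masked middle region gets repainted.
original = [10, 10, 10, 10]
generated = [99, 99, 99, 99]
mask = [0, 1, 1, 0]
print(inpaint_composite(original, generated, mask))  # [10, 99, 99, 10]
```

The quality difference between the two tools comes from how well the model fills the masked region, not from this merge step, which is trivial either way.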


> For sure, but Stable Diffusion still cannot produce results as realistic as DALL-E 2's.

I don't think that's true. Just as GIMP can be used to produce better results than Photoshop (or even MS Paint), the quality of the results is up to the artist/user, not the tool itself.

Some SD outputs are truly amazing, but so are some DALL-E 2 outputs. Maybe DALL-E 2 is easier for beginners to use, but people who are good with prompts will get output as good with SD as with DALL-E.


I've tried 500+ prompts on Midjourney, Stable Diffusion, and DALL-E 2 each. It's just not there yet; it's really good for creative results, though.

I agree it might do the job decently for a dress in this instance.


> Just as GIMP can be used to produce better results than Photoshop (or even MS Paint), the quality of the results is up to the artist/user, not the tool itself.

There's a huge difference between GIMP/Photoshop and an image generation model.

If a particular model can't generate faces properly, then the "artist/user" can't get around that unless they develop a new model or find one that can fix the output of the first.


I agree. People who will carry out this activity at a high professional level in creative agencies will have to immerse themselves in the observation and study of language.

In addition to linguistic precision, composing a prompt for a perfectly controlled artistic result requires technical knowledge, a sense of style, and historical knowledge. The more related keywords involved in the composition, the greater the artist's control over the final result. Example: the prompt

_A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of an arid desert full of big dunes, Sunbeams, Artstation, Dark sky full of stars with a bright sun, Massive scale, Fog, Very Detailed, Cinematic, Colorful_

is more sophisticated than just

_A city full of tall buildings inside a huge transparent glass dome_

Note that the conceptual density, hence the quality, of the prompt depends heavily on the cultural and linguistic background of the person composing. In fact, a quality prompt is very similar to a movie scene described in a script/storyboard [by the way, there go the Production Designers, along with the concept artists, graphic designers, set designers, costume designers, lighting designers… ].
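The keyword-layering idea can be sketched as a tiny helper (a hypothetical illustration, not any tool's actual API): the subject comes first, and each appended facet of style, lighting, or mood tightens the artist's control over the result.

```python
def build_prompt(subject, *facets):
    # Join the subject with style/lighting/mood keywords into one
    # comma-separated prompt, most important element first.
    return ", ".join((subject,) + facets)

prompt = build_prompt(
    "A city full of tall buildings inside a huge transparent glass dome",
    "Sunbeams", "Massive scale", "Fog", "Very Detailed", "Cinematic", "Colorful",
)
print(prompt)
```

Each facet the composer can name (and knows the history and connotations of) is one more dial they can turn, which is exactly where the cultural background comes in.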

In an attempt to monetize the fruits of the new technology, Internet entrepreneurs will be forced by the invisible hand of the job market to delve deeper into language skills. It will be a benign side effect, I think, considering the current state of the Internet. Perhaps this will lead to a better articulation of ideas in the network environment.

Just as YouTube influencers have a knack for dealing with the visual aspects of human interactions, aspiring prompt engineers will have to excel at sniffing out the nuances of human expression. They have great potential to be the new cool professionals of the digital economy, as were web designers, and later influencers — who, with the end of social networks, now tend to lose relevance.

To differentiate themselves, prompt engineers will have to be avid readers and practitioners of semiotics/semiology.

Umberto Eco and the structuralists may return to fashion.

(*) I used a prompt by Simon Willison

https://simonwillison.net/2022/Aug/29/stable-diffusion/


> In addition to linguistic precision, composing a prompt for a perfectly controlled artistic result requires technical knowledge, a sense of style, and historical knowledge.

You are assuming that the models themselves respond accurately to linguistic cues. Actually, they embody the cloud of random noise, prejudices, inaccuracies, and misconceptions in the training data, and then pile on a big layer of extra noise by virtue of their stochastic nature.

So this isn't a case of the learned academic with extensive domain knowledge steering a precision machine. It's more like someone poking a huge chaotic furnace with a stick and seeing what comes out.


> I agree. People who will carry out this activity at a high professional level in creative agencies will have to immerse themselves in the observation and study of language.

So they have to be a character out of a William Gibson novel? Do they rip the labels off their clothes too?

PS: This is awesome.


That defeats the point. If you need specialized artists and know-how to make this work, it's useless. Might as well just use Photoshop then. The whole point is that your grandma should be able to do this herself.


???

Imagine someone saying that about Photoshop when it was first introduced...

"Why would you even use Photoshop if you still have to learn a tool in order to use it? Might as well just use a canvas with colors... The whole point of Photoshop is that your grandma should be able to use it!"

No, not every tool in the world is meant to make doing something zero effort. Some tools are meant to incrementally make things easier, or even just to reduce the effort for people who are already good at something. And this is OK.


It's not okay. The whole appeal of AI-generated art is that you don't need someone with skills. Photoshop is not even close to an analog.


Inpainting comparison between Stable Diffusion and Dalle 2: https://twitter.com/nicolaymausz/status/1565290282907848704

Have to confess I haven't bothered to log into D2 since the SD beta started. Which is crazy, because my mind was blown when it first launched back in April, but for now it seems like closed AI simply can't keep up with the open-source community swarm.


For my usage, DALL-E 2 is better at understanding the prompt and creating pictures that are a good start. But Stable Diffusion provides better resolution and more settings. I sometimes use them together.


Does the inpainting code actually work, or is it just a vestigial placeholder from the last version of Stable Diffusion's release?


They will never open it up. They're ClosedAI: a for-profit company that makes money off their SaaS, which prohibits reverse engineering.


They will open up an API, yes. That is different from being open-sourced. Actually, it seems that anyone who currently has access can try this: https://github.com/ezzcodeezzlife/dalle2-in-python


Too many products/businesses use "Open" to mean "You can use it if you pay" rather than "you can understand how it works" or "you can look under the hood".

OpenAI is just another example of a typical SaaS business misusing the word to make themselves seem "nicer" than they are. Ultimately, it's a for-profit business and it will operate as such; that shouldn't surprise anybody.


I wish OpenAI actually had a SaaS model.

If you want a SaaS model, you should have all of the things that a SaaS model supports, including premium pricing and enterprise tiers. OpenAI needs to get their act together on DALL-E 2, as no serious business use case will rely on the consumer pricing and hacked-together unofficial APIs.


Maybe we differ in how we understand what "SaaS" really is (a debate which is as old as the concept of "SaaS").

For me, if you offer something software-y behind payment, and the software can only be accessed online, it's a SaaS.

No need to offer premium pricing, enterprise tiers, support, or anything else. Putting up an online API/UI that is locked behind payment makes it a SaaS (in my eyes).



