Hacker News
Show HN: Stock Photos Using Stable Diffusion (ghostlystock.com)
190 points by jarrenae on Sept 30, 2022 | 105 comments
Hi HN, this is an early version of what we’re imagining as a truly functional stock photo platform using Stable Diffusion.

We’re doing our best to hide the customization prompts on the back end, so users can quickly search for pre-existing generated photos or create new ones that, ideally, work just as well.

If we keep going with it, in future versions we’d like to add voting, better tags, and more varied prompts, or maybe whatever you recommend!





Yeah I would love a university to use this on their website. https://replicate.com/api/models/stability-ai/stable-diffusi...



The terrifying thing is this is what we look like to the AI. It sees us as melty faced monsters.


Although it would make a good art piece at a campus


In V2 we're planning to add a voting system and additional filtering/tagging to solve for a lot of these unusual/nightmarish summoned images.

I for one am sorry for your cockroach salad jump-scare, but of course, you know summoning from beyond is tricky business.


Actual stock photo websites have human reviewers and it's a big part of their value proposition.


I searched for “happy” and I tend to agree with the nightmarish look. Pretty much all of the results looked like they would love to use their happy teeth to eat you. Consistently hitting the uncanny valley


Yeah, unfortunately uncanny valley is spot on for most generated faces with this version of Stable Diffusion.

I've seen some people tackle the problem by running faces through a secondary AI that makes faces look "better" [1] though in this case "better" just means "pretty" by recent western standards, so that has its own set of shortcomings.

I'd be interested to hear what you think about the inanimate object searches?

[1] https://replicate.com/tencentarc/gfpgan


> AI that makes faces look "better" [1] though in this case "better" just means "pretty" by recent western standards

The AI is not making the faces pretty, it's just restoring them, even the "ugly" ones. It was trained on FFHQ, a dataset with considerable variation in age and ethnicity; also, it was trained by Tencent's ARC Lab, so I have no idea what the West has to do with this.


It makes the faces average, which correlates with perceived beauty.


Sometimes can be cute: https://dropover.cloud/d6b973#841f5cc5-f92f-44fe-93ec-9e9b9d...

The uncanny valley is real but I guess these issues can be fixed with a "face sanity engine".

"Limbs sanity engine" can help too: https://dropover.cloud/8bf440#ce54ae49-bb16-4ce9-97d1-71b3b8...


Gives a whole new meaning to "debugging."


This might just be the best stock photo site for YouTube creepypasta videos.


Oh yes. Some are very creepy / hilarious. Awesome just the same.


No wonder they call it ghostly.



Nice find! So it was trained with Dreamstime images.

Do the output images come with licensing and copyright information, so that Dreamstime can be compensated for downstream commercial use?

What a legal mess.


I found an image with istock watermark in huggingface stabilityAI space itself.



I'm surprised it didn't mangle their watermark. It's extremely clear!


Having a button in the search bar (a blue circle that says Photo, etc.) that doesn't start the generation process when clicked feels odd to me. It took me about 30 seconds to realize I had to hit the enter key. It would likely feel even weirder on mobile.


Agreed. On mobile we have an added "Search" button that appears, but that's on my list of improvements to make.


UX suggestion: example search already performed on the landing page. You can fake it a bit so it's not actually hitting your search logic (and incurring that cost) every time. Just so when you arrive you see the sort of thing a search might return.

[EDIT] Actually instead of dropping straight into the actual search-result UI, how about scrunching the header up a tad more (there's already a bunch of incomplete-looking space under it) and a row of example images with example searches that might bring them up:

    [ Image ]       [ Image ]      [ Image ]
    "Cats playing    "The moon,    "Statue of
     baseball"        made of       liberty
                      cheese"       driving a car"


Thanks for the feedback, and I totally agree. I'd really hoped to fill out the landing page a bit more before we launched today, but we weren't able to roll it out in time.

Honestly I'm taking a lot of inspiration from Unsplash, but gearing up to add very unique features specifically based around generation.

We're going to be iterating on this pretty fast, so I wouldn't be surprised if we have something there within the week.


None of the text-to-image tools seem to really understand 3D geometry, so I feel safe for now. Look at the examples for icosahedron [1] vs. dodecahedron [2] vs. octahedron [3]: none of the images were actually geometrically correct. Is that quibbling? Maybe, but sometimes, for some audiences, words actually mean something, not just some vague evocation of the angular aesthetic of a thing. Has someone delineated the words that will not appear in a stock photography prompt? If there were some feedback ranging from "I'm confident in this" to "I'm guessing here, user beware", it would be a lot more usable.

[1] https://replicate.com/api/models/stability-ai/stable-diffusi...

[2] https://replicate.com/api/models/stability-ai/stable-diffusi...

[3] https://replicate.com/api/models/stability-ai/stable-diffusi...


That's one of the things I've found deeply interesting about the current generation of tools: there's little (if any) comprehension going on. It's really just trying to "enhance" a blur/bit of noise into the image it was told to make.

And I'm not sure I completely know what you mean, but we are planning to add voting and tagging to improve filtering for images.


You saw this, right?

https://dreamfusion3d.github.io/

That's the same type of diffusion model used here, and without any further training, it is constrained to generate something that is consistent from all angles when viewed in 3d.


Thanks for pointing that out, maybe I’ll try it.

The question is: if I say "icosahedron", will it actually make a (3D) icosahedron?


Not sure I understand how to use this. I searched for "monkey on car" and these are the "categories" I get:

"a dead monkey", "a monkey dancing", "a dead monkey" (again), "a ca"


They are offering you a previously generated image. You need to click the button at the bottom of the page to get an original rendering "from beyond".


Exactly. We'll improve button location in another iteration.


Also, we'll have to add reporting for specific search terms. We do have an NSFW filter on by default, but there are often things that skirt the rules while being hard to filter for.


The word "penis" returns surprisingly good results and even better related terms.

I find it amusing that the site seems almost eager to suggest related terms that it then refuses to generate images of.


haha well I hadn't considered this, but you've actually stumbled upon something interesting. Duly noted!

We're using a synonym API that's separate from the image summoning. Basically our filtering happens after the synonym API, but before the image summoning API.
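To make the ordering concrete, here's a rough sketch of the pipeline described above. Every function name and the blocklist are invented stand-ins, but it shows why related terms can surface for a query the generator refuses:

```python
# Hypothetical sketch of the search pipeline: synonym expansion happens
# before the NSFW filter, so blocked queries still yield suggestions.

BLOCKLIST = {"penis", "orgy"}  # invented stand-in for the real filter

def synonym_api(term):
    # Stand-in for the external synonym service; it knows nothing
    # about the blocklist, so related terms come back unfiltered.
    related = {"penis": ["phallus", "member"], "pen": ["quill", "marker"]}
    return related.get(term, [])

def is_allowed(term):
    return term not in BLOCKLIST

def search(term):
    suggestions = synonym_api(term)  # 1. expand first (unfiltered)
    allowed = [t for t in suggestions if is_allowed(t)]  # 2. then filter
    generate = ([term] if is_allowed(term) else []) + allowed  # 3. then summon
    return {"suggestions": suggestions, "generate": generate}
```

So a blocked term still produces a full set of suggestions in step 1 even though it never reaches the image API in step 3, which matches the behavior the commenter noticed.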


Orgy did well too


They take too long to generate, but there is no clear indication of that. You should add a spinning cursor or something else that shows the server is working. (A robot painting a canvas would be nice, but you'd need someone who can make nice drawings. An hourglass or a spinning circle is good enough.)


Agreed. That's already one of the things I have on the list for v2, "make image summoning more obvious/loading" and also we'll improve the button location for "Summoning new images" because it's likely that users won't want to scroll to the bottom just to generate new images.


I'd make the working indicator a top priority. It's not so difficult to add, but it makes the UI much better.


You're absolutely right. We'll likely have a fix rolled out for this and a few other things on Monday.


Summoning, I like that~


Seances aren't easy things to get working with ones and zeros.


Or in any other context, for that matter!


Dall-E 2 does something great: it shows prompts and examples of images that those prompts generate. This educates your consumers so they can get more of what they want while they wait. It kind of tickles the desire for mastery.


It's fascinating how much AI struggles to mimic signs and text. Given how much text we enter into computers, my instinct was that this should be really easy for them, but they don't actually receive and process the abstraction of writing like we do, do they?

We use shapes to indicate sounds, and sequences of them to make words, but the computer is ultimately just getting 1 or 0, on or off. It doesn't seem to have the associations we build intuitively from how humans interact with language.


I'm excited to see the next versions of Stable Diffusion. I think we'll probably see this improve significantly over the next year.


Frankly, I am quite hopeful for 10 years out. Still not sure if/when I want them driving, but these versions of SD make me feel like they're entirely dumb yet (almost) feature-rich.

I.e., I never really thought we'd get to the point where you could tell a computer to do something like in the movies without the computer _also_ seeming intelligent at the same time.

The thought of the computer being very capable but still no smarter than my terminal is... interesting.


Vegan gave me this one: https://replicate.com/api/models/stability-ai/stable-diffusi...

I'm actually mildly impressed.


This is awesome. I can see this coming built in to PowerPoint.

Cellist eating a donut is super freakish!

https://replicate.com/api/models/stability-ai/stable-diffusi...


Looks like a donut eating a cellist.


The suggested search results are amazing in such a ridiculous way.

"paper" produced "a man reading a newspaper while riding a walrus"

"a wolf reading a newspaper"

"Trapped inside infinity"

and I've got to say, the walrus readers look passable at a glance when shrunk to low res

https://replicate.com/api/models/stability-ai/stable-diffusi...

https://replicate.com/api/models/stability-ai/stable-diffusi...


I imagine that is a short term problem.


is it a problem!? ;)


It still has trouble understanding sentences; it feels to me like it just generates images based on keywords rather than the meaning of the sentence.

For example, I tried "attractive woman disgusted by an ugly bystander" and the generated images show a disgusted woman with no "ugly bystander".

Similar situation with "man angry at a squirrel seeks revenge" (the generated image shows an angry squirrel with no man in it, when the man was the one supposed to be angry...)


This is the biggest difference between SD and Dall-E (and Imagen) to my mind. SD can produce stunning results but it tends to treat prompts as "word salad" rather than a grammatical instruction.


Not sure how to evaluate that. Maybe it's kinda fun, but… I mean, generating crappy images from text isn't exactly new by now. It may be "an early version" (and this is exactly why I struggle to evaluate that — obviously, we shouldn't be too judgemental of "an early version"), but it surely isn't "a truly functional stock photo platform" yet. I mean, by far. "By a light-year" kind of far.


This is definitely a fair assessment. I think a lot of the "wow" factor is just seeing the generated images in the first place.

In truth I think a lot of value will be added as we start improving filtering. Once users are able to vote on "usable" or "unusable" images, or request variations of an existing photo.

I've genuinely used it for 3-4 photos where I would have previously used Unsplash, and I'm optimistic that I can get that number to steadily trend upwards.

I don't expect this to erase any of the existing stock photo tools on the market, though I do think this will add some new value to the space. Honestly my goal was "will my mom be able to use this?"

Hope that helps clarify the goal a bit more, and I do really appreciate the feedback!



Usability note: please add a clickable "search" button.


Whoa, this is cool, and I would definitely use a more refined version of it. The images with people are a little bit... freaky, but objects and animals look fine.

I wonder if this exists inside of Squarespace or WordPress. I imagine the ability to generate quality license-free stock photos would be a huge selling point for them.


We're going to add voting to help empower users to sort between better/worse summoned images. And an API tool for devs to leverage is planned as well.


The animals definitely do not look fine to me, all the results for "cat" I saw were pretty squarely in the uncanny valley.


It's sort of interesting, given the undeniable power that these new AI techniques have, just how limited the output is at the moment. Only 512x512 images.

I tried a specific query - "man running from a tiger" - and none of the provided images were even close. Seems to be a common problem.


Upscaling with a neural network resizer such as Topaz Gigapixel AI dramatically improves the usability of the art. ON1 also has a resizer that might be a good, less-expensive choice.


Upsizing is something I'm really hoping to bring in the next version of the site.

I'm imagining a variety of different tools to improve the images, so ideally if an image is good enough for somebody, they could "enhance it" and from then on, that image would be available at a higher resolution for all users.


I've had great results upscaling with various versions of the open source RealESRGANx4 which I usually just run via Visions of Chaos (which also supports lots of other upscalers, stable diffusion, disco diffusion, and hundreds of other AI related tools).


I really like this idea! Related results work fairly well. Tons of potential here!

Ideas:

- Allow voting for prompts.
- Allow voting for results. (But try to prevent the rich-get-richer effect... https://medium.com/hacking-and-gonzo/how-hacker-news-ranking...)
- Allow requesting more results for a given prompt.
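One hedged sketch of how that could work: score each image with the time-decayed formula from the linked HN-ranking article, plus a small random exploration bonus so fresh, low-vote images occasionally surface. Every parameter value here is invented:

```python
import random

GRAVITY = 1.8   # decay exponent, per the HN formula in the linked article
EXPLORE = 0.05  # invented: fraction of ranking weight given to chance

def score(votes, age_hours, rng=random):
    # (votes - 1) / (age + 2)^GRAVITY decays old hits; the random term
    # lets new images occasionally rank above long-standing winners.
    base = (votes - 1) / (age_hours + 2) ** GRAVITY
    return (1 - EXPLORE) * base + EXPLORE * rng.random()

images = [
    {"id": "old-hit", "votes": 500, "age": 48},
    {"id": "fresh", "votes": 3, "age": 1},
]
ranked = sorted(images, key=lambda i: score(i["votes"], i["age"]), reverse=True)
```

The exploration term is one cheap way to keep early winners from permanently dominating the grid; a Wilson score or epsilon-greedy bandit would be more principled variants.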

Bug: when there is an error, make it so "back" goes to just before the error, instead of to before I went to the website.


I am so thankful we got out of the stock business when we did.

AI generated photos, videos, music and animations are here, and I believe it's only a matter of time before they replace a large percentage of the stock websites/companies.


That's sort of the reason we started building this. I think there will absolutely always be room for paid, high quality stock photos, but "content" at the speed of thought is here, and I'm excited to see how the space evolves.


The suggested tags when searching "anime girl" are just a bit creepy.


Omg I cannot wait for human faces to become non-freaky with this technology. People pay real money to sites like Getty or Adobe (the former of which is owned by a corp that you may or may not find politically compatible with your beliefs) to fill their landing pages. And for specific categories, for example "happy asian couple", there's only a few models to choose from so it becomes repetitive fast.


If you need a non-freaky human face generated by AI, look no further than: https://thispersondoesnotexist.com/


That's good for some situations where all you need is a headshot, yes. Maybe it can be combined with this stable diffusion stuff?


Dreambooth is what you're looking for.


I can't wait either. We're going to add follow-up solutions to upres, expand, and improve facial features. Additionally, we're aiming to improve search terminology on the back end to start providing more relevant results for exactly those sorts of searches.


The landscape and city photos are stunning.

The ones with people and animals tend to have distorted faces or bodies.

But keep up the good work. There's definitely a market for this.


Thanks a lot! Of course Stable Diffusion is doing most of the heavy lifting here, but we're working hard to make the UI way more user-friendly and provide real lasting utility.

If you have any suggestions, I'm all ears.


How safe is it to use those stock photos? How sure can one be that Stable Diffusion does not recreate any copyrighted work? Is the training data all freely licensed, or is there a mechanism that prevents recreating the training data? Otherwise I could see remaining risks.


In complete transparency, the "jury is out" on this sort of conversation, meaning we don't completely know. Getty Images/Unsplash said they're not allowing AI images because of this concern, but I personally believe that's because they're concerned about the competition/controversy.

Stable Diffusion, like Dall-E 2 and Midjourney, was trained on unlicensed copyrighted work, which is probably where most of the concern is focused.

Even in this HN thread, there are examples of "summoned" images with what looks to be watermarks or signatures from notable sites and artists.

We technically haven't written our license yet, but it'll basically be along the lines of "don't resell these images without significant alterations, don't use these images to launch a similar tool, don't re-distribute or sell the images as is to other stock photo sites."

These images (AI generated in general, not directly GhostlyStock's photos) have already been used in and on magazine covers, artist album art, and many other places.

My personal take is:

- yes, these tools have been trained on copyrighted work
- yes, it does seem rather insensitive to use an artist's name to replicate their style
- yes, I/we do have the rights to license these generated works as we please

The space is rapidly evolving, and I'm also very interested to see how this all plays out over time.

note: I wrote this rather rapidly, so forgive me if I seem a bit blunt about this. I do think it's a conversation that should be taken seriously.


I was intrigued by the text I had as a subtitle in one run:

> Please Contact Us US$1,000 One or more parties reside in a country not supported by Escrow.com. Please contact us to discuss alternate payment options with a Flippa Representative.



Would love the option to not just have square images, but other formats as well (especially 16:9)


Spot on. In my initial sketches for the site I had vertical/horizontal sorting for images, and I think we'll be able to add that pretty quickly.


This needs a lot of curation. Maybe put up some ads and offer a percentage to users willing to rate the quality of the images. I'd be happy to spend some time clicking to earn money for a beer once in a while.


You're spot on with the curation! I'm really excited to add voting and weighting to the images. I think that's where this'll really start to shine. "Girl reading in a coffee shop" generates about 1 usable image per summon, so I'm looking forward to empowering users to manually filter for more usable images.


512x512 is standard for SD generated images, but seems pretty low res when you look at it as a stock photo. Might be good to provide an AI upscaled version of the image for download.


Use the Lexica API and show images from them as well! https://lexica.art/docs


For a bad time, search for “open” there


SD is so human that it, too, has a problem drawing bicycles.


A simple url-argument-based API to get back a photo, similar to source.unsplash.com’s behavior would be terrific.
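For illustration only (GhostlyStock has no such endpoint today, and every path and parameter name here is invented), the request shape could be as simple as `/photo?query=cat+reading&size=512x512`, parsed like so:

```python
from urllib.parse import urlparse, parse_qs

def parse_photo_request(url):
    # Pull query text and dimensions out of a source.unsplash-style URL;
    # the defaults ("random", 512x512) are invented placeholders.
    qs = parse_qs(urlparse(url).query)
    query = qs.get("query", ["random"])[0]
    w, _, h = qs.get("size", ["512x512"])[0].partition("x")
    return {"query": query, "width": int(w), "height": int(h)}

req = parse_photo_request("/photo?query=cat+reading&size=1024x512")
```

The server would then either serve a cached summon matching the query or kick off a new generation, which keeps hotlinking cheap for repeat queries.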


100% Agreed on this one. I was really hoping we'd have that feature ready before launching today. Hopefully we'll have an early version up in the next week or so.


"D-Man programming" didn't turn up anything interesting, though images.google.com got it right :D


For more vague terms, this probably won't do too well for a while, but this definitely isn't going to "replace" anything else completely in the near future. It's really intended to be a new tool to add to the toolbox.


Gotta love some of these suggestions:

"The Ethereum logo glowing above wet pavement, with a bull charging"


If I search for "bicycle repair shop", all I see are images for car repair shops.


[ferrari] got me pictures of ferrets. Then it got the HN hug.


Interesting. Over time we'll be working to improve more specific searches to prevent unusual errors. Right now we're just loosely comparing letters in searches.

And yeah, our servers are working overtime, but what's fun is that as more images are generated, there's less of a need to "bulk generate" because many searches show existing results.
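The ferrari/ferrets mix-up is roughly what pure letter-overlap matching produces. A small sketch with the standard library's difflib shows why the two look close to a character-level matcher (the threshold in the comment is arbitrary):

```python
from difflib import SequenceMatcher

def letter_similarity(a, b):
    # Character-level similarity, ignoring word boundaries entirely.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# "ferr" overlaps, so a letter-only matcher (e.g. cutoff ~0.5)
# considers these two words a match:
sim = letter_similarity("ferrari", "ferrets")
```

A cheap fix is to require a whole-word or prefix match on top of the letter score, so "ferrari" only surfaces prompts that actually contain "ferrari".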


I'd like to normalize the term “commissioning an AI”


penis

shows fountain pen


Doesn't work. I type my prompt and it gives results for something different.


Sorry, it's a little clunky at the moment. Just to clarify: you're actually searching images that have previously been summoned by other users. Our recommendation algorithm right now is "pretty bad" because we're just checking for words that might be related.

But if you scroll down to the bottom of the search results, you can "summon more images from beyond", and then you'll see a direct prompt generation to your search.

Ideally this'll enable people to reuse really good, viable images over and over, just like stock photos, instead of hunting for new images from random generations, which is pretty exciting, I think.

Let me know if this helps!


Terrible.


I'm interested to hear why you think so. If there's something specifically wrong with what we've built that we could possibly improve, I'd appreciate the feedback.


It's a horror show. Creepy and weird stock photos for sure. Useful for Halloween perhaps, but uncanny as hell. I cannot think of much else to say other than that it's terrible.

Maybe some people want terrible, but that doesn't change the fact that an arm is melding into the mouth of this girl's face, and all I asked for was "smiling".


Yeah, that's the unfortunate side effect of generated images: there's no comprehension on the AI's side of what it's generating. People rarely come out correctly right now, but animals tend to do a bit better. Inanimate objects are mostly usable.

This is more of a proof of concept for us, and I'm trying to solve for some of this by adding voting and improved prompts on the back end. I'm hopeful that in a few weeks we'll have a much improved version out with less "horror show" images overall.



