I'm not seeing anything about the dataset; are they still using LAION? There's no mention of LAION in the paper, and the results look quite different from 1.5, so I'm guessing no.
> the model may encounter challenges when synthesizing intricate structures, such as human hands
I think there are two main reasons for poor hands/text:
- Humans care about certain areas of the image more than others, giving high saliency to faces, hands, and body shape, and lower saliency to backgrounds and textures. The way the UNet is trained, it cares about all areas of the image equally, so model capacity per area is uniform. That leads to capacity problems for objects with a large number of valid configurations that humans care more about (see the sketch below for one way the loss could be reweighted).
- The sampling procedure implicitly assumes a uniform amount of variance over the entire image. Text glyphs basically never change, which means we'd effectively want near-infinite CFG in the parts of the image that contain text.
I'm not sure there's much point in working on either of these, though, since both can be fixed by simply making a bigger model.
From hanging out around the LAION Discord server a bunch over the past few months, I've gathered that they're still using LAION-5B in some capacity, but they've done a bunch of filtering on it to remove low-quality samples. I believe Emad tweeted something to this effect at some point, too, but I can't find the tweet right now.
Are any of these txt2img models being partially trained on synthetic datasets? Automatically rendering tens of thousands of images with different textures, backgrounds, camera poses, etc. should be trivial given a handful of human models, or text rendered in different fonts.