How Good Is DALL-E Mini at Origami? (kosmulski.org)
51 points by mkosmul on June 18, 2022 | 30 comments



One related property of GPT-3: It's very bad at traditional computational tasks.

* "Make a list of 20 items" results in a list. The number of items is as accurate as if you asked a toddler the same question.

* If you ask GPT-3 a simple combinatorics question, it will be 100% confident in the wrong answer.

Origami is sort of the same. It takes a conceptual understanding of how paper folds, which DALL-E Mini doesn't have. It has a feel for the general origaminess of a picture.

If I showed a human being a few pieces of origami, including a paper crane, and they had never seen origami before, they'd likely produce similar pictures.


Don't overestimate humans. Most people (adults, not toddlers) can't even draw a bicycle, even if they've used one for most of their lives and so presumably have a conceptual understanding of how it looks and works.

https://www.fastcompany.com/3059089/it-turns-out-its-almost-...


This example gets trotted out a lot, but I don't really understand it. Why do we assume cyclists have a conceptual mechanical understanding of a bicycle and can remember its exact appearance? If they build or repair bicycles, sure, but the majority of people don't do that. They've just learnt how to operate one by instinct.


I think that's the point: the bike example shows people have some concept of what a bike is (pedals, wheels, frame) even without an exact understanding of it, or a precise image in mind.

It's the same for "draw a house". You might draw a square facade, triangular roof, windows, chimney, door. It's equally unrealistic, because people don't have a clue how tall floors should be, have no sense of proportion, etc.

It's just that in the bike case it's more obviously wrong.


I feel like we underestimate how close these systems are to sentience, largely by virtue of overestimating humans.

The largest ML systems of today have roughly the same complexity as human brains, and evolve in much the same way. The brain has around 100 billion neurons, and GPT-3 has 175 billion parameters. Neurons and parameters aren't directly comparable, but there isn't an obvious advantage in either direction: a single neuron carries more state than a single ML parameter, but neurons also operate at around 10 Hz, versus many, many MHz for silicon.

That doesn't mean machine sentience will be anything like human sentience. Brain disorders are helpful to look at here -- there are people who don't experience specific emotions or sensations (e.g. pain, fear). Even a minor tweak can have a major impact, and that's a far smaller difference than, for example, evolving without evolutionary pressure for self-preservation, without pressure for pro-social behavior, or with the ephemeral nature of ML systems.


Actual webpage for the artist's project: https://www.gianlucagimini.it/portfolio-item/velocipedia/


I know this is a little bit banal, but I feel like (1) the author is thinking about "origami", while (2) the model is only able to create "pictures of origami".

The model can only ever be trained on pictures of origami. Thus, it can generate images that get close to "pictures of origami", but (as pictures are necessarily abstracted 2D projections) these might still be way, way off from "origami". Not knowing about actual origami, only ever having seen pictures, I thought most of the generated images were quite good. The experienced origami folder who wrote the article doesn't see it that way.

I hope my thought is phrased clearly enough, I am having trouble finding the right words here.


Semi-related question for those more familiar with current AI capabilities: has there been any attempt to "see" what dinosaurs looked like from their fossils, using existing known animals and their skeletons as a training set?


I don't mean this to sound overly negative, because I absolutely think DALL-E is a killer app amongst recent AI advances. But the thing that made DALL-E astonishing is that it was... good. While DALL-E Mini mimics a lot of the technical advances and you can kind of see what it's getting at with its outputs, they're still mostly garbage. Very clever garbage! But they lack the emotional impact of "woah, this is doing something superhuman".

Obviously the hope is that somehow this and future advances can be democratised. It was funny that Asimov's The Last Question has been posted here a couple of times recently because it makes such a big thing about world-sized computers and how advanced minicomputers would be. It's easy to read and scoff at the naivety... before realising we could easily be heading back in that direction for many impactful future technologies.


What makes DALLE Mini great is that we can all sit down and play with it, with no "oh this thing might destroy humanity" warning. It's a warning that most people who have worked seriously in different areas of AI find annoying for various reasons, but mainly because it feels like a marketing gimmick to draw attention.

I have lots of friends outside the tech field having lots of fun playing with DALLE Mini, even though the results look terrible -- if they sort of resemble the prompt (and many times they do), my friends are ecstatic that the machine made a weird doodle about something ridiculous.


> What makes DALLE Mini great is that we can all sit down and play with it, with no "oh this thing might destroy humanity" warning

It seems like the DALL-E creators are mostly worried about the (possibly justified!) fear that people will use it to make racist or other offensive imagery, and it would bring very bad PR to the team.


Honestly, I thought the images generated were actually pretty good. The shadows of the paper folds, the types of folds typically used. It all felt "close enough" to be very impressive for an AI model.


Checked the model, and the "model card" https://huggingface.co/dalle-mini/dalle-mini#bias is an interesting exercise in sensitivity absurdity:

"Bias

CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes."

Spoiler alert: nothing contained in that section requires a warning. It's just abstract descriptions of "potential" negative stereotypes in images.

"initial testing demonstrates that they may generate images that contain negative stereotypes against minoritized groups"

"Minoritized" is a new word for me, as though minority status is something actively attached to someone. But no duh, I can ask DALL-E to generate "images of klan members at a lynching" or "inner city police brutality" and get negative images.

"When the model generates images with people in them, it tends to output people who we perceive to be white, while people of color are underrepresented."

I'd like to see real testing, because from what I can tell this is not true. Ask for "white people" and you get weird abstract models of white figures. Ask for "black people" and you get beautiful photos of smiling black faces.

Is this the kind of exercise AI researchers have to concern themselves with these days?


The statement was copied wholesale from OpenAI, and the DALL•E mini implementation simply isn't as capable at depicting people, so the biases aren't as apparent. Yet.


Just tried all of the prompts from the OP's post on OpenAI's DALL-E 2 - https://harishgarg.com/writing/generating-origami-images-usi...

DALL-E 2 beats Mini in almost all of them.


Some of the issues seem to stem from the model's poor understanding (or misunderstanding) of the input language... I wonder what a fusion of DALL-E + GPT-3 or LaMDA, where the text-based models perform prompt interpretation, would look like.

This may be a naïve thought as my understanding of all models mentioned is superficial at best.


The text input comprehension is (supposedly) much better in Google's "DALL-E 2", https://imagen.research.google/


Slightly OT, although there might be some sort of connection with origami: does anyone know if DALL-E can produce vector images?


No, it's trained on pixels and it generates pixels. You could make a vector AI if you had millions of labeled vector inputs.


GPT-3 already seems capable of generating SVG. I prompted it with:

  <?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg"
and it completed it to the following:

  <?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" id="cloud-sun_svg__-SimpleVector-0"><defs><linearGradient spreadMethod="pad" x2="1" y2="1"><stop offset="0" stop-color="#fff" stop-opacity=".2"/><stop offset="1" stop-opacity=".2"/></linearGradient></defs><path d="M5.5 13.5A2.5 2.5 0 008 16h8a2.5 2.5 0 002.5-2.5S19.5 10 17.5 10H9c-2.5 0-4.5 2-4.5 4.5zM10 11h.5a.5.5 0 100-1h-2a.5.5 0 100 1H10zm2 3.5h2a.5.5 0 100-1h-2a.5.5 0 100 1zm-4 2H8a.5.5 0 100 1h2a.5.5 0 100-1zm-2-2H8a.5.5 0 100 1h2a.5.5 0 100-1z" fill="#000" fill-opacity=".3"/><circle cx="12" cy="19" r="2" fill="#fff"/><path d="M11.5 19h-2c0-1.5 1-2.5 2.5-2.5S14 15.5 14 17h.5a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path d="M10 13.5h2a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path fill="url(#cloud-sun_svg__-SimpleVector-0)" d="M16 6l-5 5 5 5v-7c1.5 0 3 1 3 3v2z"/><path fill="#fff" d="M12.5 17h2v2h-2z"/></svg>
which looks like this: https://i.imgur.com/sHpv4Ii.png
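
If you want to try reproducing this yourself, here is a minimal sketch using the old (pre-1.0) openai Python client; the model name, sampling parameters, and stop sequence are my assumptions, not necessarily what was used above:

  # Minimal sketch -- model choice, sampling parameters, and stop sequence are assumptions.
  import openai

  openai.api_key = "sk-..."  # your API key

  # The same idea as above: an opening SVG tag that the model is asked to complete.
  svg_header = (
      '<?xml version="1.0" encoding="UTF-8"?>'
      '<svg xmlns="http://www.w3.org/2000/svg"'
  )

  completion = openai.Completion.create(
      model="text-davinci-002",  # assumed; a Codex model should also work
      prompt=svg_header,
      max_tokens=1024,
      temperature=0.7,
      stop=["</svg>"],           # stop once the SVG document would be closed
  )

  # The stop sequence is not included in the output, so close the tag ourselves.
  svg = svg_header + completion.choices[0].text + "</svg>"
  print(svg)  # save as a .svg file and open it in a browser to view
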


Do androids dream of disemboweled cubist ducks?


Guinea pig wearing a hat?

https://i.imgur.com/Q8KWdAO.png


Do some of them have no eye? Are they proper pirates?


Yes, but that's just random SVG (xml); it would be amazing to be able to ask for specific shapes or silhouettes.


Try it! The Codex model can do this.

It's pretty random: it will do a fine smiley face, for example, but most other things it won't do.


This is damn scary, in the sense that people might actually start using this technology (which does not really know what it's doing)...


Is DALL-E publicly available or what? How do I work on generating images?


DALLE-2 (from OpenAI) isn't, but DALLE Mini (from huggingface) is: https://huggingface.co/spaces/dalle-mini/dalle-mini


When can we get our hands on DALLE-2?


You can sign up for beta access in the meantime, but OpenAI hasn't given a timeline for access. I think eventually they'll open up some kind of playground like they did with GPT-3 and sell the service to other businesses.

Currently some prominent AI researchers, artists, and journalists have access to DALLE-2, and some have been taking requests from social media and posting the results, like https://twitter.com/hardmaru/, so you can check their feeds for examples or suggest a prompt the next time they ask for ideas.



