Even the full 7B model's results are relatively low-res (384x384), so it's hard for me to imagine that the generative side of the 1B model would be usable.

Comparisons with other SoTA (Flux, Imagen, etc):

https://imgur.com/a/janus-flux-imagen3-dall-e-3-comparisons-...




I am not sure the results are that comparable, to be honest. For example, DALL-E expands the prompt by default to be much more descriptive. The comparison would need to point out that it is close to impossible for another model to produce the same results as DALL-E from the same short prompt.

I bet there has been a lot of testing of what looks most attractive "by default" to the general public. It is also a selling point when low effort produces something visually amazing.


It's still very impressive that it gets the cube order right!

Also, it looks like the octopuses suffer from the “six-finger hand” syndrome with their arms in every model's output.



