Even the full 7B model's results are relatively low-res (384x384), so it's hard for me to imagine the generative aspect of the 1B model would be usable.
I am not sure the results are that comparable, to be honest. For example, DALL-E expands the prompt by default to be much more descriptive. We would need to somehow point out that it is close to impossible to reproduce the same results as DALL-E, for example.
I bet there has been a lot of testing of what looks more attractive "by default" to most people. It is also a selling point when low effort produces something visually amazing.
Comparisons with other SoTA (Flux, Imagen, etc):
https://imgur.com/a/janus-flux-imagen3-dall-e-3-comparisons-...