
> A Real-Time Text-to-Image Generation Model

> On an A100, SDXL Turbo generates a 512x512 image in 207ms (prompt encoding + a single denoising step + decoding, fp16), where 67ms are accounted for by a single UNet forward evaluation.

Okay... so what part of this is real time? 207ms is 4.8Hz. 67ms is 14.9Hz. Isn't "real time" in graphics considered to be at least 30Hz (33ms)? And by today's standards at minimum 60Hz (about 16.7ms), if not 144Hz (about 7ms)? I'm lost as to what part of this is real time. I'm not sure it would even get there with an H100. Maybe with an H100 and everything running through TensorRT?
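For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion. The helper names `ms_to_hz` and `hz_to_ms` are made up for illustration, and the only inputs are the two latencies quoted above and the usual display refresh rates.

```python
def ms_to_hz(latency_ms: float) -> float:
    """Frames per second achievable if each frame takes latency_ms."""
    return 1000.0 / latency_ms


def hz_to_ms(rate_hz: float) -> float:
    """Per-frame time budget in milliseconds at a given refresh rate."""
    return 1000.0 / rate_hz


# Latencies quoted from the SDXL Turbo announcement (A100, fp16).
latencies_ms = {"full pipeline": 207.0, "single UNet forward pass": 67.0}
for name, ms in latencies_ms.items():
    print(f"{name}: {ms:.0f} ms -> {ms_to_hz(ms):.1f} Hz")

# Frame budgets for common "real time" graphics targets.
for hz in (30, 60, 144):
    print(f"{hz} Hz -> {hz_to_ms(hz):.1f} ms per frame")
```

Even the UNet pass alone would need to be roughly twice as fast to hit the 33ms budget for 30Hz.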




It's "real-time" as in "I finished typing and the image is already there"


I think it is still a bit deceptive to say this, considering that a well-known and widely used alternative definition exists. Plus, that's on commercial-grade hardware.


Considering the use-case of just interacting with a computer and typing prompts, I'd call it real-time.



