While this is cool, it isn't meant to target "game ready" assets. For games and CGI, there's no reason to limit yourself to a single image. Photogrammetry is already extensively used, and it involves taking tens or hundreds of photos of the object being scanned. Many images as input will always beat a single one: with a single image the model has to literally invent the back side, and it has no parallax information to recover depth from.
You appear to be thinking about scanning a physical object, whereas zero-shot single-image-to-3D would be vastly more useful with a single (possibly AI-generated or AI-assisted) illustration. You get a 3D model in seconds at essentially zero cost, and can iterate hundreds of times in a single day.
What if I have a dynamically generated character description in my game's world, generate a portrait for them using Stable Diffusion, and then turn that into a 3D model that can be posed and re-used?