A different project, perhaps (with the speed these are popping up it's not easy to keep track), but I was just playing around in a live multiplayer 3D worldspace [1] where text-prompt-to-instant-360° skybox is a really cool feature to see working as it forms all around you in real time (cool on PC, amazing in VR). It extends the pipeline of whatever Blockade Labs are using under the hood [2].
The metric depth model I used (ZoeDepth) is quite new -- most previous models predicted inverse relative depth, which scales poorly, especially for artistic worlds.
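For anyone curious what "metric" buys you here: a metric model returns depth in meters, so the values can drive geometry directly instead of needing per-scene rescaling. A minimal sketch of inference via torch.hub, following the isl-org/ZoeDepth README (the checkpoint name and input file are assumptions on my part):

    import torch
    from PIL import Image

    # Load the NYU-finetuned checkpoint; "ZoeD_N" is one of the published
    # variants in the isl-org/ZoeDepth repo.
    model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
    model = model.to("cuda" if torch.cuda.is_available() else "cpu").eval()

    image = Image.open("skybox_frame.png").convert("RGB")  # placeholder input

    # infer_pil returns absolute depth in meters (not inverse relative depth),
    # so the output can displace mesh vertices directly.
    depth_m = model.infer_pil(image)  # numpy array, H x W
    print(depth_m.min(), depth_m.max())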
But now there is a much better depth model coming from Intel, called Depth Fusion, which they are adding to the Blockade API and also open sourcing (!)...
Reminds me of this project submitted yesterday [0]. I'm trying hard to keep up with the pace of projects and papers being announced. This is all very exciting!
Pretty cool. Now, I wonder: couldn't you label a certain region of space with a prompt and let the diffuser do its job? Maybe with some mathematical function to blend it into another area.
The idea would be to roughly place the elements in a 3D scene, and adjust the prompt as the camera is moved around the scene.
Here, it's obvious that the "fireplace" prompt causes the model to place a new fireplace as the previous one passes out of view.
Even if you can't precisely label portions of an image, changing the prompt as the camera moves (or changing the weight coefficients for a prompt describing multiple orientations) would avoid that kind of "unnatural" result.
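A hypothetical sketch of that weighting idea: blend per-orientation prompts by how close the camera heading is to each region's anchor direction. Nothing here is from the project's actual API; the "(text:weight)" syntax just follows common Stable Diffusion front-end conventions:

    import math

    # Hypothetical region prompts, keyed by the yaw (radians) at which
    # each one should dominate.
    PROMPTS_BY_YAW = {
        0.0:     "cozy fireplace, stone hearth",  # facing north
        math.pi: "bookshelves, reading nook",     # facing south
    }

    def blended_prompt(camera_yaw: float) -> str:
        """Weight each region's prompt by angular proximity to the heading."""
        parts = []
        for anchor_yaw, text in PROMPTS_BY_YAW.items():
            # Smallest absolute angle between the two headings, in [0, pi].
            delta = abs((camera_yaw - anchor_yaw + math.pi) % (2 * math.pi) - math.pi)
            weight = math.cos(delta / 2)  # 1 when facing it, 0 directly behind
            if weight > 0.05:
                parts.append(f"({text}:{weight:.2f})")
        return ", ".join(parts)

    print(blended_prompt(0.3))
    # -> (cozy fireplace, stone hearth:0.99), (bookshelves, reading nook:0.15)

The fireplace prompt would fade smoothly as the camera turns away, instead of popping a second fireplace into view.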
Regardless, impressive results! I wonder if it would perform better if it were retrained to output a depth channel as well.
It could be useful for (artistically) filling gaps in photogrammetry projects.
I can't wait for painting or drawing styles to be applied to the output!
I do believe this kind of quasi-automatic realistic-3D-scene generator is great.
But maybe, in the end, useless.
Why, you ask?
- The speed of 3D generation is wonderful, and the 3D accuracy is "OK" (and will improve in the near future, I bet). So far, great. But...
- It has one unforgiving flaw: you can NEVER correct the 3D the 'automat' has guessed for you. It will go wrong in a few places, and you can NOT do anything about it. Sadly.
I believe that once this kind of software gives the user a magic trick to MANUALLY correct parts of the 3D mesh, you have a winner and a major-selling piece of software.
That's not true at all; there's no discrete object recognition.
You would still need to create maps for colliders, navigation, etc., as well as break the flat mesh this provides down into discrete objects if you wanted any more physics integration.
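The component-splitting part is at least mechanical when the geometry isn't fused. A sketch using trimesh (a real Python mesh library; the file names are placeholders, and generated scenes are often one welded surface, which is exactly the hard case):

    import trimesh

    # Placeholder path; assume the generator exported one combined mesh.
    scene_mesh = trimesh.load("generated_room.obj", force="mesh")

    # Split into connected components; this only separates objects that
    # aren't fused into a single surface.
    pieces = scene_mesh.split(only_watertight=False)
    for i, piece in enumerate(pieces):
        # A convex hull per piece is a cheap stand-in collider.
        piece.convex_hull.export(f"collider_{i:03d}.obj")

    print(f"{len(pieces)} components extracted")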
I can't wait to be able to generate 3D housing models from 2D floor plans. That'll probably happen sometime this year; it's wild how quickly all of this is progressing.
[1] https://hyperfy.io/ai-sky
[2] https://skybox.blockadelabs.com