Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (lukashoel.github.io)
220 points by amichail on March 22, 2023 | 20 comments



A different project, perhaps (with the speed these are popping up it's not easy to keep track), but I was just playing around in a live multiplayer 3D worldspace [1] where a text prompt produces an instant 360° skybox. It's a really cool feature to see working as it forms all around you in realtime (cool on PC, amazing in VR). It extends the pipeline of whatever Blockade Labs is using under the hood [2].

[1] https://hyperfy.io/ai-sky

[2] skybox.blockadelabs.com


Definitely hard to keep up with the tech, even if you're deep in it.

I presented a 3D gameplay hack of this at the recent Blockade meetup: https://youtu.be/TfRJeedTeOs

The metric depth model I used (ZoeDepth) is quite new -- most previous models were inverse relative depth, with poor scaling properties, especially for artistic worlds.
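A minimal sketch of querying ZoeDepth for metric depth via torch.hub (entry point names per the isl-org/ZoeDepth repo; the input filename is a placeholder and timm is a dependency):

    import torch
    from PIL import Image

    # Load the ZoeD_N model from the isl-org/ZoeDepth hub entry point.
    zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
    zoe = zoe.to("cuda" if torch.cuda.is_available() else "cpu").eval()

    image = Image.open("skybox_face.png").convert("RGB")  # placeholder input
    depth = zoe.infer_pil(image)  # per-pixel depth in meters (numpy array)
    print(depth.min(), depth.max())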

But now there is a much better depth model coming from Intel called Depth Fusion, which they are adding to the Blockade API and also open-sourcing (!)...

Also worth checking out what's possible with SD ControlNet: https://twitter.com/BlockadeLabs/status/1634578058287132674


Reminds me of this project submitted yesterday [0]. I'm trying hard to keep up with the pace of projects and papers being announced. This is all very exciting!

[0] https://zero123.cs.columbia.edu/


Cool. Stereoscopic diffusion images coming soon.


Pretty cool. Now I wonder: couldn't you label a certain region of space with a prompt and let the diffuser do its job? Maybe with some mathematical function to blend into another area.

The idea would be to roughly place the elements in a 3D scene, and adjust the prompt as the camera is moved around the scene.

Here, it's obvious that the "fireplace" prompt causes the model to place a new fireplace as the previous one comes out of view.

Even if you can't precisely label portions of an image, changing the prompt as the camera moves (or changing the weight coefficients for a prompt describing multiple orientations) would avoid that kind of "unnatural" result.
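
A rough, entirely hypothetical sketch of that weighting idea (the prompts, anchor angles, and cosine falloff are all made up for illustration): weight each direction-labeled prompt by how closely the camera faces its anchor, then hand the normalized weights to the diffusion model.

    import math

    # Hypothetical per-direction prompts, anchored at yaw angles in radians.
    prompts = {0.0: "a stone fireplace", math.pi / 2: "a tall window"}

    def prompt_weights(camera_yaw: float) -> dict[str, float]:
        """Weight each prompt by how closely the camera faces its anchor."""
        raw = {}
        for anchor, text in prompts.items():
            # Smallest signed angle between the camera yaw and the anchor.
            delta = math.atan2(math.sin(camera_yaw - anchor),
                               math.cos(camera_yaw - anchor))
            raw[text] = max(0.0, math.cos(delta))  # zero at 90+ degrees off-axis
        total = sum(raw.values()) or 1.0
        return {text: w / total for text, w in raw.items()}

    print(prompt_weights(0.3))  # mostly "fireplace", some "window"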

Regardless, impressive results! I wonder if it would perform better if it was re-trained to output a depth channel as well.

It could be useful for (artistically) filling gaps in photogrammetry projects.

I can't wait for painting or drawing styles to be applied to the output!


The example trajectories (https://github.com/lukasHoel/text2room/tree/main/model/traje...) seem to have different prompts for different angles, so you can definitely give a vague layout of the room.


Time for a 3D run through of some classic text adventure games :D


I do believe this kind of quasi-automatic realistic-3D-scene generator is great. But it may, in the end, be useless.

Why, you ask? The speed of 3D generation is wonderful, and the 3D accuracy is "OK" (and will improve in the near future, I bet). So far so good. But it has an unforgivable flaw: you can NEVER correct the 3D the 'automat' has guessed for you. It will go wrong in a few parts, and you can NOT do anything about it. Sadly.

I believe that once this kind of software really gives the user a magic trick to MANUALLY correct parts of the 3D mesh, you've got a winner and a major selling product.


I tried to build one such piece of software myself. It was almost there: https://youtu.be/ufpajCHLWbg

Example result made by the tool (load it twice, because of a bug): https://free-visit.net/fr/demo01


It doesn't look like it creates any discrete models for the different parts of the room; it's just a flat mesh.

So there's still a lot of work to turn this into an object that can be integrated with anything else.


When is this stuff making it into games? This would be amazing on the Quest.


Shouldn't be too hard to integrate. Just need to load the result into a Unity scene.

If nobody has tried it in a week or two I might give it a go.


That's not true at all; there's no discrete object recognition.

You would still need to create maps for colliders, navigation, etc., as well as break the flat mesh this provides down into discrete objects if you wanted any deeper physics integration.
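
As a crude first pass at that, connected-component splitting gets you separate pieces wherever the geometry isn't actually fused (a sketch with trimesh; "mesh.ply" is a placeholder for whatever text2room exports, and truly merged surfaces would still need real segmentation):

    import trimesh

    # Load the generated room mesh and split it into connected components.
    scene_mesh = trimesh.load("mesh.ply", force="mesh")  # placeholder filename
    parts = scene_mesh.split(only_watertight=False)  # one mesh per component
    print(f"{len(parts)} connected components")

    for i, part in enumerate(parts):
        part.export(f"part_{i:03d}.obj")  # e.g. feed these to per-object colliders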


Sure, if you want physics integration it's a much bigger task. Maybe even insurmountable. And it wasn't on my mind at all.

Unless the resulting mesh is huge, it shouldn't be too hard to just build a VR viewer.
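
For a quick sanity check before writing a proper viewer, something like this works on the desktop (filename assumed; trimesh's preview needs the pyglet extra):

    import trimesh

    mesh = trimesh.load("mesh.ply")  # placeholder for the exported mesh
    mesh.show()  # opens an interactive preview window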


It's cool and all, but all your lighting is going to be pre-baked...


Which it usually is on Quest titles.


I can't wait to be able to generate 3D housing models from 2D floor plans. That'll probably happen sometime this year; wild how quickly all of this is progressing.


Brilliant, been waiting for / working toward this. Is there a way to try it out?


They documented how to get up and running on GitHub. I see an example in there too. https://github.com/lukasHoel/text2room


Based on the title, I was randomly hoping that:

A) there was a text specification for a room and all the items in it (in terms of visuals at least).

B) generating from this would be entirely deterministic.



