Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (lukashoel.github.io)
220 points by amichail on March 22, 2023 | 20 comments



A different project, perhaps (with the speed these are popping up it's not easy to keep track), but I was just playing around in a live multiplayer 3D worldspace [1] where a text prompt produces an instant 360° skybox. It's a really cool feature to see working as it forms all around you in realtime (cool on PC, amazing in VR). It extends the pipeline of whatever Blockade Labs is using under the hood [2].

[1] https://hyperfy.io/ai-sky

[2] skybox.blockadelabs.com


Definitely hard to keep up with the tech, even if you're deep in it.

I presented a 3D gameplay hack of this at the recent Blockade meetup: https://youtu.be/TfRJeedTeOs

The metric depth model I used (ZoeDepth) is quite new -- most previous models were inverse relative depth, with poor scaling properties, especially for artistic worlds.
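A minimal sketch of querying ZoeDepth for metric depth via torch.hub (entry point names per the isl-org/ZoeDepth repo; the input filename is a placeholder and timm is a dependency):

    import torch
    from PIL import Image

    # Load the ZoeD_N model from the isl-org/ZoeDepth hub entry point.
    zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
    zoe = zoe.to("cuda" if torch.cuda.is_available() else "cpu").eval()

    image = Image.open("skybox_face.png").convert("RGB")  # placeholder input
    depth = zoe.infer_pil(image)  # per-pixel depth in meters (numpy array)
    print(depth.min(), depth.max())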

But now there is a much better depth model coming from Intel called Depth Fusion, which they are adding to the Blockade API and also open-sourcing (!)...

Also worth checking out what's possible with SD ControlNet: https://twitter.com/BlockadeLabs/status/1634578058287132674


Reminds me of this project submitted yesterday [0]. I'm trying hard to keep up with the pace of projects and papers being announced. This is all very exciting!

[0] https://zero123.cs.columbia.edu/


Cool. Stereoscopic diffusion images coming soon.


Pretty cool. Now I wonder: couldn't you label a certain region of space with a prompt and let the diffuser do its job? Maybe with some mathematical function to blend into another area.

The idea would be to roughly place the elements in a 3D scene, and adjust the prompt as the camera is moved around the scene.

Here, it's obvious that the "fireplace" prompt causes the model to place a new fireplace as the previous one comes out of view.

Even if you can't precisely label portions of an image, changing the prompt as the camera moves (or changing the weight coefficients for a prompt describing multiple orientations) would avoid that kind of "unnatural" result.
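
A rough, entirely hypothetical sketch of that weighting idea (the prompts, anchor angles, and cosine falloff are all made up for illustration): weight each direction-labeled prompt by how closely the camera faces its anchor, then hand the normalized weights to the diffusion model.

    import math

    # Hypothetical per-direction prompts, anchored at yaw angles in radians.
    prompts = {0.0: "a stone fireplace", math.pi / 2: "a tall window"}

    def prompt_weights(camera_yaw: float) -> dict[str, float]:
        """Weight each prompt by how closely the camera faces its anchor."""
        raw = {}
        for anchor, text in prompts.items():
            # Smallest signed angle between the camera yaw and the anchor.
            delta = math.atan2(math.sin(camera_yaw - anchor),
                               math.cos(camera_yaw - anchor))
            raw[text] = max(0.0, math.cos(delta))  # zero at 90+ degrees off-axis
        total = sum(raw.values()) or 1.0
        return {text: w / total for text, w in raw.items()}

    print(prompt_weights(0.3))  # mostly "fireplace", some "window"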

Regardless, impressive results! I wonder if it would perform better if it was re-trained to output a depth channel as well.

It could be useful for (artistically) filling gaps in photogrammetry projects.

I can't wait for painting or drawing styles to be applied to the output!


The example trajectories (https://github.com/lukasHoel/text2room/tree/main/model/traje...) seem to have different prompts for different angles, so you can definitely give a vague layout of the room.


Time for a 3D run through of some classic text adventure games :D


I do believe this kind of quasi-automatic realistic-3D-scene generator is great. But it may, in the end, be useless.

Why, you ask? The speed of 3D generation is wonderful, and the 3D accuracy is "OK" (and will improve in the near future, I bet). So far so good. But it has an unforgivable flaw: you can NEVER correct the 3D the 'automat' has guessed for you. It will go wrong in a few parts, and you can NOT do anything about it. Sadly.

I believe that once this kind of software really gives the user a magic trick to MANUALLY correct parts of the 3D mesh, you've got a winner and a major selling product.


I tried to build one such piece of software myself. It was almost there: https://youtu.be/ufpajCHLWbg

Example result made by the tool (load it twice, because of a bug): https://free-visit.net/fr/demo01


It doesn't look like it creates any discrete models for the different parts of the room; it's just a flat mesh.

So there's still a lot of work to turn this into an object that can be integrated with anything else.


When is this stuff making it into games? This would be amazing on the Quest.


Shouldn't be too hard to integrate. Just need to load the result into a Unity scene.

If nobody has tried it in a week or two I might give it a go.


That's not true at all; there's no discrete object recognition.

You would still need to create maps for colliders, navigation, etc., as well as break the flat mesh this provides down into discrete objects if you wanted any deeper physics integration.
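
As a crude first pass at that, connected-component splitting gets you separate pieces wherever the geometry isn't actually fused (a sketch with trimesh; "mesh.ply" is a placeholder for whatever text2room exports, and truly merged surfaces would still need real segmentation):

    import trimesh

    # Load the generated room mesh and split it into connected components.
    scene_mesh = trimesh.load("mesh.ply", force="mesh")  # placeholder filename
    parts = scene_mesh.split(only_watertight=False)  # one mesh per component
    print(f"{len(parts)} connected components")

    for i, part in enumerate(parts):
        part.export(f"part_{i:03d}.obj")  # e.g. feed these to per-object colliders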


Sure, if you want physics integration it's a much bigger task. Maybe even insurmountable. And it wasn't on my mind at all.

Unless the resulting mesh is huge, it shouldn't be too hard to just build a VR viewer.
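
For a quick sanity check before writing a proper viewer, something like this works on the desktop (filename assumed; trimesh's preview needs the pyglet extra):

    import trimesh

    mesh = trimesh.load("mesh.ply")  # placeholder for the exported mesh
    mesh.show()  # opens an interactive preview window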


It's cool and all, but all your lighting is going to be pre-baked...


Which it usually is on Quest titles.


I can't wait to be able to generate 3D housing models from 2D floor plans. That'll probably happen sometime this year; wild how quickly all of this is progressing.


Brilliant, been waiting for / working toward this. Is there a way to try it out?


They documented how to get up and running on GitHub. I see an example in there too. https://github.com/lukasHoel/text2room


Based on the title, I was randomly hoping that:

A) there was a text specification for a room and all the items in it (in terms of visuals at least).

B) generating from this would be entirely deterministic.



