GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds (nvlabs.github.io)
350 points by lnyan on April 16, 2021 | 60 comments



Take this idea, and apply it to Google Earth. Have it procedurally generate the rest of the data as I zoom in.

It's incredible that I can check out my small home town on Google Earth now entirely in 3D (which wasn't the case just a few months ago). Yet the trees/cars are still these blocky low-resolution things sticking out of the ground. Imagine Google Earth procedurally generating a high-resolution mesh as I zoomed in. Train it with high-resolution photogrammetry of other similar locations for the ground truth, and let me zoom in endlessly in their VR app.


I like that idea as a separate program (or just in the VR app) but I think it would be confusing/misleading in Google Earth itself. At the very least there needs to be a clear user-facing indication as to which content is procedurally generated (ie fictional) versus photographed (ie real). Obviously there’s a grey area with image processing but I think there’s a real concern with prioritizing nice pictures over actual information.


I believe Microsoft Flight Simulator does this (and it has VR support). It gets a bit tiring to see the same trees on every place on earth, though.


Their building generation is also super nice, but it really takes away from the experience when flying over a known location.


Yeah, flying over the small rural farm I grew up on was disappointing, since Flight Simulator had procedurally generated a whole compound full of random buildings that made it look like some cult compound.


What would be fun is the inverse. Feed in a section of Google Earth and have it generate a Minecraft world out the other end.


It appears that someone's worked that out.

reddit.com/r/Minecraft/comments/x56lg/using_google_earth_terrain_data_for_making_a/


This is fantastic, good find!


It's been a couple of years, but I still like the Outerra project. And also the Bohemia Interactive VBS.




Is this what https://earth2.io is doing?



A crypto scam isn't what I had in mind.


Super cool.

With that said, I bet this would choke on lots of actual Minecraft worlds, because people often build things using blocks where the semantics get thrown completely out the window in favor of aesthetics. Want a big mural on the wall? You're going to be building the wall itself out of differently-colored blocks of wool.

Maybe they'll solve that part one day :)

Edit: That said, it could choke in some really interesting ways...


Question for the experts: Is it possible that GANs will be used for rendering video game environments in the near-ish future? This has been one of my private predictions with respect to upcoming tech, but I'd love to know if people are already thinking about this, or alternatively, why it won't happen.


It depends on how exactly you frame the question.

If you ignore the implementation, it's basically a procedural texturing technique? Those are widely used now.

If you're talking about a real time post effect, it would probably be a bit too slow for a few more years.

If you count SLAM techniques that label camera feeds for AR games, those are very close, but I don't think most run at a full framerate.



Yep, sorry about that.


No problem, I thought I could help with that :)


> If you're talking about a real time post effect, it would probably be a bit too slow for a few more years.

The paper in question can apparently render at 2K and 30fps though. Or at least that’s what the videos claim.


Non-real-time, as in generating levels for your game before release, is doable today. Real-time will probably be doable soon, especially for shared-world games where a server can generate for multiple people at a time rather than a single player.

Even today, real-time should be doable if you design your game with that in mind. You really don't need to generate all the textures, just a compact representation of the level, which is then rendered normally after the fact.
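
A rough sketch of that split (all names here are hypothetical, nothing from the paper): a generator network emits only a compact grid of block/material labels, and the ordinary engine does the per-pixel rendering.

    # Hypothetical sketch: the network produces a compact label grid,
    # and a conventional renderer does the actual drawing.
    import numpy as np

    NUM_MATERIALS = 16            # e.g. air, dirt, grass, stone, water, ...
    GRID = (64, 32, 64)           # one chunk of the level (x, y, z)

    def generate_level(seed: int) -> np.ndarray:
        """Stand-in for a trained generator: latent seed -> label grid."""
        rng = np.random.default_rng(seed)
        return rng.integers(0, NUM_MATERIALS, size=GRID, dtype=np.uint8)

    def render_conventionally(labels: np.ndarray) -> None:
        """The expensive per-pixel work stays in the normal rasterizer;
        the network only had to produce labels.nbytes bytes of level data."""
        print(f"rendering {labels.size} voxels from {labels.nbytes} bytes of level data")

    render_conventionally(generate_level(seed=42))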


Artistically, developers could do some trippy dream sequences with GANs, where the glitchiness and training artifacts add to the immersion. Because one can sample GANs or mix latent dimensions, the experience could be tailored individually based on the character's decisions, for instance.


If the result is also a 3d environment, it could save a lot of time designing scenarios.


I'm not sure if this is what you're asking about, but here's a Two Minute Papers (dear fellow scholars!) video about a deep learning paper for super sampling with some applications for games: https://www.youtube.com/watch?v=OzHenjHBBds


That seems like an obvious use case.

We've had procedurally generated worlds for a long time, but this would take it from roguelike top-down or isometric to immersive fps.


It has already started with DLSS. I am not sure about Nvidia's implementation, but super resolution can involve some adversarial training.

If you mean asset/level generation, then yes. It is the next step in procedural generation, IMO.


DLSS is just image upscaling using a neural network. It's a very different problem from what is shown here or what I believe GP is talking about.


Maybe I'm just too dumb, but I wish these papers would cut the nonsense and explain the key elements in layman's terms with simple examples. I'm super curious how you can do something like this in a fully unsupervised fashion, but the "Hybrid Voxel-conditional Neural Rendering" doesn't mean much to me. Maybe if I knew what "voxel-bounded neural radiance fields" were...


> I wish these papers would cut the nonsense and explain the key elements in layman's terms with simple examples

If papers did that, they'd be a thousand pages long. The target audience is people intimately familiar with the state of the art.


The "voxel-bounded neural radiance field" matters because neural radiance fields (NeRF) were a prior research paper this builds on. At a very high level, though, it's just voxel data to image generation using some form of neural nets. I didn't look at the paper, but I'd hope it summarizes neural radiance fields; if not, it'll at least cite them, and then you could read that work to see how this paper extends it.
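
For reference, this is the volume-rendering integral from the original NeRF paper (Mildenhall et al., 2020), which is what "neural radiance field" refers to; the "voxel-bounded" part presumably just restricts the sample points along each camera ray to the interior of the labeled Minecraft voxels:

    C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt,
    \qquad T(t) = \exp\!\left( -\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds \right)

Here sigma is the density and c the view-dependent color that a small network predicts at each 3D point along the ray r(t), and T(t) is the transmittance accumulated from the near bound t_n.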


Personally prefer blocky Voxel Art to the photoreal scene ;)

NVidia also released their RTXDI SDK for global illumination at the scale of millions of dynamic lights in real time. Combined with GANcraft, anyone could become a world-class environmental artist using only pixel art tools.

https://developer.nvidia.com/rtxdi


NVidia really likes rendering landscapes, huh? https://blogs.nvidia.com/blog/2019/03/18/gaugan-photorealist...


Landscape images are classic exemplars for texture-by-number algorithms because nature's variety means that it's easier to make them look real enough.

See the OG transfer algorithm, "Image Analogies", from over a decade before the GAN boom:

https://mrl.cs.nyu.edu/projects/image-analogies/potomac.html

https://mrl.cs.nyu.edu/projects/image-analogies/arch.html


Yes, there are all sorts of weirdness in their rendering, but that's what you get in a research paper. Put that in the hands of actual game designers and you will have incredible possibilities.


Sounds a lot like Google's GAN for fantastical creatures[0]. Labels-to-photorealism seems to be the core idea behind both.

[0] https://ai.googleblog.com/2020/11/using-gans-to-create-fanta...


Wow, that site and all its autoplay videos crashed my phone. I get that you want to show off the cool tech, but please don't put that many videos on autoplay.


This is what happened in your head when you played Atari games as a kid.


It looks impressive, but what exactly is the machine learning doing on the original to produce the result?

And wouldn't it be possible to simply take the original Minecraft map as a height map and texture map and then regenerate a new world with the original world data and more advanced post-processing? You could interpolate and randomize more detail into the scene than you started with.
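
A toy version of that height-map idea, assuming a simple (x, y, z) array of block IDs; the constants and shapes below are made up for illustration:

    # Toy sketch: derive a height map and surface-material map from voxel data,
    # which a conventional terrain renderer could then upsample and post-process.
    import numpy as np

    AIR = 0
    blocks = np.random.randint(0, 4, size=(128, 64, 128), dtype=np.uint8)  # (x, y, z)

    solid = blocks != AIR
    top = 63 - np.argmax(solid[:, ::-1, :], axis=1)      # highest solid block per column
    height_map = np.where(solid.any(axis=1), top, 0)
    # The block id at the surface becomes the "texture"/material for that cell.
    surface = np.take_along_axis(blocks, height_map[:, None, :], axis=1)[:, 0, :]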


It’s not really adding any meaningful detail per se. Where there’s a grass block, it’s just rendering grass. All it is doing is projecting a stable image of “grass” (taken from a labeled image database) into that voxel.

Not to minimize the awesomeness of that... doing it stably in 3D while moving the camera is the point of this paper, and is amazing.

But it’s not really adding detail beyond “these are the kinds of pixels that grass has and the AI figured out we can put them in this arrangement without making things jumpy”


It doesn't look like it can do structures, either.


It also seems like your brain is doing most of the work here:

https://nvlabs.github.io/GANcraft/images/vox2img.png

The renderer seems to be adding some resolution, smoothing, and mipmapping. Shaders can do the same thing, and in real time.


Modern shaders[1] do a lot, but you'll never mistake them for anything but Minecraft. They don't quite get to where this paper is demonstrating.

[1] https://www.rockpapershotgun.com/best-minecraft-shaders


Anyone doing a GAN with:

before: old Street View, pre bike lanes

after: Street View with new bike lanes

profit: now you can see what any town would look like with complete streets. I call it Complete Street View.

Please do implement. Of course it would be dreamlike; that's a strength, as you wouldn't want the GAN to make design recommendations, just a plausible feel.


There's probably a larger opportunity for an urban redevelopment sketching/brainstorming tool of sorts.


Civil engineers, architects and landscapers are going to have a field day with this.


I like how, if you look closely enough, the outlines of trees and hills are still block-ish.


The image translators work for the construct program, but there is way too much information to decode the Matrix. You get used to it. I don't even see the code, all I see is blonde, brunette, redhead...


Wow! Amazing results, it's like marching cubes on an acid trip!


At the end of the paper, it says that one frame takes 10 seconds to render. I wonder whether one day this method will be able to render in real time (say, 30 fps).


Maybe, but OTOH we have very efficient 3D rendering technology that we understand very well. If I had more compute, I'd want to raytrace everything in real-time, but I wouldn't feel the need to bring neural networks into the mix. A better use case of machine learning is probably to help procedurally generate the data to be rendered. It would be really neat to be able to turn a few photos of a real-world location into high quality 3D meshes with no gaps, for example.


This creates a realistic topography from voxels. If you could do this in realtime then you could have a game where you have the flexibility of minecraft yet the appearance of a more photorealistic game. Imagine playing a game that looked like Control except everything is destructible and constructible. It's an exciting idea.


Almost definitely. There are many ways to optimize further in software, and hardware is only getting better. I wouldn't be surprised if it's doable today with some cheating, a bit more hardware, and a lot of work on optimization.


Would having a higher resolution texture pack make for better results?


It doesn't look like it uses any texture information. I think it only takes in a list of block locations and spits out a scene. I would think you would have to train it with every different combination of textures.
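
To make that concrete, the input might boil down to something like this; the block-to-label mapping below is invented for illustration, and the paper's actual label set may differ:

    # Toy sketch: discard textures entirely and keep only a semantic label per block.
    SEMANTIC_LABEL = {
        "minecraft:grass_block": "grass",
        "minecraft:dirt": "dirt",
        "minecraft:stone": "rock",
        "minecraft:water": "water",
        "minecraft:oak_log": "tree",
        "minecraft:oak_leaves": "tree",
    }

    def to_semantic_voxels(world):
        """world: iterable of ((x, y, z), block_name) pairs -> labeled voxels."""
        return [(pos, SEMANTIC_LABEL.get(name, "unknown")) for pos, name in world]

    print(to_semantic_voxels([((0, 64, 0), "minecraft:grass_block"),
                              ((0, 63, 0), "minecraft:water")]))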


Heh, I wonder if we'll ever get Minecraft 2.0. I get a good chuckle out of how it just barely runs on consoles and ultra-powerful PCs yet looks so "basic".

My son and I absolutely love it and have spent months of this pandemic deep in Minecraft worlds.


That’s mostly a product of Minecraft’s technical choices. Modern computers can render axis-aligned voxel grids on the order of 1,000,000^3 (think Minecraft scale but with sub-millimeter blocks) with PBR/GI in real time. Interactive editing would be another story, I suppose.


The clickbait article makes people believe "you can create 3D models of ANY 2D object", but in reality this only comes down to cars, cats, and human faces. We only have so many datasets that are suitable for a GAN.


The neural net part of this seems somewhat trivial and also misapplied. This is not a realtime renderer, and I would hazard that if you gave someone who knows GLSL the task, they would produce something far and away more compelling than this, that could probably render at <1 FPS.

https://nvlabs.github.io/GANcraft/images/vox2img.png


They would produce something which won't generalize to other types of environment without another huge load of human labor.

Your complaint could be made about just about any new technology. It's usually worse than what came before it at first, but the value is in the potential to eventually become better than what came before it.



