Infinite Photorealistic Worlds Using Procedural Generation (arxiv.org)
306 points by cpeterso on June 18, 2023 | 76 comments



Procedural generation is something the demoscene has essentially specialised in for several decades, but I see no mention of it in the article. The demoscene has also done so using orders of magnitude fewer computing resources than mentioned here, so I think everyone else has a lot to learn from them.

Here's a memorable intro, from 2009: https://news.ycombinator.com/item?id=31636482

Earlier than that, another famous 64k from 2000:

http://www.theproduct.de/index.html


This is an academic paper, not an article. Generously, 0.01% of demos generate landscape scenes, photorealistic or not, so it is clear why the paper would not rely on them for inspiration.


What they are doing has clear overlap with what we've seen in the demoscene. From a technical point of view, they generate all meshes and textures from formulas, using the same kinds of algorithms.

See for example some 64kB demos that have landscapes / nature:

- Paradise by Rgba, 2004 https://www.pouet.net/prod.php?which=12821

- Gaia Machina by Approximate, 2012 (https://www.pouet.net/prod.php?which=59107)

- Turtles all the way down by Brain Control, 2013 (https://www.pouet.net/prod.php?which=61204)

Of course, it's not exactly the same: the demoscene tends to focus on a few carefully crafted scenes and does everything in real time (with up to 30s of precalc), but I think there's a lot of similarity.
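
To make "meshes and textures from formulas" concrete, here's a minimal generic sketch (not from the paper or any particular demo, just the flavour of the technique): a fractal-noise heightfield produced from nothing but a seed and a few parameters.

    # Generic illustration only: a value-noise fBm heightfield "from formulas".
    import math, random

    def value_noise_2d(x, y, seed=0):
        # Hash-based lattice noise with smoothstep interpolation.
        def hash01(ix, iy):
            random.seed((ix * 73856093) ^ (iy * 19349663) ^ seed)
            return random.random()
        x0, y0 = math.floor(x), math.floor(y)
        tx, ty = x - x0, y - y0
        sx, sy = tx * tx * (3 - 2 * tx), ty * ty * (3 - 2 * ty)
        v00, v10 = hash01(x0, y0), hash01(x0 + 1, y0)
        v01, v11 = hash01(x0, y0 + 1), hash01(x0 + 1, y0 + 1)
        top = v00 + sx * (v10 - v00)
        bottom = v01 + sx * (v11 - v01)
        return top + sy * (bottom - top)

    def fbm(x, y, octaves=6, lacunarity=2.0, gain=0.5, seed=0):
        # Sum of noise octaves: each one is finer and weaker than the last.
        amp, freq, total = 1.0, 1.0, 0.0
        for i in range(octaves):
            total += amp * value_noise_2d(x * freq, y * freq, seed + i)
            amp *= gain
            freq *= lacunarity
        return total

    # A 64x64 terrain patch defined purely by the formula above.
    heights = [[fbm(i / 16, j / 16) for j in range(64)] for i in range(64)]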


Procedural generation in computer graphics dates back to the 1970s.

The demo scene was already standing on the shoulders of giants.


Many hours wasted playing Rescue on Fractalus, Koronis Rift, and The Eidolon.


The novelty in this paper is generating photorealistic textures and scenes without using any assets.


Could you define "assets", please? What would you consider the assets in these videos by Inigo Quilez, for example?

https://www.youtube.com/watch?v=BFld4EBO2RE

https://www.youtube.com/watch?v=8--5LwHRhjk

EDIT: To be clear, I was not being flippant. I was assuming there is some nuance to how "assets" is used here that I'm not aware of, since I'm not a 3D artist, which would make this work novel compared to the work in the demoscene, beyond being a general-purpose asset generator, which is indeed very impressive but not what is being argued here.


The emphasis is on photorealistic.

What is being argued here? I don’t see where this turned into a competition.

It’s a paper presenting a system of new algorithms that can generate geometry, textures and lighting with extreme photorealism. It seems to do that very well. It’s not a study of the evolution of procedural 3D graphics, and it doesn’t pretend to be an entirely new technique or the first to do it.


It's not a competition, except it sort of is: claiming to be the first as well as acknowledging previous work is serious business in academia, and in an indirect way it's also an indicator of how much people did their homework or how willing they are to play nice with others.

So if they don't view the demoscene as relevant previous work to their paper then I want to understand why they think their work falls under a different kind of procgen.


It’s not photorealistic though. It’s not even close to a modern standard of realism. Compare it to an indie game: https://m.youtube.com/watch?v=IK76q13Aqt0&pp=ygUIdW5yZWNvcmQ...


You’re comparing it to what is probably the most realistic game since the PT Silent Hill demo came out, and heavily based on photogrammetry. The paper also focuses on natural textures, not built up environments, so it’s a pointless comparison in multiple ways.

Again, not a contest. The paper is exploring new techniques for generating content. Why all the negativity?


The negativity is because the claim is strong ("Infinite *Photorealistic* Worlds Using Procedural Generation") while the approach is probably a dead end and does not improve performance in any of the relevant directions.

No new ground is broken here in rendering techniques, nor in procedural generation, nor in actual content. AI is generating genuinely photorealistic content today, and AI-adjacent techniques such as NeRFs, as well as traditional but sophisticated rendering like UE5, are the leading edge of rendering. If you're going to make a strong claim, you should deliver something.


Again, isn’t UE5’s foliage generator 1) using masks instead of geometry by default and 2) using Quixel Megascans imagery? Doesn’t seem like a useful comparison.

I honestly don’t know what you’re reacting to; what strong claims? “Infinite” is a given for procedural assets, and is highlighted in comparison to the existing partially asset-based generators (they go over existing software on page three), and the output certainly qualifies as photorealistic. There are no claims in the paper about being the first to do anything.


Inigo has a bunch of photorealistic scenes here too:

https://iquilezles.org/demoscene/


The paper does not say that, AFAICS. It says: "Ours is entirely procedural, relying on no external assets"


I'm not sure how you missed it, it's right there in the summary: "Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source..."


This is not a critique of what they achieved, but that is not an innovation over existing procgen techniques. Generating assets from maths is basically what the demoscene has been all about for decades.

The sheer scale of what this does, how general it seems to be (instead of a single special-purpose animation like in the demoscene), as well as the fact that the output is structured and labeled assets I would consider novel, and very impressive.


"without using any assets" vs "relying on no external assets" but it's not my area so I may be misunderstanding.


Ok, to clear it up:

"Ours is entirely procedural" == ""Infinigen is entirely procedural"

"relying on no external assets" == "every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source"

If that doesn't make it clear, could you elaborate on the part that doesn't click and I'll try to explain further.


Don't worry, I'm out of my depth and I know it, I shouldn't have put my oar in :)


I ran this for most of today in the background and I have some thoughts:

The quality is good and it's giving you all of the maps (as well as the .blends!). It seems great for its stated goal of generating ground truth for training.

However, it's very slow / CPU-bound (go get lunch), so in its current state it probably doesn't make sense for applications with users sitting behind the computer.

Additionally, the .blend files are so unoptimized that you can't even edit them on a laptop with texturing on. The larger generations will OOM a single run on a reasonably beefy server. To be fair, these warnings are in the documentation.

With some optimization (of the output) you could probably do some cool things with the resulting assets, but I would agree with the authors the best use case is where you need a full image set (diffuse, depth, segmentation) for training, where you can run this for a week on a cluster.

To hype this up as No Man's Sky is a stretch (NMS is a marvel in its own right, but has a completely different set of tradeoffs).

EDIT: Although there are configuration files you can use to create your own "biomes", there is no easy way to control this with an LLM. You might be able to hack GPT-4 functions to get the right format for it to be accepted, but I wouldn't expect great results from that technique.


> ... on a reasonably beefy server.

Out of curiosity, what kind of cpu / ram are you meaning here?

Asking because I have some spare hardware just sitting around, so am thinking... :)


A typical server would be 8x A100 with dual CPUs (128 cores total) and 2TB of RAM. I doubt you have something like this sitting around.


That's only a "typical server" for some companies with specialised needs.

Lots of places have servers with 128-256GB of ram around though.


That doesn't track in my experience, but that depends heavily on how you define typical. Just going off total servers installed, what I see getting racked are typically 1+TB RAM, and anything less would be seen as low density (i.e. cost inefficient). We've got a whole batch in the 512GB range that are coming up on EOL. Dual socket is definitely less common, but not rare either.


In my corner of academia, 128gb is by far the most common RAM per node on something billed as a compute cluster (on random desktops and laptops it’s of course much lower than 128gb). I have seen a few 1tb+ nodes but they are rare.


I know nothing about server hardware but I'm curious how that works.

I have a decent PC (AMD 3990X 64-Core Processor with 256 GB of RAM), I'd have installed better/more components but that seemed to be the best you could do on the consumer market a few years ago when I was building it.

Are they using the same RAM I'm using with a different motherboard that just supports more of it? Or are they using different components entirely?

Apologies for what I'm sure is a very basic question but it would be interesting to learn about.


It's the same RAM chips (though error-tolerance features are prioritized over pure speed). You would just need a server motherboard with that many DIMM slots, a server chassis to support that motherboard, and a rack to support the cooling needs of that chassis.

Here's what a lowly 256GB server looks like. For a TB just imagine even more sticks:

https://i.ebayimg.com/images/g/dnIAAOSwcy1kFNqq/s-l1200.jpg


Typical Intel offering is 1.5-2TB per socket. Sockets scale up to 8 (though the price increase from 2 to 4 is very steep). The memory itself is registered ECC DIMMs (which are even lower cost than consumer DIMMs/unbuffered ECC), but to get to 1.5TB density you need load-reduced (LRDIMM) modules, which give 2x the capacity but at a higher price.


Interesting. The servers at places I work with are in the 128-256GB ram range.

The only real exception to that would be for database or reporting servers, which sometimes might have higher ram requirement (eg 384GB).

That's pretty much it though.


But there aren't many people who can use these servers for amusement.


Proof of concepts don’t have to be optimized. That’s an exercise for the reader. ;-)



The demo video is very impressive. I thought it was yet another GAN, but no, it's something "mathy". I'll need to read the paper for sure!


> Zero ai

I gave a standing ovation when that popped up in the video.


"Infinigen is free and open source".

What a gift. Gotta love it.


Now we just need to make a No Man’s Sky-style—but open source—universe with this.


I think no man's sky proved conclusively that we don't need more no man's sky. One was already too much no man's sky.


Well there's a second one coming soon with Starfield


What do you mean? The release didn’t go too well (mainly due to unrealistic expectations) but the game is still updated and has a healthy community.


As in the universe was too big or as in the game was no good?

I think one could make a case for the former, but the latter is subjective (I liked the game, although it is a bit shallow).


How about a planet that runs on a server and is part of a federated universe? Each server has a solar system to play with.

Then add spaceships so you can actually travel between them.

Godot runs in the browser these days, so that could work.


No Man's License


It’s random, sorry. But I looked at it briefly and saw jargon that I had difficulty picking up. What baseline is needed to operate that stuff with meaningful control? (I’m confident any dev can run it and have fun.)



None of this looks "photorealistic". The creatures look hilarious and the paper is not well written either.

"Each part generator is either a transpiled node-graph, or a non-uniform rational basis spline (NURBS). NURBS parameter-space is high-dimensional, so we randomize NURBS parameters under a factorization inspired by lofting, composed of deviations from a center curve. To tune the random distribution, we modelled 30 example heads and bodies, and ensured that our distribution supports them."

This strikes me as a fairly random approach. No wonder those creatures look the way they do. I fail to see why this is worth a scientific paper, as it appears to be no more than a student project with a number of contributors across different fields. Building a (somewhat) procedurally based asset library has been done countless times before by game dev studios big and small.


Spore, No Man’s Sky, and Elite Dangerous all do galaxy generation to good effect. Elite is probably the most realistic. No Man’s Sky has creatures like Spore.


Space Engine[1] is another popular one

[1] https://spaceengine.org/


Complete opposite reaction for me here. Interesting to see how two people can see the same thing and think polar opposites.


This is great. Terrain generators have been around for decades, but this is a nice open-source one.

" The wall time to produce a pair of 1080p images is 3.5 hours."

Ouch. It's in Python, but still...


The Python part just generates Blender procgen specifications (compute graphs); that is probably relatively fast. But apparently the output graphs are huge and unoptimized, so generating the geometry takes a lot of resources (CPU-bound). Rendering might also take a while if they don't have good ways to limit detail (something like Nanite would come in very handy here).
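
For a sense of that split, here is a hypothetical minimal example of driving Blender geometry from Python (Infinigen's actual pipeline emits node graphs rather than raw vertices): the Python that requests the geometry is cheap, while the geometry Blender has to build and render from it is what costs time and memory.

    # Hypothetical example; run inside Blender's Python environment.
    import math
    import bpy

    N = 200  # grid resolution; the vertex count (N*N) is what gets expensive

    verts = [(x * 0.1, y * 0.1, 0.3 * math.sin(x * 0.3) * math.cos(y * 0.3))
             for x in range(N) for y in range(N)]
    faces = [(x * N + y, x * N + y + 1, (x + 1) * N + y + 1, (x + 1) * N + y)
             for x in range(N - 1) for y in range(N - 1)]

    mesh = bpy.data.meshes.new("proc_terrain")
    mesh.from_pydata(verts, [], faces)   # heavy lifting happens in Blender's C code
    mesh.update()

    obj = bpy.data.objects.new("proc_terrain", mesh)
    bpy.context.collection.objects.link(obj)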


With Nanite, if you have a sidewalk with occasional cracks, and all the cracks are the same, those get combined into a single submesh. This is recursive; if you have multiple instances of that sidewalk, those get combined at a higher level. It's a form of compression for mesh data with redundancy.

This only works if the data contains such redundancy. That depends on how it's generated. Not clear if this type of generator creates such redundancy. Something using random processes for content generation won't do that.

Nanite is optimized for large areas of content generated by a team of people working together, as in a major game project. There's a lot of asset reuse. Nanite exploits that. It's not a good fit for randomly generated assets, or for assets generated by a large community, as in a serious metaverse. The redundancy isn't there to be exploited.


You seem to know more about it than I do. But isn't Nanite also about automatic and continuous LOD adjustment? A big selling point is being able to drop in extremely detailed, unoptimized meshes made by artists and have them perform well by dynamically reducing the polygon count based on the screen real estate they take up. I guess you could call this compression too, but it is lossy compression (automatic mesh downscaling) and doesn't depend on redundancy that much.

Many of these artist-made high-fidelity meshes make heavy use of ZBrush-style sculpting and procedural materials, which have similar characteristics to these randomly generated assets.
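
For reference, the classic screen-coverage LOD heuristic this generalizes looks roughly like the sketch below. It is not Nanite's actual per-cluster algorithm, just the basic idea of dropping detail as projected size shrinks.

    # Classic screen-coverage LOD heuristic (illustration only).
    import math

    def projected_size_px(object_radius_m, distance_m, fov_y_rad, screen_h_px):
        # Approximate on-screen diameter of a bounding sphere, in pixels.
        if distance_m <= object_radius_m:
            return float(screen_h_px)
        angular = 2.0 * math.atan(object_radius_m / distance_m)
        return angular / fov_y_rad * screen_h_px

    def choose_lod(size_px, lod_thresholds_px=(400, 150, 50)):
        # LOD 0 = full detail; each coarser level cuts the triangle budget.
        for lod, threshold in enumerate(lod_thresholds_px):
            if size_px >= threshold:
                return lod
        return len(lod_thresholds_px)

    # Example: a 2 m-radius rock seen from 80 m with a 60-degree vertical FOV.
    px = projected_size_px(2.0, 80.0, math.radians(60), 1080)
    print(round(px), choose_lod(px))   # about 52 px on screen -> LOD 2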


You are correct, and the parent comment is confused. Nanite does support faster instancing but its unique advantage is being a fast software renderer for assets with very large polygon counts.


Yeah, it's not clear if the wall time needed is due to Python being slow or not, e.g. could it be redone in a compiled language for a better result?

I'd kind of expect that most of the core stuff is being done in C++ or similar behind the scenes though. Maybe (haven't looked). :)


Considering that part of the code is in CUDA and part is in C/C++, there's already been some optimization.


"Zero AI" will likely become a selling point.


I worked on a thing many years ago that involved machine learning, which usually produced reasonable results but all users hated it nonetheless, because machine learning made it completely opaque. The correct predictions it made were mostly acceptable, but the incorrect predictions it made were hilariously bad, and in both cases nobody could explain why it generated those outputs.

Eventually we concluded that machine learning wasn't a good fit for our problem, and our users were very keen to maintain that conclusion.


I'm thinking of a very complex logistics system I wrote that had to trace millions of possible paths to optimize a route. Even when the range of choices is too extensive to present to the user directly, and you need to resort to a list of best choices, it's indispensable to somehow show how the logic was arrived at and present ways of disabling portions of what went into the deductive process. That's something machine learning simply isn't geared towards, because the reasoning doesn't rest on reproducible sets of hierarchical rules.
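
As a toy illustration of that point (not the actual system), a route search can carry its own explanation trail and accept user vetoes, which is exactly what a learned black box struggles to offer.

    # Toy shortest-path search with an explanation trail and user-disabled edges.
    import heapq

    def best_route(graph, start, goal, disabled_edges=frozenset()):
        # graph: {node: {neighbor: cost}}; returns (path, cost, explanation)
        frontier = [(0.0, start, [start])]
        best_cost, explanation = {start: 0.0}, []
        while frontier:
            cost, node, path = heapq.heappop(frontier)
            if node == goal:
                return path, cost, explanation
            for nbr, edge_cost in graph[node].items():
                if (node, nbr) in disabled_edges:
                    explanation.append(f"skipped {node}->{nbr}: disabled by user")
                    continue
                new_cost = cost + edge_cost
                if new_cost < best_cost.get(nbr, float("inf")):
                    best_cost[nbr] = new_cost
                    explanation.append(f"via {node}->{nbr}: total {new_cost}")
                    heapq.heappush(frontier, (new_cost, nbr, path + [nbr]))
        return None, float("inf"), explanation

    graph = {"A": {"B": 1, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}, "D": {}}
    print(best_route(graph, "A", "D"))
    print(best_route(graph, "A", "D", disabled_edges={("B", "C")}))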


Interesting - they've just changed the license from GPL to BSD: https://github.com/princeton-vl/infinigen/commit/258cf38e860...

This makes the repo much more useful to me.



Something feels odd about the fire animation. The flowing water also feels a bit odd, but I'm not able to say how.


The banks of the creek didn’t darken when water splashed on them.


It's an offline fluid simulation; it's a pretty identifiable look once you've seen a few.


This is really impressive. Even the trees look pretty good, which appears to be tricky to do with procgen stuff.

A nice next step or addition would be to take the results and remesh them into lower-poly models so they can be used in a game engine to walk around in.


Love the "Matrix" zoom-in on the green terminal text.


I can't help but find this so much more interesting and fun to use than anything AI-based


Reminds me of the Mac Aquarium SereneScreen screensaver.

That was around for decades.

https://m.youtube.com/watch?v=Ws1zpk9GWkk

Much, MUCH more limited, but pretty cool.


I'd love this generating images for a version of Don Woods' Adventure. Or for NetHack/SlashEM.


Think of real-world-scale sandbox gaming with Pokémon elements... it must be a crazy game.


Procedural generation, not transformers—truly infinite!

Of course AI input would work well with this...


The name’s reminiscent of Terragen; I wonder if the author is involved in any way?


Absolutely amazing. The holy grail of shaders.


The infinite virtual reality escape comes nearer


Can't wait for my vision pro


This is very nice. Thinking of playing with it



