Procedural generation is something the demoscene has essentially specialised in for several decades, but I see no mention of it in the article. The demoscene has also done so using several orders of magnitude less computing resources than mentioned here, so I think everyone else has a lot to learn from them.
This is an academic paper, not an article. Generously, 0.01% of demos generate landscape scenes, photorealistic or not, so it is clear why the paper would not rely on them for inspiration.
What they are doing has clear overlap with what we've seen in the demoscene. From a technical point of view, they generate all meshes and textures from formulas, using the same kinds of algorithms.
See for example some 64kB demos that have landscapes / nature:
Of course, it's not exactly the same: the demoscene tends to focus on a few carefully crafted scenes and do everything in real time (with up to 30s of precalc), but I think there's a lot of similarity.
EDIT: To be clear, I was not being flippant. I was assuming there is some nuance to how "assets" is used here, which I'm not aware of since I'm not a 3D artist, that would make this work novel compared to the work in the demoscene, beyond being a general-purpose asset generator, which is indeed very impressive but not what is being argued here.
What is being argued here? I don’t see where this turned into a competition.
It's a paper presenting a system of new algorithms that can generate geometry, textures and lighting with extreme photorealism. It seems to do that very well. It's not a study of the evolution of procedural 3D graphics, and it doesn't pretend to be an entirely new technique or the first to do it.
It's not a competition, except it sort of is: claiming to be the first, as well as acknowledging previous work, is serious business in academia, and it's also an indirect indicator of how much people did their homework or how willing they are to play nice with others.
So if they don't view the demoscene as relevant previous work to their paper then I want to understand why they think their work falls under a different kind of procgen.
You're comparing it to what is probably the most realistic game since the P.T. Silent Hill demo came out, and one heavily based on photogrammetry. The paper also focuses on natural textures, not built-up environments, so it's a pointless comparison in multiple ways.
Again, not a contest. The paper is exploring new techniques for generating content. Why all the negativity?
The negativity is because the claim is strong "Infinite *Photorealistic* Worlds Using Procedural Generation" while the approach is probably a dead end and does not improve performance in any of the relevant directions.
No new ground is broken here in rendering techniques, nor in procedural generation, nor in actual content. AI is generating actually photorealistic content today, and AI-adjacent techniques such as NeRFs, along with traditional but sophisticated rendering like UE5, are the leading edge of rendering. If you're going to make a strong claim, you should deliver something.
Again, isn't UE5's foliage generator 1) using masks instead of geometry by default and 2) using Quixel Megascans imagery? Doesn't seem like a useful comparison.
I honestly don’t know what you’re reacting to, what strong claims? “Infinite” is a given for procedural assets, and is highlighted in comparison to the existing partially asset-based generators (they go over existing software on page three), and the output certainly qualifies as photorealistic. There are no claims in the paper about being the first to do anything.
I'm not sure how you missed it, it's right there in the summary: "Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source..."
This is not a critique of what they achieved, but that is not an innovation over existing procgen techniques. Generating assets from maths is basically what the demoscene has been all about for decades.
The sheer scale of what this does, how general it seems to be (instead of a single special-purpose animation like in the demoscene), as well as the fact that the output is structured and labeled assets I would consider novel, and very impressive.
"Ours is entirely procedural" == ""Infinigen is entirely procedural"
"relying on no external assets" == "every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source"
If that doesn't make it clear, could you elaborate on the part that doesn't click and I'll try to explain further.
I ran this for most of today in the background and I have some thoughts:
The quality is good and it's giving you all of the maps (as well as the .blends!). It seems great for its stated goal of generating ground truth for training.
However, it's very slow/CPU-bound (go get lunch), so it probably doesn't make sense for applications with a user behind the computer in its current state.
Additionally, the .blend files are so unoptimized that you can't even edit them on a laptop with texturing on. The larger generations will OOM a single run on a reasonably beefy server. To be fair, these warnings are in the documentation.
With some optimization (of the output) you could probably do some cool things with the resulting assets, but I would agree with the authors the best use case is where you need a full image set (diffuse, depth, segmentation) for training, where you can run this for a week on a cluster.
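If you do want to open the .blends interactively, a crude workaround is to decimate the heaviest meshes first. Minimal sketch of what I mean (the polygon budget and ratio below are arbitrary, not anything recommended by the authors):

    # Run headless: blender --background scene.blend --python decimate_heavy_meshes.py
    # Adds a Decimate modifier to any mesh above an arbitrary polygon budget and
    # saves a lighter copy next to the original. Thresholds are made up; tune them.
    import bpy

    POLY_BUDGET = 200_000   # "too heavy to edit on a laptop" cutoff, purely arbitrary
    TARGET_RATIO = 0.1      # keep roughly 10% of the faces on heavy meshes

    for obj in bpy.data.objects:
        if obj.type == 'MESH' and len(obj.data.polygons) > POLY_BUDGET:
            mod = obj.modifiers.new(name="quick_decimate", type='DECIMATE')
            mod.ratio = TARGET_RATIO

    bpy.ops.wm.save_as_mainfile(
        filepath=bpy.data.filepath.replace(".blend", "_light.blend"))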
To hype this up as No Man's Sky is a stretch (NMS is a marvel in its own right, but has a completely different set of tradeoffs).
EDIT: Although there are configuration files you can use to create your own "biomes", there is no easy way to control this with an LLM. You might be able to hack GPT-4 functions into producing the right format for it to accept, but I wouldn't expect great results from that technique.
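If anyone does try the GPT-4-functions route, I would at least validate whatever the model emits against a fixed schema before writing it into a config. Rough sketch of that step; the field names here are invented for illustration, not Infinigen's actual config keys:

    # Validate LLM-emitted "biome" parameters before turning them into a config file.
    # The fields below are hypothetical placeholders, NOT Infinigen's real config keys.
    from dataclasses import dataclass, fields

    @dataclass
    class BiomeParams:
        terrain_roughness: float   # expected in [0, 1]
        tree_density: float        # expected in [0, 1]
        water_coverage: float      # expected in [0, 1]
        season: str                # e.g. "summer"

    def parse_llm_output(raw: dict) -> BiomeParams:
        allowed = {f.name for f in fields(BiomeParams)}
        unknown = set(raw) - allowed
        if unknown:
            raise ValueError(f"LLM invented keys: {unknown}")
        params = BiomeParams(**raw)
        for name in ("terrain_roughness", "tree_density", "water_coverage"):
            value = getattr(params, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is out of range")
        return params

    print(parse_llm_output({"terrain_roughness": 0.7, "tree_density": 0.4,
                            "water_coverage": 0.1, "season": "summer"}))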
That doesn't track in my experience, but that depends heavily on how you define typical. Just going off total servers installed, what I see getting racked are typically 1+TB RAM, and anything less would be seen as low density (i.e. cost inefficient). We've got a whole batch in the 512GB range that are coming up on EOL. Dual socket is definitely less common, but not rare either.
In my corner of academia, 128 GB is by far the most common RAM per node on something billed as a compute cluster (on random desktops and laptops it's of course much lower than 128 GB). I have seen a few 1 TB+ nodes, but they are rare.
I know nothing about server hardware but I'm curious how that works.
I have a decent PC (AMD 3990X 64-Core Processor with 256 GB of RAM), I'd have installed better/more components but that seemed to be the best you could do on the consumer market a few years ago when I was building it.
Are they using the same RAM I'm using with a different motherboard that just supports more of it? Or are they using different components entirely?
Apologies for what I'm sure is a very basic question but it would be interesting to learn about.
It's the same RAM chips (though error tolerance features are prioritized over pure speed). You would just need a server motherboard to support that many sockets, and a server chassis to support that motherboard, and a rack to support the cooling needs of that chassis.
Here's what a lowly 256GB server looks like. For a TB just imagine even more sticks:
Typical Intel offering is 1.5-2 TB per socket. Socket count scales up to 8 (though the price increase from 2 to 4 is very steep). Memory itself is registered ECC DIMMs (which are even lower cost than consumer DIMMs/unbuffered ECC), but to get to 1.5 TB density you need load-reduced (LRDIMM) modules, which give 2x capacity at a higher price.
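The capacity figures are just slot count times module size. For example (typical numbers, not any particular board's spec):

    # Back-of-the-envelope for the "1.5 TB per socket" figure; example numbers only.
    dimm_slots_per_socket = 12     # common on recent dual-socket Xeon boards
    module_gb = 128                # 128 GB LRDIMM sticks
    sockets = 2

    per_socket_tb = dimm_slots_per_socket * module_gb / 1024
    total_tb = sockets * per_socket_tb
    print(f"{per_socket_tb:.1f} TB per socket, {total_tb:.1f} TB total")  # 1.5 TB, 3.0 TB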
It's a random question, sorry. I looked at it briefly and saw jargon I had difficulty picking up. What baseline is needed to operate that stuff with meaningful control? (I'm confident any dev can run it and have fun.)
None of this looks "photorealistic". The creatures look hilarious and the paper is not well written either.
"Each part generator is either a transpiled node-graph,
or a non-uniform rational basis spline (NURBS). NURBS
parameter-space is high-dimensional, so we randomize
NURBS parameters under a factorization inspired by lofting,
composed of deviations from a center curve. To tune the
random distribution, we modelled 30 example heads and
bodies, and ensured that our distribution supports them."
This strikes me as a fairly random approach. No wonder those creatures look the way they do. I fail to see why this is worth a scientific paper, as it appears to be no more than a student project with a number of contributors across different fields. Building a (somewhat) procedurally based asset library has been done countless times before by game dev studios big and small.
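For what it's worth, the "factorization inspired by lofting" is less arbitrary than it sounds: instead of randomizing raw NURBS control points, you randomize a center curve plus per-cross-section deviations. A rough numpy sketch of that parameterization (my reading of the quoted paragraph, not their code, and it skips the NURBS basis entirely):

    # Lofting-style parameterization: a random spine plus per-section radius
    # deviations, rather than randomizing raw NURBS control points directly.
    # Illustration of the idea only; this is not the paper's implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    n_sections, n_points = 24, 16                 # rings along the body, points per ring
    t = np.linspace(0.0, 1.0, n_sections)

    # Center curve: a gently bending spine with a little low-frequency noise.
    spine = np.stack([
        t * 4.0,
        0.5 * np.sin(t * np.pi) + 0.05 * rng.standard_normal(n_sections).cumsum(),
        np.zeros(n_sections),
    ], axis=1)

    # Per-section radii: a smooth base profile times small random deviations.
    base_radius = 0.3 + 0.5 * np.sin(t * np.pi)   # thick middle, tapered ends
    radius = base_radius * (1.0 + 0.15 * rng.standard_normal(n_sections))

    # Loft: sweep a circle of varying radius along the spine (twist/normals ignored).
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    rings = spine[:, None, :] + radius[:, None, None] * np.stack(
        [np.zeros_like(theta), np.cos(theta), np.sin(theta)], axis=-1)

    print(rings.shape)   # (24, 16, 3): control points for a NURBS surface / skin modifier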
Spore, No Man’s Sky, Elite Dangerous all do galaxy generation to good effect. Elite is probably the most realistic. No Man’s Sky has creatures like Spore.
The Python part just generates Blender procgen specifications (compute graphs), which is probably relatively fast. But apparently the output graphs are huge and unoptimized, so generating the geometry takes a lot of resources (CPU-bound). Rendering might also take a while if they don't have good ways to limit detail (something like Nanite would come in very handy here).
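To make the spec-vs-evaluation split concrete (schematic only, nothing like Infinigen's actual graph format): emitting the graph is a few dictionary entries, but each subdivision level in the evaluation roughly quadruples the face count, which is where the CPU time goes.

    # A "node graph" as data is cheap to emit; evaluating it is not.
    # Each Catmull-Clark-style subdivision level roughly quadruples the face count.
    graph = [
        ("base_mesh", {"faces": 5_000}),
        ("subdivide", {"levels": 4}),
        ("displace",  {"noise_scale": 0.2}),
    ]

    faces = 0
    for node, params in graph:          # the "evaluation" side, the expensive part in reality
        if node == "base_mesh":
            faces = params["faces"]
        elif node == "subdivide":
            faces *= 4 ** params["levels"]
        # displace etc. keep the count but still touch every vertex

    print(f"{faces:,} faces")           # 1,280,000 faces from a three-node spec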
With Nanite, if you have a sidewalk with occasional cracks, and all the cracks are the same, those get combined into a single submesh. This is recursive; if you have multiple instances of that sidewalk, those get combined at a higher level. It's a form of compression for mesh data with redundancy.
This only works if the data contains such redundancy. That depends on how it's generated. Not clear if this type of generator creates such redundancy. Something using random processes for content generation won't do that.
Nanite is optimized for large areas of content generated by a team of people working together, as in a major game project. There's a lot of asset reuse.
Nanite exploits that. It's not a good fit for randomly generated assets, or for assets generated by a large community, as in a serious metaverse. The redundancy isn't there to be exploited.
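To make the redundancy point concrete, here is a generic instancing/dedup sketch (not how Nanite actually stores anything): identical submeshes collapse to one stored copy plus per-instance transforms, while randomized geometry almost never hashes to the same copy twice.

    # Generic instancing sketch: store identical submeshes once, reference them
    # per instance. Randomly perturbed meshes would each get their own entry.
    import hashlib

    def mesh_key(vertices):
        data = ",".join(f"{x:.6f}" for v in vertices for x in v)
        return hashlib.sha256(data.encode()).hexdigest()

    crack = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.05, 0.02, 0.0)]          # toy "crack" mesh
    scene = [("crack", crack, (i * 2.0, 0.0, 0.0)) for i in range(1000)]   # 1000 identical cracks

    store, instances = {}, []
    for name, verts, offset in scene:
        key = mesh_key(verts)
        store.setdefault(key, verts)     # geometry stored once per unique mesh
        instances.append((key, offset))  # an instance is just a reference plus a transform

    print(len(store), "unique meshes for", len(instances), "instances")    # 1 vs 1000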
You seem to know more about it than I do. But isn't Nanite also about automatic and continuous LOD adjustment? A big selling point is being able to drop in extremely detailed, unoptimized meshes made by artists and have them perform well by dynamically reducing the polygon count based on the screen real estate they take up. I guess you could call this compression too, but it is lossy compression (automatic mesh downscaling) and doesn't depend on redundancy that much.
Many of these artist-made high-fidelity meshes make heavy use of Z-brush style sculpting and procedural materials, which have similar characteristics to these randomly generated assets.
You are correct, and the parent comment is confused. Nanite does support faster instancing but its unique advantage is being a fast software renderer for assets with very large polygon counts.
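On the LOD side, the usual mechanic (generic continuous-LOD selection, not Nanite's actual cluster hierarchy) is to pick the coarsest version whose geometric error projects to less than about a pixel:

    # Generic screen-space-error LOD pick: choose the coarsest level whose
    # world-space error projects below ~1 pixel. Not Nanite's cluster DAG.
    import math

    def pick_lod(lods, distance, fov_deg=60.0, screen_px=1080, max_px_error=1.0):
        px_per_unit = screen_px / (2.0 * distance * math.tan(math.radians(fov_deg) / 2.0))
        for tris, error in sorted(lods, key=lambda lod: lod[0]):   # coarsest first
            if error * px_per_unit <= max_px_error:
                return tris
        return max(lods)[0]                                        # fall back to the finest level

    # (triangle_count, world_space_error), finest first
    lods = [(1_000_000, 0.001), (250_000, 0.004), (60_000, 0.02), (15_000, 0.08)]
    for d in (2.0, 20.0, 200.0):
        print(f"distance {d}: {pick_lod(lods, d):,} triangles")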
I worked on a thing many years ago that involved machine learning, which usually produced reasonable results but all users hated it nonetheless, because machine learning made it completely opaque. The correct predictions it made were mostly acceptable, but the incorrect predictions it made were hilariously bad, and in both cases nobody could explain why it generated those outputs.
Eventually we concluded that machine learning wasn't a good fit for our problem, and our users were very keen to maintain that conclusion.
I'm thinking of a very complex logistics system I wrote, that had to trace millions of possible paths to optimize a route. Even when the range of choices is too extensive to present to the user directly, and you need to resort to a list of best choices, it's indispensable to show somehow how the logic was arrived at and present ways of disabling portions of what went into the deductive process. That's something machine learning simply isn't geared towards, because the reasoning doesn't rest on reproducible sets of hierarchical rules.
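Schematically, the kind of traceability that matters here looks like this: every rule reports its contribution to the score and can be switched off by the user, which is exactly what an opaque learned model can't offer. (Toy sketch, not the real system.)

    # Toy traceable route scoring: each rule reports its contribution and can be
    # disabled, so the user can see and steer how a recommendation was reached.
    def toll_penalty(route):    return 50.0 if route["uses_toll_road"] else 0.0
    def distance_cost(route):   return 0.8 * route["km"]
    def night_driving(route):   return 30.0 if route["arrives_after_22h"] else 0.0

    RULES = {"toll_penalty": toll_penalty,
             "distance_cost": distance_cost,
             "night_driving": night_driving}

    def score(route, disabled=()):
        breakdown = {name: rule(route) for name, rule in RULES.items() if name not in disabled}
        return sum(breakdown.values()), breakdown          # total plus the explanation

    route = {"km": 420, "uses_toll_road": True, "arrives_after_22h": False}
    print(score(route))                                    # (386.0, {...}): every component visible
    print(score(route, disabled={"toll_penalty"}))         # user turns a rule off and re-scores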
Here's a memorable intro, from 2009: https://news.ycombinator.com/item?id=31636482
Earlier than that, another famous 64k from 2000:
http://www.theproduct.de/index.html