My summary (from someone who is not in the field but likes backpropagation):
The core idea behind this type of approach ("parametric encoding") is that you learn a scene as some spatial data + a (small) neural network. For example, a 128^3 grid of data values and a 10k parameter model. In the forward pass you feed whatever data is at the voxel(s) in question to the network, and the backward pass updates both the network and the same voxel(s).
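Here's roughly what I mean, as a toy PyTorch sketch of my own (not the paper's fused CUDA kernels; the grid size, feature width and MLP shape are just illustrative numbers):

```python
# Toy "parametric encoding": a learnable feature grid plus a tiny MLP.
# Gradients flow into both the MLP weights and the grid entries that were looked up.
import torch
import torch.nn as nn

class GridEncodingModel(nn.Module):
    def __init__(self, res=128, feat_dim=2):
        super().__init__()
        # Dense grid of learnable features (the "spatial data"), res^3 cells.
        self.grid = nn.Parameter(torch.zeros(res, res, res, feat_dim))
        # Small MLP (a few thousand parameters) mapping looked-up features to an output.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 4),                    # e.g. RGB + density
        )
        self.res = res

    def forward(self, xyz):                      # xyz in [0, 1)^3, shape (B, 3)
        idx = (xyz * self.res).long().clamp(0, self.res - 1)
        feats = self.grid[idx[:, 0], idx[:, 1], idx[:, 2]]  # nearest-voxel lookup
        return self.mlp(feats)

model = GridEncodingModel()
pred = model(torch.rand(1024, 3))
loss = pred.square().mean()                      # placeholder loss
loss.backward()                                  # both model.grid and the MLP get gradients
```

(The real thing interpolates between the surrounding grid corners rather than snapping to one voxel, but the gradient flow is the same idea.)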
The innovation in this paper is in how the spatial data is represented. Prior work includes dense grids, multi-resolution grids and octrees to name some - but all of them are either GPU-unfriendly or waste parameters on empty space. They figured that they can just hash the coordinates and use them directly as an index into a data array (edit: a multi-resolution stack of data arrays - sorry for not getting this right initially), with hash collisions left to the network to figure out (it's gonna figure out whether there's a collision on a fine layer through info from the coarser ones, I guess).
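If I read it right, the lookup on a fine level is something like the sketch below (pure illustration on my part: the table size is made up, and the primes are just the usual spatial-hashing constants):

```python
# Toy hash lookup for one fine resolution level: integer voxel coordinates are
# hashed straight into a fixed-size feature table, so memory no longer grows as res^3.
import numpy as np

TABLE_SIZE = 2 ** 14                        # far fewer entries than res^3
PRIMES = (1, 2654435761, 805459861)         # large primes commonly used for spatial hashing

def hash_index(x, y, z):
    """XOR the coordinate*prime products together, then wrap to the table size."""
    return ((x * PRIMES[0]) ^ (y * PRIMES[1]) ^ (z * PRIMES[2])) % TABLE_SIZE

# One learnable feature table per resolution level; coarse levels can be indexed 1:1,
# fine levels share entries (the hash collisions the network has to disambiguate).
feature_table = np.zeros((TABLE_SIZE, 2), dtype=np.float32)

res = 512                                   # this level's resolution
pt = np.array([0.13, 0.71, 0.42])           # query point in [0, 1)^3
vx, vy, vz = (pt * res).astype(np.int64)
feats = feature_table[hash_index(vx, vy, vz)]   # concatenated with other levels, fed to the MLP
```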
(Relatively) few parameters + GPU-friendly data structure = fast training. Tempted to try and implement this myself...
I think the key here is that e.g. surface information only grows at an O(N²) rate, whereas the number of grid points scales as O(N³). The hash function approach means your arrays will be densely filled with detailed information, whereas even a coarsely sampled regular grid would still leave most of the array holding "nothing here" information.
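Rough numbers for N = 128, treating "detail" as a shell of roughly 6·N² voxels around a surface (a crude assumption, but it shows the gap):

```python
# Back-of-the-envelope sparsity of a dense grid when detail lives on a surface.
N = 128
total_cells = N ** 3                 # 2,097,152 voxels in a dense grid
surface_cells = 6 * N ** 2           # ~98,304 voxels if detail hugs a surface
print(surface_cells / total_cells)   # ~0.047, i.e. >95% of a dense grid says "nothing here"
```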
Your comment made me realize that I forgot to mention the multi-resolution aspect of their hash encoding (there are several data arrays corresponding to different resolutions - coarse ones are 1:1 indexed but finer ones have hash collisions for the network to deal with). It's in the title, but I should still include it.
Why? The point of research is to push the limits of what's possible, not to build something that runs on every single platform.
I find it remarkable that most recent deep learning papers release the source code needed to reproduce their result -- and even more remarkable that many papers, like this one, can be reproduced on hardware that a hobbyist can afford.
And if you'd like this to run on a CPU, you're welcome to port it. The code is open source after all.
The reason they run on a GPU isn’t spite. It’s because the work for neural net based ML is inherently dependent on vast amounts of independent floating point operations.
CPUs tend to have very few FPUs per core, so you max out a modern system's CPUs' idealised throughput at maybe 40-80 concurrent streams. On top of that, the FPUs on a CPU are generally required to perform fully compliant IEEE 754 arithmetic at at least 32 bits of precision.
Modern GPUs can have that number of FPUs per hardware thread, and then have a few hundred of those hardware threads. Each of those GPU FPUs is also faster, as it can both elide some elements of IEEE 754 and operate at lower precision (fp16) to get even more performance.
So you could read the paper and implement it on a CPU, and the very best that you, or anyone, could do would be literally orders of magnitude slower than the GPU implementation.
That’s why you don’t see them doing it on a CPU, let alone in Python.
The reason the research is coming out of NVIDIA is that this kind of research is inherently GPU limited. So if it came out of AMD, Intel, Google, or Apple, it would be dependent on either a GPU or non-programmable NN-specific hardware. If it came out of academia it would still be on a GPU, because none of this is remotely practical on a CPU.
Well, if you're able to write your models in TensorFlow 1.15, we can shorten that list to: Windows 10 and Python 3.6+. Microsoft has done something quite interesting with tensorflow-directml [0, 1]. A friend is training convolutional networks on a Ryzen 5 3500U ultrabook at about the same speed my old notebook with a GeForce 940MX could manage. I'm tempted to test it on a 4600H when I have a bit of time; it could be interesting if the iGPU is able to access a large portion of the 24GB of RAM that system has.
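If anyone wants to poke at it: as I understand it, tensorflow-directml is installed in place of the normal TensorFlow 1.15 wheel and existing TF 1.x code runs unchanged. Something like the snippet below (untested on my side) should show whether a DirectML device is being picked up:

```python
# Quick check after `pip install tensorflow-directml` (replaces the regular TF 1.15 wheel):
# existing TF 1.x code should run unchanged, with the DirectML device listed alongside the CPU.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                       # expected to report a 1.15.x build
for d in device_lib.list_local_devices():   # look for a DML-backed device entry
    print(d.name, d.device_type)
```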
Machine learning research often scales up to solve a new problem, and then scales down the solution until it's actually usable. Object detection, for example, is now fully usable on a phone CPU.
I'm sorry, but if you're doing GPU calculations then you want a powerful video card, unless your research is on improving the performance of algorithms on less powerful hardware. There are only so many hours in a day.
Everything is written in Python because early on in the process people realized that you are not doing anything special and are certainly not doing actual math: you are just wiring together libraries that you barely understand, using a cookbook that someone else provided, to generate results that you cannot explain, so that an investor can tick a checkbox for a feature list that you never see. If you are just gluing together C/C++ libraries then there are worse languages that could have been selected, but once momentum gathered behind Python as the glue language it was hard to divert to another one (e.g. see how hard the Julia folks are trying, and failing, to do just that...)
To be fair, almost every deep learning paper that comes out needs something like 10x GPU cloud nodes to run on.
The days when you could run anything significant on a single $1k graphics card are long gone.
This is, ironically, the first time (that I'm aware of) that you could distill this NeRF stuff down to a size that runs on a single consumer GPU (RTX 20-series or higher)
…so, some of your points are fair, but hey, at least these folk are trying to bring this down from “only usable by large corporations” to “runs on your desktop”.
I mean, it’s not perfect, but I think in this case you’re complaining about something abstract, when these folk are actually going in the right direction.
I like FOSS a lot. Normal programming languages have relatively small downloads and run on normal CPUs dating about 10 years back with almost no issue.
GPU workloads always want some odd driver that has a gigantic download, and they're constantly coming up with new reasons to force you to the newest APIs, which means you have to buy new chips that have the right architecture or firmware for the new APIs.
So I have to buy this co-processor, and then I can't even treat it like a black box that I send commands to, I need a gigabyte-scale SDK or something to issue the commands on my behalf.
I can't stand it. It's as if there was a tiny window when programming was simple, after I learned about FOSS, and before GPGPU caught on. As if the personal computer really will turn out to have been a fad.
Ok, GPGPU isn't "general purpose" in the basic sense; it means "not just graphics". No CPU is going to be able to get performance in NNs that matches that of a GPU. A CPU simply cannot do the work. The closest a "general" CPU gets to that kind of thing are the big vector machines like the old Crays or Itanium's EPIC architecture. Programming for either of those architectures is non-trivial, and for normal software those architectures are slower than normal CPUs.
Despite the trade-offs those systems made, consumer GPUs ended up with better performance, because a lot of the things a general CPU has to do interfere with the performance of pure numerical computation.
For some additional context, when the original NeRF paper (https://arxiv.org/pdf/2003.08934.pdf) was published 2 years ago, it reportedly took at least 12 hours (depending on hardware used of course) to train on the scene with the bulldozer. This has now been reduced to about 5 seconds (!), with realtime rendering of the result.
The gigapixel example could be done with Fourier features, which takes a few minutes to train (on Colab-like resources). Definitely still a huge improvement though (and one based more on clever hashing techniques than on optimization).
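For anyone curious, the Fourier-feature trick referenced here is basically: project the 2D pixel coordinates through a random Gaussian matrix and feed sines/cosines of the result into a small MLP. A minimal sketch (sizes and sigma are illustrative):

```python
# Minimal random Fourier-feature encoding (Tancik et al. style); sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
num_features, sigma = 256, 10.0
B = rng.normal(0.0, sigma, size=(num_features, 2))   # random projection for 2D pixel coords

def fourier_features(xy):
    """Map (N, 2) coords in [0, 1]^2 to (N, 2*num_features) sin/cos features."""
    proj = 2.0 * np.pi * xy @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

coords = rng.random((4, 2))
print(fourier_features(coords).shape)                # (4, 512) -> input to a small MLP
```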
Why not billions of triangles? Unreal is betting on Nanite because triangles have so many nice properties in addition to having the whole art pipeline already set up.
(I could not get the URL to load. Maybe HN hugged it)
Triangles have no volume and no diffraction occurs inside them as it does with Platonic solids. The idea is that real-time raytracing will allow complex variations and interactions of "Platonic dust particles" and the rays bouncing and refracting between and in them. It would be a more expressive "clay" for the AI to tinker with than triangles - the orientation/color/transparency changes of each solid will be able to elicit more visual effects than doing it with flat triangles.
Neural rendering? I doubt it. Check out Deep Learning Super Sampling (DLSS) from NVIDIA though, which has to be plumbed into the game itself to enable it.
This is probably going to fight virtual geometry tech like Unreal's Nanite, which is still using triangles but using clever automated LoD and GPGPU rasterization so that rendering e.g. 20 million pixel-sized triangles is fast and looks just as good as rendering a trillion triangles. (normally very small or thin triangles are a pathological case for hardware rasterizers)