Instant neural graphics primitives with a multiresolution hash encoding (nvlabs.github.io)
179 points by ath92 on Jan 16, 2022 | 37 comments



My summary (from someone who is not in the field but likes backpropagation):

The core idea behind this type of approach ("parametric encoding") is that you learn a scene as some spatial data + a (small) neural network. For example, a 128^3 grid of data values and a 10k parameter model. In the forward pass you feed whatever data is at the voxel(s) in question to the network, and the backward pass updates both the network and the same voxel(s).

The innovation in this paper is in how the spatial data is represented. Prior work includes dense grids, multi-resolution grids and octrees, to name a few - but all of them are either GPU-unfriendly or waste parameters on empty space. They figured out that they can just hash the coordinates and use the hash directly as an index into a data array (edit: a multi-resolution stack of data arrays - sorry for not getting this right initially), with hash collisions left to the network to figure out (it's gonna figure out whether there's a collision on a fine level using info from the coarser ones, I guess).

(Relatively) few parameters + GPU-friendly data structure = fast training. Tempted to try and implement this myself...
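
Here's a rough NumPy sketch of what I mean by the hashed multi-resolution lookup. The level count, table size, base resolution and hash constants below are my own guesses at plausible values, not necessarily what the paper uses:

    import numpy as np

    NUM_LEVELS = 8          # coarse -> fine resolution levels
    FEATURES_PER_LEVEL = 2  # small trainable feature vector per table entry
    TABLE_SIZE = 2 ** 14    # entries per level; fine levels will alias (hash collisions)
    PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

    # One feature table per level (trained jointly with the MLP in the real thing).
    tables = [np.random.randn(TABLE_SIZE, FEATURES_PER_LEVEL).astype(np.float32) * 1e-4
              for _ in range(NUM_LEVELS)]
    resolutions = [16 * 2 ** l for l in range(NUM_LEVELS)]  # geometric growth from an assumed base of 16

    def hash_coords(ijk):
        """Spatial hash: integer grid corners (N, 3) -> indices into a feature table."""
        h = np.zeros(ijk.shape[:-1], dtype=np.uint64)
        for d in range(3):
            h ^= ijk[..., d].astype(np.uint64) * PRIMES[d]
        return h % TABLE_SIZE

    def encode(x):
        """x: (N, 3) points in [0, 1]^3 -> (N, NUM_LEVELS * FEATURES_PER_LEVEL) features."""
        feats = []
        for level, res in enumerate(resolutions):
            pos = x * res
            base = np.floor(pos).astype(np.int64)
            frac = pos - base
            acc = np.zeros((x.shape[0], FEATURES_PER_LEVEL), dtype=np.float32)
            # Trilinear interpolation over the 8 surrounding grid corners.
            for corner in range(8):
                offset = np.array([(corner >> d) & 1 for d in range(3)])
                idx = hash_coords(base + offset)
                w = np.prod(np.where(offset, frac, 1.0 - frac), axis=-1, keepdims=True)
                acc += w.astype(np.float32) * tables[level][idx]
            feats.append(acc)
        return np.concatenate(feats, axis=-1)  # concatenated features go into the small MLP

    print(encode(np.random.rand(4, 3)).shape)  # (4, 16)

This only shows the encoding step; the actual method trains the tables and the MLP together on the GPU, with gradients flowing back into the hashed entries.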


Isn't the effect of hashing the same as sampling more coarsely?


I think the key here is that e.g. surface information only grows at O(N²) rate whereas number of grid points scales as O(N³). The hash function approach means your arrays will be filled with detailed information densely, whereas sampling coarsely would still leave most of the array with "nothing here" information.

Your comment made me realize that I forgot to mention the multi-resolution aspect of their hash encoding (there are several data arrays corresponding to different resolutions - coarse ones are 1:1 indexed but finer ones have hash collisions for the network to deal with). It's in the title, but I should still include it.
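
To put rough numbers on the parameter budget (illustrative figures of my own, not the paper's exact configuration):

    # Back-of-the-envelope parameter counts for the spatial data structure alone:
    dense = 512 ** 3 * 2        # dense 512^3 grid with 2 features per voxel
    hashed = 16 * 2 ** 19 * 2   # 16 levels, a 2^19-entry hash table each, 2 features per entry
    print(f"dense grid: {dense:,} params, hashed multi-res: {hashed:,} params")
    # dense grid: 268,435,456 params, hashed multi-res: 16,777,216 params

The dense count grows with the cube of the resolution, while the hashed tables stay at a fixed budget no matter how fine the finest level gets.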


If it's so fast, I'd like to see it working on a smaller scale on a CPU.

Every new deep learning paper that comes out, I'm disappointed that it needs...

- A $500-$1,000 GPU

- A huge proprietary NVidia driver

- Some odd language or language extensions, usually CUDA

- Python


Why? The point of research is to push the limits of what's possible, not to build something that runs on every single platform.

I find it remarkable that most recent deep learning papers release the source code needed to reproduce their result -- and even more remarkable that many papers, like this one, can be reproduced on hardware that a hobbyist can afford.

And if you'd like this to run on a CPU, you're welcome to port it. The code is open source after all.


The reason they run on a GPU isn’t spite. It’s because the work for neural net based ML is inherently dependent on vast amounts of independent floating point operations.

CPUs tend to have very few FPUs per core, so you max out a modern system's CPU's idealised throughput at maybe 40-80 concurrent streams. On top of that, the FPUs on a CPU are generally required to perform fully compliant IEEE 754 arithmetic at at least 32 bits of precision.

Modern GPUs can have that many FPUs per hardware thread, and then have a few hundred of those hardware threads. Each of those GPU FPUs is also faster, as they can both elide some elements of IEEE 754 and operate at lower precision (fp16) to get even more performance.
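
To put very rough numbers on that (idealised peak figures with assumed specs - a hypothetical 16-core AVX-512 CPU at 3 GHz versus an RTX 3090, from memory):

    # Idealised peak fp32 throughput, assumed specs:
    cpu_flops = 16 * 2 * 16 * 2 * 3.0e9  # 16 cores x 2 FMA units x 16 fp32 lanes x 2 flops/FMA x 3 GHz
    gpu_flops = 10496 * 2 * 1.7e9        # 10496 CUDA cores x 2 flops/FMA x ~1.7 GHz boost
    print(f"CPU ~{cpu_flops / 1e12:.1f} TFLOPS fp32, GPU ~{gpu_flops / 1e12:.1f} TFLOPS fp32")
    # CPU ~3.1 TFLOPS fp32, GPU ~35.7 TFLOPS fp32 - and fp16/tensor cores widen the gap further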

So you could read the paper and implement it on a CPU, and the very best that you, or anyone, could do would be literally orders of magnitude slower than the GPU implementation.

That’s why you don’t see them doing it on a CPU, let alone in Python.


The reason it runs on a GPU is that this research was literally done by NVIDIA!


Nvidia also makes CPUs.

The reason the research is coming out of nvidia is because this kind of research is inherently GPU limited. So if it came out of AMD, Intel, Google, or Apple, it would be dependent on either GPU, or non-programmable NN specific hardware. If it came out of academia it would still be on a GPU, because none of this is remotely practical on a CPU.


Well, if you're able to write your models in TensorFlow 1.15, we can shorten that list to: Windows 10 and Python 3.6+. Microsoft has done something quite interesting with tensorflow-directml [0, 1]. A friend is training convolutional networks on a Ryzen 5 3500U ultrabook at about the same speed my old notebook with a GeForce 940MX could. I'm tempted to test it on a 4600H when I have a bit of time; it could be interesting if the iGPU is able to access a large portion of the 24GB of RAM that system has.

[0] https://pypi.org/project/tensorflow-directml/

[1] https://docs.microsoft.com/en-us/windows/ai/directml/gpu-ten...
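
If anyone wants to try it, a quick sanity check (tensorflow-directml is a drop-in replacement for TF 1.15; if I remember the device naming correctly, DirectML GPUs show up as "DML" devices):

    # pip install tensorflow-directml
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    print(tf.__version__)                   # should report 1.15.x
    print(device_lib.list_local_devices())  # look for a /device:DML:0 entry alongside the CPU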


Machine learning research often scales up to solve a new problem, and then scales down the solution until it's actually usable. Object detection, for example, is now fully usable on a phone CPU.


I'm sorry, but if you're doing GPU calculations then you want a powerful video card, unless your research is on improving the performance of algorithms on less powerful hardware. There are only so many hours in a day.


You can rent the GPU from Google Colab. (It's actually more like $15k than $1k, which makes you wonder why Colab is so cheap.)

Why everything's written in Python I couldn't tell you.


Everything is written in Python because early on in the process people realized that you are not doing anything special and are certainly not doing actual math: you are just wiring together libraries that you barely understand, using a cookbook that someone else provided for you, to generate results that you cannot explain, so that an investor can tick a checkbox on a feature list that you never see. If you are just gluing together C/C++ libraries then there are worse languages that could have been selected, but once momentum gathered behind Python as the glue language it was hard to divert to another language (see how hard the Julia folks are trying, and failing, to do just that...)


To be fair, almost every deep learning paper that comes out needs something like 10x GPU cloud nodes to run on.

The days when you could run anything significant on a single $1k graphics card are long gone.

This is, ironically, the first time (that I'm aware of) you could distill this NeRF stuff down into a size that runs on a single consumer GPU (RTX 20xx or higher)

…so, some of your points are fair, but hey, at least these folk are trying to bring this down from “only usable by large corporations” to “runs on your desktop”.

I mean, it’s not perfect, but I think in this case you’re complaining about something abstract, when these folk are actually going in the right direction.


Your smartphone in 10 years will have an equivalent GPU


I haven't learned Android in the last 10 years, doomsday principle says I'm probably not gonna learn it in the next 10 either.


We'll hopefully have WebGPU by then


By this logic you would be disappointed by any space related discovery because it needs a multimillion dollar telescope...


Just curious: what are your objections w.r.t. the last 3 points?


I like FOSS a lot. Normal programming languages have relatively small downloads and run on normal CPUs dating about 10 years back with almost no issue.

GPU workloads always want some odd driver that has a gigantic download, and they're constantly coming up with new reasons to force you to the newest APIs, which means you have to buy new chips that have the right architecture or firmware for the new APIs.

So I have to buy this co-processor, and then I can't even treat it like a black box that I send commands to, I need a gigabyte-scale SDK or something to issue the commands on my behalf.

I can't stand it. It's as if there was a tiny window when programming was simple, after I learned about FOSS, and before GPGPU caught on. As if the personal computer really will turn out to have been a fad.


Ok, GPGPU isn’t “general purpose” in the basic sense; it means “not just graphics”. No CPU is going to be able to get performance in NNs that matches that of a GPU. A CPU simply cannot do the work. The closest a “general” CPU gets to that kind of thing are the big vector machines like the old Crays, or Itanium’s EPIC architecture. Programming for either of those architectures is non-trivial, and for normal software those architectures are slower than normal CPUs.

Despite the trade-offs those systems made, consumer GPUs ended up with better performance, because a lot of the things a general CPU has to do interfere with the performance of pure numerical computation.


This research uses Nvidia products because it was released by Nvidia, who have hired approximately every graphics researcher out there.


you can get really really far with a colab pro+ account these days (just $50 per month)


For some additional context, when the original NeRF paper (https://arxiv.org/pdf/2003.08934.pdf) was published 2 years ago, it reportedly took at least 12 hours (depending on hardware used of course) to train on the scene with the bulldozer. This has now been reduced to about 5 seconds (!), with realtime rendering of the result.


The gigapixel example could be done with Fourier features, which take a few minutes to train (on Colab-like resources). Definitely still a huge improvement though (and one based more on clever hashing techniques than on optimization).
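
For reference, the Fourier-feature baseline maps each coordinate through random sinusoids before the MLP. A minimal sketch (the frequency scale and mapping size are my own choices, not the paper's):

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.normal(scale=10.0, size=(256, 2))  # random frequency matrix for 2D pixel coords

    def fourier_features(x):
        """x: (N, 2) coords in [0, 1]^2 -> (N, 512) encoding fed to a small MLP."""
        proj = 2 * np.pi * x @ B.T
        return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

    print(fourier_features(np.random.rand(4, 2)).shape)  # (4, 512)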


Goodbye polygons, hello neural networks?


More like "run this low-quality polygon and raytracing renderer at 320x240 @20 fps, upscale to 4k120 with acceptable quality".


I think meshes and textures can be replaced by billions of intelligently shifting, raytraced Platonic solids

http://zeroprecedent.com/platonic


Why not billions of triangles? Unreal is betting on Nanite because triangles have so many nice properties in addition to having the whole art pipeline already set up.

(I could not get the URL to load. Maybe HN hugged it)


Triangles have no volume, and no diffraction occurs inside them as it does with Platonic solids. The idea is that real-time raytracing will allow complex variations and interactions of "Platonic dust particles" and the rays bouncing and refracting between and in them. It would be a more expressive "clay" for the AI to tinker with than triangles - the orientation/color/transparency changes of each solid would be able to elicit more visual effects than doing it with flat triangles.


This person just invented a very handwavey GAN


Effective representation changes everything

Like hashtables


I’ve seen the GTA demo.

Are there any commercial games currently doing this?


Neural rendering? I doubt it. Check out deep learning super sampling (DLSS) from NVIDIA though, which has to be plumbed into the game itself to enable it.

https://www.nvidia.com/en-us/geforce/technologies/dlss/


Not sure yet.

This is probably going to fight virtual geometry tech like Unreal's Nanite, which still uses triangles but with clever automated LoD and GPGPU rasterization, so that rendering e.g. 20 million pixel-sized triangles is fast and looks just as good as rendering a trillion triangles. (Normally, very small or thin triangles are a pathological case for hardware rasterizers.)


Wow this is a game changer / landmark advancement


Magic, impressive. I have no words.



