It should; AFAIK that was at the SM level.

> Ampere's benefit is that it can deal with dense and sparse matrices differently. Its cores are twice as fast as Turing's for dense matrix and four times as quick for sparse matrix that have all the needless weights removed. The upshot, per SM, is dense processing at the same speed - it has half the cores, remember - and twice the overall throughput for sparse processing.

https://hexus.net/tech/reviews/graphics/145342-nvidia-geforc...
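For what it's worth, the per-SM arithmetic in that quote checks out. A quick sketch with assumed Tensor Core counts per SM (illustrative numbers, not official specs):

    # rough check of the per-SM claim above (core counts are assumptions)
    turing_tc_per_sm = 8             # Tensor Cores per Turing SM (assumed)
    ampere_tc_per_sm = 4             # half as many per Ampere SM (assumed)
    dense_speedup_per_core = 2       # Ampere vs. Turing core, dense matrices
    sparse_speedup_per_core = 4      # Ampere vs. Turing core, 2:4 sparse matrices

    dense_per_sm = ampere_tc_per_sm * dense_speedup_per_core    # 8  -> parity with Turing
    sparse_per_sm = ampere_tc_per_sm * sparse_speedup_per_core  # 16 -> 2x Turing
    print(dense_per_sm / turing_tc_per_sm, sparse_per_sm / turing_tc_per_sm)  # 1.0 2.0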


Yeah, but ResNet does not have sparse weight matrices, so how could it use them? Post-ReLU activations may be sparse, but I don't think that helps when used with a non-sparse Conv2d.


I don't know if there are any white papers with hard details yet (if anyone knows of one, please share!), but NVIDIA's marketing material[0] for the Ampere architecture claims the following:

"Sparsity is possible in deep learning because the importance of individual weights evolves during the learning process, and by the end of network training, only a subset of weights have acquired a meaningful purpose in determining the learned output. The remaining weights are no longer needed.

Fine grained structured sparsity imposes a constraint on the allowed sparsity pattern, making it more efficient for hardware to do the necessary alignment of input operands. Because deep learning networks are able to adapt weights during the training process based on training feedback, NVIDIA engineers have found in general that the structure constraint does not impact the accuracy of the trained network for inferencing. This enables inferencing acceleration with sparsity."

So the idea seems to be that at the end of training, there's fine-tuning that can be done to figure out which weights can be zeroed out without significantly impacting prediction accuracy, and then you can accelerate inference with sparse matrix multiplication. They consider training acceleration with sparse matrices an "active research area."
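If it helps make that concrete, here's a minimal NumPy sketch of the 2:4 fine-grained structured pruning pattern (keep the 2 largest-magnitude weights in every group of 4, zero the rest). Just an illustration of the constraint, not NVIDIA's actual tooling or workflow:

    import numpy as np

    def prune_2_of_4(w):
        """Zero the 2 smallest-magnitude weights in each consecutive group of 4."""
        groups = w.reshape(-1, 4)
        drop = np.argsort(np.abs(groups), axis=1)[:, :2]  # 2 smallest |w| per group
        pruned = groups.copy()
        np.put_along_axis(pruned, drop, 0.0, axis=1)
        return pruned.reshape(w.shape)

    w = np.random.randn(8, 16).astype(np.float32)  # toy weight matrix
    w_sparse = prune_2_of_4(w)
    assert (w_sparse.reshape(-1, 4) != 0).sum(axis=1).max() <= 2  # at most 2 nonzeros per group of 4

The remaining weights then get fine-tuned so accuracy recovers, which is what the quoted passage is describing.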

I could see it being nice for running large language models on consumer hardware, or really cool for the few edge-computing applications that can actually demand and power conventional GPUs (e.g., self-driving cars). It's probably not a great boon to the researcher who wants to shorten their iteration time, though.

[0] https://developer.nvidia.com/blog/nvidia-ampere-architecture...


Ah, I should have paid more attention to the question; I read it as "are they enabled?" My bad. :(
