
The implications are unclear to me. We already know how to prune models for inference; see, for example, https://arxiv.org/abs/1710.01878, along with earlier and more recent work. There's also work showing that you can take advantage of the sparsity to achieve practical speed gains: https://arxiv.org/abs/1911.09723.
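
For concreteness, here is a minimal sketch of the kind of magnitude pruning those papers build on: zero out the smallest-magnitude weights of a trained layer and keep a mask so they stay zero. The layer size, sparsity level, and one-shot thresholding are my own simplifications (the linked paper prunes gradually over the course of training):

    import torch
    import torch.nn as nn

    layer = nn.Linear(512, 512)            # stand-in for a trained layer
    sparsity = 0.9                         # keep only the largest 10% of weights

    with torch.no_grad():
        threshold = torch.quantile(layer.weight.abs(), sparsity)
        mask = (layer.weight.abs() > threshold).float()
        layer.weight.mul_(mask)            # zero out the small-magnitude weights

    # Keep `mask` around and re-apply it after each optimizer step
    # so the pruned connections stay at zero.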

We can also train networks that are sparse from the beginning of training (without requiring any special knowledge of the solution): https://arxiv.org/abs/1911.11134. It remains to be shown that this can be done with a speed advantage.
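
Roughly, those sparse-from-scratch methods keep a fixed parameter budget and periodically redistribute it: drop some active weights and regrow the same number elsewhere. A toy sketch of one such redistribution step (drop by weight magnitude, regrow by gradient magnitude, in the spirit of RigL; the function and criteria here are simplified assumptions, not the paper's exact procedure):

    import torch

    def prune_and_regrow(weight, grad, mask, fraction=0.1):
        """One simplified redistribution step on a 0/1 mask: drop the smallest
        active weights, regrow the same number of currently-inactive connections
        with the largest gradient magnitude. Returns the updated mask."""
        mask = mask.clone()
        flat_mask = mask.view(-1)
        k = int(fraction * flat_mask.sum().item())
        if k == 0:
            return mask

        # Drop: among active weights, the k smallest magnitudes.
        drop_scores = weight.abs().view(-1).masked_fill(flat_mask == 0, float('inf'))
        drop_idx = torch.topk(drop_scores, k, largest=False).indices

        # Grow: among weights inactive before this step, the k largest gradient magnitudes.
        grow_scores = grad.abs().view(-1).masked_fill(flat_mask == 1, float('-inf'))
        grow_idx = torch.topk(grow_scores, k, largest=True).indices

        flat_mask[drop_idx] = 0
        flat_mask[grow_idx] = 1
        return mask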




In most cases, there is limited support for sparse operations. "Sparse Networks from Scratch: Faster Training without Losing Performance" https://arxiv.org/abs/1907.04840 openly says "Currently, no GPU accelerated libraries that utilize sparse tensors exist, and as such we use masked weights to simulate sparse neural networks.".
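
For readers who haven't seen the trick: "masked weights" means the weight matrix stays dense and a 0/1 mask is multiplied in before each forward pass, so the model behaves as if it were sparse but the matmul still runs over the full dense matrix and there is no wall-clock saving. A minimal sketch (the class name and random mask initialization are mine, not the paper's):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedLinear(nn.Linear):
        """Dense layer that simulates sparsity with a fixed 0/1 mask."""
        def __init__(self, in_features, out_features, sparsity=0.9):
            super().__init__(in_features, out_features)
            mask = (torch.rand(out_features, in_features) > sparsity).float()
            self.register_buffer("mask", mask)

        def forward(self, x):
            # Still a full dense matmul over a mostly-zero matrix:
            # correct sparse behaviour, none of the FLOP savings.
            return F.linear(x, self.weight * self.mask, self.bias)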

However, the situation seems to be very dynamic. See:

- https://github.com/StanfordVL/MinkowskiEngine (Minkowski Engine is an auto-diff convolutional neural network library for high-dimensional sparse tensors)

- https://github.com/rusty1s/pytorch_sparse (an efficient sparse tensor implementation for PyTorch; the official one is slower than SciPy, see https://github.com/pytorch/pytorch/issues/16187; however, I failed to install it - it is not "pip install"-simple; a quick sketch of the built-in API is below)
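
For reference, this is what using PyTorch's built-in sparse tensors (the API the linked issue benchmarks as slow) looks like; the sizes and sparsity level are arbitrary:

    import torch

    # Dense weight matrix with ~90% zeros, converted to the built-in COO sparse format.
    dense = torch.randn(1024, 1024) * (torch.rand(1024, 1024) > 0.9).float()
    sparse = dense.to_sparse()

    x = torch.randn(1024, 256)
    y = torch.sparse.mm(sparse, x)         # sparse-dense matmul
    assert torch.allclose(y, dense @ x, atol=1e-4)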

EDIT:

I checked it now and was able to install pytorch_sparse with one command. It is a dynamic field indeed.


OpenAI just announced it will port its sparse libraries to PyTorch, so exciting times ahead! You can read more about this here (OP here): https://medium.com/huggingface/is-the-future-of-neural-netwo...


Without reading: I think the importance is that before, we only had methods that could do this in practice; now we know there is an algorithm that is guaranteed to do it. They proved it is always possible, not just for some subset of networks.

On the other hand, it will trigger research on reducing the size of networks. That is important, as most researchers don't have access to the computing power of Google and the like.


It's unclear whether this algorithm would be useful in practice. Training the weights will lead to a more accurate network for the same amount of work at inference time.


Searching for the optimal small network is just as much work as training a larger network. There's no free lunch.


I wonder: isn't the new CPU training with locality-sensitive hashing similar to constructing something close to a winning lottery ticket?
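
For anyone unfamiliar with the reference: that CPU result (SLIDE) uses locality-sensitive hashing to pick, per input, only the neurons whose weight vectors are likely to have a large inner product with the current activation, and computes just those. A toy single-table sketch with random-hyperplane hashing; the real system uses multiple tables and more refined LSH schemes, and all names and sizes below are mine:

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)
    n_in, n_out, n_bits = 256, 4096, 12

    W = rng.standard_normal((n_out, n_in))          # layer weights
    planes = rng.standard_normal((n_bits, n_in))    # random hyperplanes (SimHash)

    def bucket(v):
        # Sign pattern of v against the hyperplanes, packed into an integer bucket id.
        bits = (planes @ v) > 0
        return int(bits @ (1 << np.arange(n_bits)))

    # Hash table: bucket id -> neurons whose weight vector landed in that bucket.
    table = defaultdict(list)
    for j in range(n_out):
        table[bucket(W[j])].append(j)

    def sparse_forward(x):
        # Compute only the neurons that hash to the same bucket as the input;
        # everything else is treated as inactive.
        active = np.array(table.get(bucket(x), []), dtype=int)
        out = np.zeros(n_out)
        if active.size:
            out[active] = W[active] @ x
        return out, active.size

    x = rng.standard_normal(n_in)
    _, n_active = sparse_forward(x)
    print(f"computed {n_active} of {n_out} neurons")

In that sense each input activates its own small subnetwork, which is what reminded me of the lottery-ticket picture, although here the subnetwork changes per example rather than being fixed up front.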



