The implications are unclear to me. We already know how to prune models for inference. For example, https://arxiv.org/abs/1710.01878, along with earlier and more recent work. There's also work showing that you can take advantage of the sparsity to achieve practical speed gains: https://arxiv.org/abs/1911.09723.
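For context, the first paper's approach is gradual magnitude pruning; the core operation is just zeroing the smallest-magnitude weights. A one-shot sketch in NumPy (illustrative only; the paper prunes gradually on a schedule during training, and the function name here is my own):

    import numpy as np

    def magnitude_prune(weights, sparsity):
        """Zero out the smallest-magnitude entries so that `sparsity` fraction is zero."""
        k = int(sparsity * weights.size)               # number of weights to remove
        if k == 0:
            return weights, np.ones_like(weights, dtype=bool)
        flat = np.abs(weights).ravel()
        threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
        mask = np.abs(weights) > threshold
        return weights * mask, mask

    # Example: prune a random 256x256 layer to 90% sparsity
    w = np.random.randn(256, 256)
    pruned, mask = magnitude_prune(w, 0.9)
    print(1.0 - mask.mean())   # roughly 0.9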
We can also train networks that are sparse from the beginning of training (without requiring any special knowledge of the solution): https://arxiv.org/abs/1911.11134. It remains to be shown that this can be done with a speed advantage.
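Roughly, that paper (RigL) keeps the number of nonzero weights fixed and periodically swaps connections: it drops the active weights with the smallest magnitude and regrows inactive ones where the dense gradient is largest. A toy NumPy sketch of a single drop-and-grow step, with a made-up function name and none of the paper's schedules or per-layer budgets:

    import numpy as np

    def drop_and_grow(weights, mask, grads, swap_fraction=0.1):
        """One connection-swap step that keeps the number of nonzeros constant."""
        w, m, g = weights.ravel().copy(), mask.ravel().copy(), grads.ravel()
        n_swap = int(swap_fraction * m.sum())
        if n_swap == 0:
            return weights, mask

        # Drop: among active weights, deactivate the smallest in magnitude.
        active = np.flatnonzero(m)
        drop = active[np.argsort(np.abs(w[active]))[:n_swap]]

        # Grow: among inactive weights, activate where the gradient is largest.
        inactive = np.flatnonzero(~m)
        grow = inactive[np.argsort(-np.abs(g[inactive]))[:n_swap]]

        m[drop], m[grow] = False, True
        w[drop], w[grow] = 0.0, 0.0        # regrown connections start at zero
        return w.reshape(weights.shape), m.reshape(mask.shape)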
In most cases, there is limited support for sparse operations.
"Sparse Networks from Scratch: Faster Training without Losing Performance" https://arxiv.org/abs/1907.04840 openly says "Currently, no GPU accelerated libraries that utilize sparse tensors exist, and as such we use masked weights to simulate
sparse neural networks.".
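In other words, the "sparse" network is stored and computed as a dense tensor multiplied by a binary mask, so there is no memory or speed benefit; only the behavior of sparsity is reproduced. A minimal PyTorch-style sketch of what such a masked layer typically looks like (my own illustrative class, not that paper's code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedLinear(nn.Module):
        """Dense linear layer whose weight is multiplied by a fixed binary mask.
        The matmul is still dense, so sparsity is simulated, not exploited."""
        def __init__(self, in_features, out_features, sparsity=0.9):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            # Random mask for illustration; real methods pick the mask carefully.
            mask = (torch.rand(out_features, in_features) > sparsity).float()
            self.register_buffer("mask", mask)

        def forward(self, x):
            return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

    layer = MaskedLinear(512, 256)
    y = layer(torch.randn(8, 512))   # full dense compute under the hood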
However, the situation seems to be very dynamic. See:
Without reading the paper: I think the importance is that before, we only had methods that could do this in practice. Now we know there is an algorithm that provably can. They proved it is always possible, not just for some subset of networks.
On the other hand, it will trigger research on reducing the size of networks. That is important, as most researchers don't have access to the computing power of Google and the like.
It's unclear whether this algorithm would be useful in practice. Training the weights will lead to a more accurate network for the same amount of work at inference time.