It seems absolutely wild to me that there are cases where you can prune 90% of neurons and still get a functional network out the other side. I'm definitely interested in experimenting with some of these techniques myself soon!
"A man with an unusually tiny brain manages to live an entirely normal life despite his condition, which was caused by a fluid build-up in his skull.
Scans of the 44-year-old man's brain showed that a huge fluid-filled chamber called a ventricle took up most of the room in his skull, leaving little more than a thin sheet of actual brain tissue"
The article goes on to say that he tested at an IQ of 75, below normal but not disabled; he was employed as a civil servant and was married with children.
If you zoom in until you can distinguish individual pixels, it's clear that they aren't actually solid colors. It's just that all the variability is at the pixel level and there are no large-scale structures. (There couldn't possibly be, because the ordering of rows and columns is arbitrary: you could shuffle them around and get an equivalent model. The most you could expect is some banding where values in the same row or column are related.)
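For concreteness, here's a tiny numpy sketch of that claim (the layer sizes are made up): permuting the hidden units of a two-layer MLP, i.e. shuffling the rows of the first weight matrix together with the corresponding columns of the second, leaves the function the network computes unchanged.

```python
# Minimal sketch: reordering hidden units (rows of W1 plus the matching
# columns of W2) produces an equivalent network with identical outputs.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 4

W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer
    return W2 @ h + b2

perm = rng.permutation(d_hidden)       # arbitrary reordering of hidden units
x = rng.normal(size=d_in)

out_original = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)

assert np.allclose(out_original, out_permuted)
```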
Always thought of the opposite: DL model as a compression method.
Like taking an initial frame of a movie, training a DL model to reproduce the following frames, and then transmitting that frame plus the parameters needed to reproduce the movie.
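A toy sketch of that idea (not a real codec, and the network sizes here are arbitrary): overfit a small coordinate MLP to map (frame index, pixel position) to pixel values, and treat the trained weights as the "compressed" clip that gets transmitted.

```python
# Toy sketch: f(t, x, y) -> pixel value, deliberately overfit on one clip.
# The trained weights then act as a lossy compressed representation.
import torch
import torch.nn as nn

T, H, W = 16, 32, 32                       # tiny synthetic "video"
video = torch.rand(T, H, W)                # stand-in for real frames

# All (t, x, y) coordinates, normalised to [0, 1]
coords = torch.stack(torch.meshgrid(
    torch.linspace(0, 1, T),
    torch.linspace(0, 1, H),
    torch.linspace(0, 1, W),
    indexing="ij"), dim=-1).reshape(-1, 3)
targets = video.reshape(-1, 1)

model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):                   # overfit on purpose
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(coords), targets)
    loss.backward()
    opt.step()

# "Transmit" the weights; the receiver reconstructs frames by
# re-evaluating the network on the same coordinate grid.
with torch.no_grad():
    reconstruction = model(coords).reshape(T, H, W)
```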
It is not video, but I worked on image compression using neural networks for my master's thesis.
It was an improvement over an existing method, so there is quite a bit of research going on in this area.
Briefly, the idea was to use an auto-encoder to transform the image, then quantize and encode the transform coefficients. So you can actually "teach" the network to be resilient to the quantization operation. Very similar to what the author describes.
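Roughly along those lines (this is a generic sketch, not the exact method from the thesis): one common way to make an auto-encoder aware of quantization during training is to round the latent coefficients in the forward pass and use a straight-through estimator so the rounding doesn't block gradients.

```python
# Rough sketch: latent coefficients are rounded during training, with a
# straight-through estimator so gradients flow past the rounding step.
import torch
import torch.nn as nn

class QuantizedAutoencoder(nn.Module):
    def __init__(self, dim=784, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, dim))

    def forward(self, x):
        z = self.encoder(x)
        # Forward pass uses round(z); backward pass treats it as identity.
        z_q = z + (torch.round(z) - z).detach()
        return self.decoder(z_q)

model = QuantizedAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)                    # dummy batch of flattened images
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
opt.step()
```

In a real pipeline the rounded coefficients would then be entropy-coded; the point of the sketch is just that the decoder learns to cope with the quantization error.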
Deep learning for video compression seems like it'll have a natural advantage just because video codecs have never been particularly large (maybe 100 KBs?) but everyone expects an ML model to be enormous. That's a lot of space to store common data in.
Any dimensionality reduction method (DL-based or otherwise) can be viewed as a form of compression. But the end goal of dimensionality reduction is usually quite different from that of data compression.
Either way, this is very different from model compression/distillation, which is compression of the parametrized functional mapping itself. As a silly example, imagine you fit a 100-degree polynomial to noisy linear data using some proper regularization. You would find that you can distill/compress your 100-degree polynomial model into a 1-degree model with comparable accuracy.
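Here's a quick numpy sketch of that silly example (the noise level and regularization strength are made up): fit a ridge-regularized degree-100 polynomial to noisy linear data, then "distill" it by fitting a degree-1 model to the big model's own predictions.

```python
# Sketch: a heavily regularised 100-degree polynomial "teacher" distilled
# into a 1-degree "student" with comparable accuracy on linear data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=x.shape)  # noisy linear data

# Teacher: degree-100 polynomial fit with ridge regularisation.
X = np.vander(x, 101, increasing=True)          # columns 1, x, x^2, ..., x^100
lam = 1e-3
teacher_coef = np.linalg.solve(X.T @ X + lam * np.eye(101), X.T @ y)
teacher_pred = X @ teacher_coef

# Student: degree-1 model fit to the teacher's predictions, not the data.
student_coef = np.polyfit(x, teacher_pred, deg=1)
student_pred = np.polyval(student_coef, x)

print("teacher MSE vs data:", np.mean((teacher_pred - y) ** 2))
print("student MSE vs data:", np.mean((student_pred - y) ** 2))
```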
Yeah, I've thought about something along those lines, maybe applied on a per-layer/architectural-element basis. I'm sure someone's done it, because it just seems like another optimization problem.
Because unless you have a certain (high) level of sparsity, sparse formats are in fact inefficient for storage. There are cases where sparse formats take more memory than storing dense tensors.
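You can see the break-even point with a quick scipy sketch: a CSR matrix stores data, indices, and indptr arrays, so once enough entries are non-zero it costs more than the equivalent dense array (the exact crossover depends on dtype and index width).

```python
# Quick check: CSR memory (data + indices + indptr) vs a dense float32 array
# at several densities. At high density the sparse format is larger.
import numpy as np
from scipy import sparse

shape = (1000, 1000)
dense_bytes = np.zeros(shape, dtype=np.float32).nbytes

for density in (0.01, 0.1, 0.3, 0.5, 0.9):
    mat = sparse.random(shape[0], shape[1], density=density,
                        format="csr", dtype=np.float32, random_state=0)
    csr_bytes = mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes
    print(f"density {density:.2f}: dense {dense_bytes} B, CSR {csr_bytes} B")
```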