Is the embedded case really that interesting? Almost by definition, the embedded device will be receiving a tiny fraction of the data in the world that it may be concerned about. It seems unlikely to me that an embedded, power-constrained device is going to "deep learn" anything all that useful that wouldn't be better learned in something with more data and power available. But I do mean this as a question, if anybody's got a really cool use case in hand. (Please something more specific than text that boils down to "something something sensor network internet of things local conditions something".)
This isn't for training, is it? It's for using the results of training immediately (inferring something), without the need for a network round trip (as far as I understand it).
So, you might still send the request over the network to continue training the model, but by the time you do, your answer has already been computed on the local machine for local consumption.
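To make that split concrete, here is a minimal sketch of the idea: inference is just a forward pass through weights already stored on the device, so the result is available immediately, and any training update can trail behind over the network. The model layout and the `upload_for_training` callback are hypothetical placeholders, not any particular framework's API.

```python
import numpy as np

def infer_locally(layers, x):
    """layers: list of (weights, bias) arrays already stored on the device."""
    for w, b in layers:
        x = np.maximum(0.0, x @ w + b)  # simple ReLU MLP forward pass
    return int(np.argmax(x))

def classify(layers, x, upload_for_training=None):
    label = infer_locally(layers, x)    # local answer, no network round trip
    if upload_for_training is not None:
        upload_for_training(x, label)   # server-side training can lag behind
    return label
```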
True. In the cases where this would be useful (extremely little training data), there are machine learning methods that are much, much better suited than deep learning, which is comparatively a glutton for data. E.g. handwriting recognition or signature recognition, where you might have only a few samples.
The large nets, yes. Partly because bigger nets tend to do better, so we've built massive nets that run pretty well on big GPUs. But that's the state of things: we have impressive results, but from huge neural nets that we really cannot run in realtime on a mobile device.
It's particularly bad if you want to run them on a small, battery-powered device, and far more so if you only have one image to process: here they're seeing single-image speedups of over 10x compared to GPUs (though still slower than batched processing) and energy efficiency roughly 1000x better than mobile GPUs (and far more compared to the beasts in your desktop).
> But that's the state of things: we have impressive results, but from huge neural nets that we really cannot run in realtime on a mobile device.
This is absolutely incorrect.
A mobile device can execute a pretrained model fine. See, for example, [1][2][3]. Google's new TensorFlow NN system is explicitly designed to be able to run on mobile devices and comes with a pre-trained image classification NN that works fine on mobile devices.
This doesn't mean that energy saving is unimportant on a mobile device, of course. But there are very widely deployed production systems (e.g. all of Android) that run pretrained models today with no special GPU acceleration.
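For a sense of what "execute a pretrained model" looks like in code, here is a hedged sketch using TensorFlow's on-device interpreter (`tf.lite`) as one concrete route; the references above may use a different path, and the model file name is a placeholder for any converted image classifier.

```python
import numpy as np
import tensorflow as tf

# Load a converted, pretrained classifier (placeholder filename).
interpreter = tf.lite.Interpreter(model_path="mobilenet_v1.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy image with whatever shape/dtype the model expects.
image = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()                       # inference only -- no training, no network
scores = interpreter.get_tensor(out["index"])
print("predicted class:", int(np.argmax(scores)))
```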
> A mobile device can execute a pretrained model fine.
Of course it can. The problem is with the size of the network you might want to run.
From your first link, which is about single-character-level image processing:
> We needed to develop a very small neural net, and put severe limits on how much we tried to teach it—in essence, put an upper bound on the density of information it handles.
And from the voice training paper:
> While our server-based model has 50M parameters (k = 4, nh = 2560, ni = 26 and no = 7969), to reduce the memory and computation requirement for the embedded model, we experimented with a variety of sizes and chose k = 6, nh = 512, ni = 16 and no = 2000, or 2.7M parameters
AlexNet is, what, 60M+ parameters? VGG is pretty big too.
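Back-of-the-envelope arithmetic on those quoted parameter counts shows why the size gap matters on a phone; this assumes plain 32-bit float weights and ignores activations and runtime overhead.

```python
# Rough weight-storage footprint for the parameter counts quoted above.
def weight_mb(params, bytes_per_param=4):
    return params * bytes_per_param / 1024 ** 2

for name, params in [("server ASR model (50M)", 50e6),
                     ("embedded ASR model (2.7M)", 2.7e6),
                     ("AlexNet (~60M)", 60e6)]:
    print(f"{name}: ~{weight_mb(params):.0f} MB fp32, "
          f"~{weight_mb(params, 1):.0f} MB if 8-bit quantized")
# -> roughly 190 MB vs 10 MB vs 230 MB in fp32
```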
I guess I wasn't too clear. Yes, there are good results from nets that we can run on mobile devices in realtime. We do, however, want to run significantly larger nets.
[1] is a Google-authored demo running Inception-v3[2] (about twice as accurate as VGG) on Android phones.
I don't know how many parameters Inception v3 has, but I know Google considers it more efficient than VGG ("Although our network is 42 layers deep, our computation cost is only about 2.5 higher than that of GoogLeNet and it is still much more efficient than VGGNet".)
Yes, being able to run big networks is great. But ultimately it's what you do with it, and a 3% error rate on ImageNet is a pretty compelling argument that size isn't the only factor.
Well, it depends on the neural network you want to use. At computer vision / image processing conferences I find it is much more common for people to take preexisting DNNs and plug them into their systems rather than train their own (why would you waste weeks or months on that?). The good pretrained DNNs available today tend to be huge, requiring a large amount of video RAM. I've talked to people writing realtime image-processing software (GPU acceleration required) intended for handheld devices that can't yet actually run on the device, because they don't have a suitable DNN that will fit in VRAM.
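For reference, the "plug a preexisting DNN into your system" workflow often looks something like the sketch below: pull VGG16 with ImageNet weights and use it as a fixed feature extractor. Keras' applications API is just one concrete way to do this, and the random image is a stand-in; note that the full VGG16 with its dense layers is roughly half a gigabyte of float32 weights, which is exactly the VRAM problem being described.

```python
import numpy as np
import tensorflow as tf

# Pretrained VGG16, convolutional part only, global-average-pooled to 512-d features.
extractor = tf.keras.applications.VGG16(weights="imagenet",
                                        include_top=False,
                                        pooling="avg")

image = np.random.rand(1, 224, 224, 3) * 255.0   # stand-in image batch
features = extractor.predict(
    tf.keras.applications.vgg16.preprocess_input(image))
print(features.shape)                            # (1, 512)
```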
Performing inference with deep learning models is computationally costly as well. It's one reason NVIDIA has two streams of business: Tesla / Titan / upcoming Volta GPUs to train models, and Tegra boards to enable near-realtime inference.