Is the embedded case really that interesting? Almost by definition, the embedded device will be receiving a tiny fraction of the data in the world that it may be concerned about. It seems unlikely to me that an embedded, power-constrained device is going to "deep learn" anything all that useful that wouldn't be better learned in something with more data and power available. But I do mean this as a question, if anybody's got a really cool use case in hand. (Please something more specific than text that boils down to "something something sensor network internet of things local conditions something".)
This isn't for training, is it? It's for using the results of training immediately (inferring something), without the need for a network round trip (as far as I understand it).
So, you might still send the request over the network to continue training the model, but by the time you do, your answer has already been computed on the local machine for local consumption.
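To make that split concrete, here is a minimal sketch of the idea: inference is just a forward pass through weights already stored on the device, so the result is available immediately, and any training update can trail behind over the network. The model layout and the `upload_for_training` callback are hypothetical placeholders, not any particular framework's API.

```python
import numpy as np

def infer_locally(layers, x):
    """layers: list of (weights, bias) arrays already stored on the device."""
    for w, b in layers:
        x = np.maximum(0.0, x @ w + b)  # simple ReLU MLP forward pass
    return int(np.argmax(x))

def classify(layers, x, upload_for_training=None):
    label = infer_locally(layers, x)    # local answer, no network round trip
    if upload_for_training is not None:
        upload_for_training(x, label)   # server-side training can lag behind
    return label
```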
True. In the cases where this would be useful (extremely little training data), there are machine learning methods that are much, much better suited than deep learning, which is comparatively a glutton for data. E.g. handwriting recognition or signature recognition, where you might have only a few samples.
The large nets, yes. Partly because bigger nets tend to do better, so we've built massive nets that run pretty well on big GPUs. But that's the state of things: we have impressive results, but from huge neural nets that we really cannot run in realtime on a mobile device.
It's particularly bad if you want to run them on a small, battery-powered device, and far more so if you only have one image to process: here they're seeing single-image speedups of over 10x compared to GPUs (though still slower than batched processing) and energy efficiency roughly 1000x better than mobile GPUs (and far more compared to the beasts in your desktop).
> But that's the state of things: we have impressive results, but from huge neural nets that we really cannot run in realtime on a mobile device.
This is absolutely incorrect.
A mobile device can execute a pretrained model fine. See, for example, [1][2][3]. Google's new TensorFlow NN system is explicitly designed to be able to run on mobile devices and comes with a pre-trained image classification NN that works fine on mobile devices.
This doesn't mean that energy saving is unimportant on a mobile device, of course. But there are very widely deployed production systems (e.g. all of Android) that run pretrained models today with no special GPU acceleration.
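For a sense of what "execute a pretrained model" looks like in code, here is a hedged sketch using TensorFlow's on-device interpreter (`tf.lite`) as one concrete route; the references above may use a different path, and the model file name is a placeholder for any converted image classifier.

```python
import numpy as np
import tensorflow as tf

# Load a converted, pretrained classifier (placeholder filename).
interpreter = tf.lite.Interpreter(model_path="mobilenet_v1.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy image with whatever shape/dtype the model expects.
image = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()                       # inference only -- no training, no network
scores = interpreter.get_tensor(out["index"])
print("predicted class:", int(np.argmax(scores)))
```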
> A mobile device can execute a pretrained model fine.
Of course it can. The problem is with the size of the network you might want to run.
From your first link, which is about single-character-level image processing:
> We needed to develop a very small neural net, and put severe limits on how much we tried to teach it—in essence, put an upper bound on the density of information it handles.
And from the voice training paper:
> While our server-based model has 50M parameters (k = 4, nh = 2560, ni = 26 and no = 7969), to reduce the memory and computation requirement for the embedded model, we experimented with a variety of sizes and chose k = 6, nh = 512, ni = 16 and no = 2000, or 2.7M parameters
AlexNet is, what, 60M+ parameters? VGG is pretty big too.
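Back-of-the-envelope arithmetic on those quoted parameter counts shows why the size gap matters on a phone; this assumes plain 32-bit float weights and ignores activations and runtime overhead.

```python
# Rough weight-storage footprint for the parameter counts quoted above.
def weight_mb(params, bytes_per_param=4):
    return params * bytes_per_param / 1024 ** 2

for name, params in [("server ASR model (50M)", 50e6),
                     ("embedded ASR model (2.7M)", 2.7e6),
                     ("AlexNet (~60M)", 60e6)]:
    print(f"{name}: ~{weight_mb(params):.0f} MB fp32, "
          f"~{weight_mb(params, 1):.0f} MB if 8-bit quantized")
# -> roughly 190 MB vs 10 MB vs 230 MB in fp32
```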
I guess I wasn't too clear. Yes, there are good results from nets that we can run on mobile devices in realtime. We do, however, want to run significantly larger nets.
[1] is a Google-authored demo running Inception-v3[2] (about twice as accurate as VGG) on Android phones.
I don't know how many parameters Inception v3 has, but I know Google considers it more efficient than VGG ("Although our network is 42 layers deep, our computation cost is only about 2.5 higher than that of GoogLeNet and it is still much more efficient than VGGNet".)
Yes, being able to run big networks is great. But ultimately it's what you do with it, and a 3% error rate on ImageNet is a pretty compelling argument that size isn't the only factor.
Well, it depends on the neural network you want to use. At computer vision / image processing conferences I find it is much more common for people to take preexisting DNNs and plug them into their systems rather than train their own (why would you waste weeks or months on that?). The good pretrained DNNs available today tend to be huge, requiring a large amount of video RAM. I've talked to people writing realtime image-processing software (GPU acceleration required) intended for handheld devices that can't yet actually run on the device, because they don't have a suitable DNN that will fit in VRAM.
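For reference, the "plug a preexisting DNN into your system" workflow often looks something like the sketch below: pull VGG16 with ImageNet weights and use it as a fixed feature extractor. Keras' applications API is just one concrete way to do this, and the random image is a stand-in; note that the full VGG16 with its dense layers is roughly half a gigabyte of float32 weights, which is exactly the VRAM problem being described.

```python
import numpy as np
import tensorflow as tf

# Pretrained VGG16, convolutional part only, global-average-pooled to 512-d features.
extractor = tf.keras.applications.VGG16(weights="imagenet",
                                        include_top=False,
                                        pooling="avg")

image = np.random.rand(1, 224, 224, 3) * 255.0   # stand-in image batch
features = extractor.predict(
    tf.keras.applications.vgg16.preprocess_input(image))
print(features.shape)                            # (1, 512)
```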
Performing inference with deep learning models is computationally costly as well. It's one reason NVIDIA has two streams of business: Tesla / Titan / upcoming Volta GPUs to train models, and Tegra boards to enable near-realtime inference.