No idea what instruction set the Apple device uses, but Google just announced alpha access to their Tensor Processing Unit on Google Cloud: https://cloud.google.com/tpu/
Its instruction set is mainly moving memory around, matrix multiplication, convolution, and applying activation functions (sigmoid, tanh, relu, etc.). Very simple, high-level stuff. A handy side effect is that execution time is very predictable, which keeps latency close to deterministic.
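
To make that list of operations concrete, here is a rough sketch in plain Python of how a neural-net layer boils down to those few primitives. This is only an illustration, not the TPU's actual instruction set or any Google API; the function names (`matmul`, `conv1d`, `dense_layer`, and so on) are made up for the example.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def conv1d(signal, kernel):
    """'Valid' 1-D convolution in the cross-correlation form most ML frameworks use."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def activate(matrix, fn):
    """Apply an activation function to every element."""
    return [[fn(v) for v in row] for row in matrix]

def dense_layer(inputs, weights, fn):
    """A fully connected layer: one matrix multiply followed by an activation."""
    return activate(matmul(inputs, weights), fn)

# Hypothetical toy values, chosen only so the arithmetic is easy to follow.
inputs = [[1.0, 2.0]]
weights = [[0.5, -1.0],
           [0.25, -2.0]]
layer_out = dense_layer(inputs, weights, relu)
conv_out = conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])
```

Note that the number of multiply-adds in `matmul` and `conv1d` depends only on the shapes of the inputs, never on the values in them. That is the property behind the predictable timing: for a fixed model, every inference does the same amount of work.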