Not really relevant to this. That deals with images, where you can perturb a huge number of pixels slightly to exploit weird edge cases. Even linear classifiers break on it, e.g. https://karpathy.github.io/2015/03/30/breaking-convnets/
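To see why even a linear classifier breaks, here's a minimal sketch (toy example, not from the linked post: the classifier, dimensions, and epsilon are all made up). Because the score is a dot product over many pixels, a perturbation of at most eps per pixel, aligned with sign(w), shifts the score by roughly eps times the sum of |w|, which is enough to flip the prediction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier: score = w . x, predicted class = sign(score).
n_pixels = 10_000
w = rng.normal(size=n_pixels)   # classifier weights
x = rng.normal(size=n_pixels)   # "image" as a flat pixel vector

score = w @ x

# Adversarial perturbation: nudge every pixel by at most eps, each in
# the direction that pushes the score toward the opposite sign.
eps = 0.05
x_adv = x - np.sign(score) * eps * np.sign(w)

score_adv = w @ x_adv
# No pixel changed by more than eps, yet the score moved by
# eps * sum(|w|), so the predicted class flips.
```

The per-pixel change is tiny (and visually imperceptible for real images), but it accumulates across thousands of pixels: exactly the effect the blog post demonstrates.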
I'm pretty sure that if you shift the image just a little, the convnet is no longer fooled. So this kind of exploit works only on completely stationary images, and it is very easy to overcome.
Humans are able to recognize these as optical illusions (otherwise we wouldn't call them "illusions").
Further, none of the optical illusions shown are so wildly "off" and "crazy" as the algorithmic goofs demonstrated in the paper referenced above.
They're obviously crazy to us. Our illusions might look obviously crazy to them.
And generally humans don't natively know when they're experiencing an optical illusion; they have to be taught it. And even then, it's not the human vision system that learns the lesson, it's some other part of the brain that learns to discount the vision system's conclusions.
Further, these inputs were /optimized/ to confuse the particular classification software. If only we could optimise optical illusions for a given human viewer...
I bet you could do it with sound, too. Imagine producing what would seem like abstract noise – unrelated to anything in the natural world – but with particular structures and sequences of tones optimised to produce particular emotive responses; pleasure, excitement, calm, energised dancing, romantic dancing...
http://arxiv.org/pdf/1412.1897v1.pdf