The proof is constructive, so you actually can. Just create a few neurons for each "tower function" and set the output weight to the value you want the network to produce at every point you want to map.
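A minimal sketch of that construction (my own example, not taken from the article): pair up steep sigmoids into "towers" over small intervals and weight each tower by the target function's value there.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings when the sigmoids are made very steep.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def tower_net(x, f, n_towers=50, steepness=1000.0):
    """One pair of hidden neurons per tower; output weight = f at the tower's centre."""
    edges = np.linspace(0.0, 1.0, n_towers + 1)
    out = np.zeros_like(x, dtype=float)
    for left, right in zip(edges[:-1], edges[1:]):
        # A steep step up at `left` minus a steep step up at `right`
        # gives a bump (the "tower") that is ~1 on [left, right) and ~0 elsewhere.
        tower = sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))
        out += f((left + right) / 2.0) * tower  # output weight = desired height
    return out

xs = np.linspace(0.0, 1.0, 1000)
target = lambda x: np.sin(2 * np.pi * x) + 0.3 * x   # arbitrary function to approximate
approx = tower_net(xs, target)
print("max abs error:", np.max(np.abs(target(xs) - approx)))  # shrinks as n_towers grows
```

The point is that nothing is "learned" here: the weights are written down directly, one tower per point you care about, which is exactly why the construction says nothing about efficiency.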
If you mean learn functions efficiently, with few parameters, or in ways that generalize well, the article makes no claims about that at all. There are two "no free lunch" theorems: one says it's impossible to guarantee that any method will generalize well on all problems, and the other says it's impossible to guarantee that any optimization method will work well on all problems. You just have to make strong assumptions like "my function can be modeled by a neural net" or "the error function is convex."
However, neural networks are agnostic to the optimization algorithm you use to set their weights; there are many different ones.
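For instance, a minimal sketch (my own example, assuming PyTorch): the same model can be fit with entirely different optimizers just by swapping one object.

```python
import torch
import torch.nn as nn

def fit(optimizer_cls, steps=500, **opt_kwargs):
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    x = torch.linspace(-1, 1, 200).unsqueeze(1)
    y = torch.sin(3 * x)                         # arbitrary target function
    opt = optimizer_cls(model.parameters(), **opt_kwargs)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()      # mean squared error
        loss.backward()
        opt.step()
    return loss.item()

# Same architecture, different weight-setting procedures:
print("SGD :", fit(torch.optim.SGD, lr=0.1))
print("Adam:", fit(torch.optim.Adam, lr=0.01))
```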