The question in the OP concerns approximation of the f(.) in:
y = f(x)
where the input "x" is d-dimensional (say).
Some problems (i.e., choice of "f") could be easy. Maybe f only depends on one element of x, for example. There would be no curse of dimensionality in this case. Same situation if "f" depends only on any fixed number of elements of "x".
I think this is roughly what you mean by "dimension of problem domain." Fix an "f", that defines a problem. And you're right, efficient solution of that fixed problem is important!
My remark (which is also in the last paragraphs of the Cybenko reference cited in the OP) had to do with increasingly difficult problems.
How to get such a sequence of problems? Suppose you take a simple function like
f(x) = exp(-0.5 * dot(x,x))
i.e., the Gaussian, and approximate it with a linear superposition of 1-d sigmoids (as a neural net would).
The question is, is there an explicit dependence on dimensionality, and is that dependence exponential in d?
And of course, for more general function classes (not just the single Gaussian function above), is there such a dependence? If it is not exponential, that would be astonishing, revolutionary.
The reason this setup ("vary the problem size") is interesting is that we would clearly like to use neural nets for increasingly higher-dimensional problems (e.g., learn appearance of 32x32 cats, then 64x64 cats, then ...).
I think I know the problem you're referring to (despite not having the right kind of specialist training to actually work in this field). I think that the fact that random or quasirandom probes are able to explore these high-dimensional spaces fairly effectively is evidence that the problem is somehow not exponential and therefore tractable in some fashion. Does that sound relevant?
One has to be careful to separate the functional-approximation problem (which is in the OP) from the statistical-model-selection problem (which is NOT in the OP).
The statistical-model-selection problem is easier in some ways (you don't have to approximate the function everywhere, just where you get data; if you didn't get data somewhere, you don't care what happens there).
It is harder in some ways (all that probabilistic folderol, plus, your data is noisy).
There are results that give rates for the functional-approximation problem. By rates, I mean, how good is the approximation versus number of hidden units. The best work I know of is by Andrew Barron, but that was in the mid/late 1990s. He's like a genius bulldozer, so his papers are tough going. You'll note that the Cybenko results, like in the OP, do not give rates. This is obviously a huge difference.
There are also results in the statistical-model-selection problem, of course. With rates. That's Vladimir Vapnik's big contribution, later carried on by others. One of the main results is that a model class has an intrinsic complexity (VC dimension, or other measures) and you only need to have order-of that many random probes to get (close to) the best model in the class. No matter what the dimensionality of the space where the data lives.
This is precisely the point you're making above.
My comment is getting too long, but: notice that the statistical-model-selection work only speaks about how to get (close to) the best model in the model class. You then have to make sure your model class is full enough to get close to the optimal rule (which is chosen by Nature, or whatever).
So to make a full theory, you need both pieces: how to choose a big-enough model class (functional approx.) and how to select a good-enough member of the model class (statistical model selection).
Typically there is a total error having one term for each of the two effects, and therefore a balancing act between making the model class big enough to approximate anything, vs. making it small so that you can easily select a good model. If you look at it sideways, this therefore looks like an optimization problem with a Lagrange multiplier penalizing model complexity. So it's quite elegant.
Some problems (i.e., choice of "f") could be easy. Maybe f only depends on one element of x, for example. There would be no curse of dimensionality in this case. Same situation if "f" depends only on any fixed number of elements of "x".
I think this is roughly what you mean by "dimension of problem domain." Fix an "f", that defines a problem. And you're right, efficient solution of that fixed problem is important!
My remark (which is also in the last paragraphs of the Cybenko reference cited in the OP) had to do with increasingly difficult problems.
How to get such a sequence of problems? Suppose you take a simple function like
i.e., the Gaussian, and approximate it with a linear superposition of 1-d sigmoids (as a neural net would).The question is, is there an explicit dependence on dimensionality, and is that dependence exponential in d?
And of course, for more general function classes (not just the single Gaussian function above), is there such a dependence? If it is not exponential, that would be astonishing, revolutionary.
The reason this setup ("vary the problem size") is interesting is that we would clearly like to use neural nets for increasingly higher-dimensional problems (e.g., learn appearance of 32x32 cats, then 64x64 cats, then ...).