As someone who has studied nonlinear nonconvex optimization, I don't think linear regression is the final word here. In the universe of optimization problems for curve-fitting, the linear case is only one case (albeit a very useful one).
Often, though, it's insufficient. The next step up is piecewise linearity, and then convexity. Convexity is often described as a way to relax the linearity requirement while still leaving the problem tractable.
Many real-world systems are nonlinear (think physics models), and often nonconvex. You can approximate them with locally linear functions, to be sure, but you lose a lot of fidelity in the process. Sometimes that's OK, sometimes it isn't; it depends on the final application.
It happens that linear regression is good enough for a lot of stuff out there, but there are many places where it doesn't work.
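To make that concrete, here's a minimal sketch (Python/NumPy, with a toy target and numbers I picked for illustration) comparing a single least-squares line to a crude piecewise-linear interpolant on y = exp(x). The piecewise version is just linear interpolation between a few knots rather than a proper segmented fit, but it shows the fidelity gap the "next step" above is about.

```python
# Toy comparison: one least-squares line vs. a 5-knot piecewise-linear
# approximation of a smooth but nonlinear target (illustrative only).
import numpy as np

x = np.linspace(0.0, 2.0, 200)
y = np.exp(x)  # smooth, convex, distinctly nonlinear on [0, 2]

# Ordinary least-squares line: design matrix [1, x]
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_fit = A @ coef

# Piecewise-linear approximation: interpolate between a handful of knots
knots = np.linspace(0.0, 2.0, 5)
pw_fit = np.interp(x, knots, np.exp(knots))

print("max |error|, single line     :", np.max(np.abs(linear_fit - y)))
print("max |error|, 5-knot piecewise:", np.max(np.abs(pw_fit - y)))
```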
- yes, QM is linear in terms of states, vector spaces, and operators acting on states (it's just functional analysis [1])
- it's not linear in terms of how probability distributions and states behave under the action of forces and multi-body interactions (see, e.g., correlated phenomena like phase transitions, magnetism, superconductivity, etc.)
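On the first point, here's a small numerical illustration of my own (not from the parent) of what "linear in states and operators" means: a unitary evolution operator U = exp(-iHt) respects superposition, U(a|psi> + b|phi>) = a U|psi> + b U|phi>.

```python
# Check linearity of time evolution on a random 4-dimensional toy system.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (H + H.conj().T) / 2            # Hermitian Hamiltonian
U = expm(-1j * H * 0.7)             # unitary evolution for an arbitrary time t = 0.7

psi = rng.normal(size=4) + 1j * rng.normal(size=4)
phi = rng.normal(size=4) + 1j * rng.normal(size=4)
a, b = 0.3 + 0.2j, -1.1 + 0.5j

lhs = U @ (a * psi + b * phi)       # evolve the superposition
rhs = a * (U @ psi) + b * (U @ phi) # superpose the evolved states
print("linearity holds:", np.allclose(lhs, rhs))
```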
That's not really correct either. Regression is but a small (and useful) part of statistics -- it's the part of statistics that most engineers use, but statistics goes beyond regression. For instance, sampling theory is not regression. Hypothesis testing is not regression. Causal inference is not regression.
Probability undergirds a lot of statistical methods.
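For instance, here's a tiny sketch of a standard statistical procedure with no regression anywhere in it: a two-sample t-test on simulated data (the group names and numbers are just illustrative).

```python
# Hypothesis testing without regression: compare two simulated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control   = rng.normal(loc=10.0, scale=2.0, size=50)
treatment = rng.normal(loc=11.0, scale=2.0, size=50)

res = stats.ttest_ind(control, treatment)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```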
This is not true. Nonlinear functions do not generally result from combinations of linear functions. It's the other way around: combinations of linear basis functions are often used to approximate nonlinear functions, typically in a spline-like way.
However, this only works (practically) for mildly nonlinear functions. If you have exp, log, sin, or cos terms, for instance, and you're modeling a range where the function is very nonlinear, the approximation errors become problematic unless you use a really dense set of knots. For certain nonlinear systems of equations, this can blow up the size of your problem. It's a tradeoff between the number of linear basis functions you use for the approximation and the error you're willing to accept, as the sketch below illustrates.
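A quick sketch of that knot-count vs. error tradeoff, using piecewise-linear interpolation of sin(10x) on [0, 1] (my own toy example and numbers): the max error only comes down as you pile on more knots.

```python
# Knot density vs. approximation error for a piecewise-linear approximation
# of a strongly nonlinear function (illustrative only).
import numpy as np

x_dense = np.linspace(0.0, 1.0, 5000)
f = lambda x: np.sin(10.0 * x)          # quite nonlinear over this range

for n_knots in (5, 10, 20, 50, 100):
    knots = np.linspace(0.0, 1.0, n_knots)
    approx = np.interp(x_dense, knots, f(knots))
    err = np.max(np.abs(approx - f(x_dense)))
    print(f"{n_knots:4d} knots -> max error {err:.4f}")
```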