Oh, so training time is virtually irrelevant to us; if it weren’t, we would have to be a lot more careful about optimization methods and possibly about which language to use. We also cannot use NNs for the models we build (we are restricted to LR, but LR has as much model capacity as you need, as long as you keep adding feature interaction terms).
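To illustrate the point about LR capacity: a plain logistic regression can’t separate XOR-style data, but adding the pairwise interaction term makes it linearly separable. A minimal sketch with scikit-learn (the toy dataset and hyperparameters here are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# XOR-style data: no linear boundary in the raw features separates it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Expand the features with the interaction term x1*x2
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = poly.fit_transform(X)  # columns: x1, x2, x1*x2

# A linear model on the expanded features fits XOR perfectly
clf = LogisticRegression(C=1e4, max_iter=1000).fit(X_int, y)
print(clf.score(X_int, y))
```

The same trick scales: higher-order interaction terms keep buying capacity, at the cost of a combinatorially growing feature set.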
NNs are universal function approximators. They can have arbitrary model capacity, and you can sort of control that with architecture decisions, loss function/regularization choices, and early stopping, but depending on the problem they can cause more problems than they solve. Usually you don’t really know whether your NN will generalize well outside of your train/test distributions, so it’s often better to have a simpler, more predictable model whose behavior you can control. This is all from my personal experience, and it’s completely moot when we’re talking about e.g. NLP or vision tasks, or situations where you’re drowning in data. NNs are super interesting and powerful; I don’t mean to suggest otherwise, but the mantra is: “what is the right solution to my problem?” There are lots of great advantages to NNs as well (you can get them to do anything with enough cajoling, and they can solve major headaches you would usually have in e.g. kernel methods).