It's an interesting conversation, but it is really weakened by failing to take on the generalization problem head-on. This is something I see in a lot of discussions about deep nets on smaller data sets, whether they involve transfer learning or not. The answer "it's built in" is particularly unsatisfying.

The plots shown should certainly raise the spectre of overtraining. Rather than handwaving about techniques to avoid it, it would be great to see a detailed discussion of how you convince yourself (i.e. with additional data) that the model generalizes reasonably well. Deep learning techniques are no panacea here.