I have a feeling that too much knowledge might slow down the learning process, since it becomes harder to spot or observe the steepest gradient. At least that's how it feels intuitively from a human point of view. Computationally it would just be a little more work per step, but I guess it would also mean slower convergence. Taking math as a more extreme example: it's hard to understand something complex unless you understand basic algebra first.
Does anyone know whether this holds mathematically speaking? Does the order of the data matter?
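To make the ordering question concrete, here is a minimal sketch, assuming plain per-sample SGD on a toy 1-D least-squares problem. The two data points, the learning rate and the name `sgd_pass` are all made up for illustration. It only shows that presenting the same samples in a different order can leave the weight in a different place after one pass; it says nothing about which order converges faster in general.

```python
def sgd_pass(samples, lr=0.1, w=0.0):
    """Run one pass of per-sample SGD for the model y ~ w * x.

    Loss per sample is 0.5 * (w * x - y) ** 2, so the gradient
    with respect to w is (w * x - y) * x.
    """
    for x, y in samples:
        grad = (w * x - y) * x
        w -= lr * grad
    return w


# Hypothetical toy data: two (x, y) pairs that no single w fits exactly.
data = [(1.0, 1.0), (2.0, 5.0)]

w_forward = sgd_pass(data)                    # sees (1, 1) first, then (2, 5)
w_reverse = sgd_pass(list(reversed(data)))    # sees (2, 5) first, then (1, 1)

print("forward order:", w_forward)
print("reverse order:", w_reverse)
```

The two runs use the same data, the same starting weight and the same learning rate, and differ only in the order of the samples. If the data were fit exactly by some w, each update would just shrink the error by a factor, and the order would not change the result of a pass in this toy setup.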