> The key result is not the introduction of ReLU; this is a misdirection. The key result is the outstanding performance on a general image dataset by AlexNet. If the predecessors did all of the work, why was Hinton's lab the first to produce these results?
When Fukushima published ReLUs in 1969 and CNNs in 1979, there were neither decent computers nor competitions. That is no excuse for not citing him.
> ReLU is an absurdly simple gate. The question revolved around its effectiveness, which was proven by Hinton's lab.
Many good things are simple. They should have cited the creator, no matter how much they profited later from faster computers, novel datasets, or the like.
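For anyone who hasn't looked at it: ReLU really is just f(x) = max(0, x), with a subgradient of 1 for positive inputs and 0 otherwise. A minimal sketch (plain NumPy, names mine, not from any of the papers discussed):

```python
import numpy as np

def relu(x):
    # Forward pass: element-wise max(0, x).
    return np.maximum(0.0, x)

def relu_grad(x):
    # Subgradient used during backprop: 1 where the input was positive, else 0.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # -> [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # -> [0. 0. 0. 1. 1.]
```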
> The key result is the outstanding performance on ImageNet. If Schmidhuber was the actual pioneer, why wasn't he able to produce the same results before Hinton?
Did his team ever participate in ImageNet? Apparently not. He writes about DanNet: "For a while, it enjoyed a monopoly. From 2011 to 2012 it won every contest it entered, winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012)"
> NNs were known to work for handwriting recognition since the 90s (including papers by Hinton). DanNet being able to do it for Chinese characters in the 2010s is unremarkable.
The remarkable thing is that "DanNet was the first pure deep CNN to win computer vision contests." Before DanNet, other methods won the competitions. DanNet changed that.
However, the CNN pioneer was Fukushima, who introduced the CNN architecture and ReLUs. Hinton did not cite him.
> You are well aware that not citing an earlier paper with different implementation and results is not plagiarism. There is absolutely no evidence of plagiarism anywhere.
So what exactly constitutes plagiarism? It's not about good or bad faith; it's about checking who did it first. If you use building blocks from previous papers, you must cite them. Schmidhuber cites references on the difference between unintentional [PLAG1] and intentional [FAKE2] plagiarism:
[PLAG1] Oxford's guidance to types of plagiarism (2021). Quote: "Plagiarism may be intentional or reckless, or unintentional."
[FAKE2] L. Stenflo. Intelligent plagiarists are the most dangerous. Nature, vol. 427, p. 777 (Feb 2004). Quote: "What is worse, in my opinion, ..., are cases where scientists rewrite previous findings in different words, purposely hiding the sources of their ideas, and then during subsequent years forcefully claim that they have discovered new phenomena."
More quotes: "If one "re-invents" something that was already known, and only becomes aware of it later, one must at least clarify it later, and correctly give credit in follow-up papers and presentations." ... "And the authors did not cite the prior art - not even in later surveys."
This is crucial. Even later they did not cite the original sources.
> Following up on your logic is absurd, because I can conveniently state that back prop is just the chain rule in differentiation by Newton and everyone else has plagiarized from him.
The paper apparently both anticipated and corrected your claim (it wasn't Newton): "Some claim that "backpropagation is just the chain rule of Leibniz (1676) & L'Hopital (1696)." No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (there are also many inefficient ways of doing this). It was not published until 1970.[BP1]"
[BP1] S. Linnainmaa. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 1970. See chapters 6-7 and the FORTRAN code on pages 58-60; see also BIT 16, 146-160, 1976. The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation.
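To make that distinction concrete: the chain rule only says that derivatives compose; backpropagation / reverse-mode AD is the particular ordering that, for a scalar output, yields the gradients of all inputs in a single backward sweep over the graph. A toy sketch of reverse accumulation (entirely illustrative; none of this code is from [BP1] or any other cited work):

```python
import math

class Node:
    """Minimal reverse-mode autodiff node (illustrative only)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_node, local_derivative)
        self.grad = 0.0

    def backward(self):
        # Seed the output with grad 1, then push chain-rule contributions
        # backwards through the graph in reverse topological order.
        self.grad = 1.0
        for node in topo_order(self):
            for parent, local in node.parents:
                parent.grad += node.grad * local

def topo_order(root):
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                visit(p)
            order.append(n)
    visit(root)
    return reversed(order)  # outputs first, inputs last

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def tanh(a):
    t = math.tanh(a.value)
    return Node(t, [(a, 1.0 - t * t)])

# y = tanh(w * x): one backward sweep produces both dy/dw and dy/dx.
w, x = Node(0.5), Node(2.0)
y = tanh(mul(w, x))
y.backward()
print(y.value, w.grad, x.grad)  # ~0.7616, ~0.8400, ~0.2100
```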
> And ReLU was plagiarized by Fukushima from neuroscience researchers.
Really? Can you prove this? Do you have a reference?