They're not "inherently trained to duplicate"; I think that's a bit of a disingenuous oversimplification. They're trained to learn abstract patterns in large datasets, and remix those patterns in response to a prompt.
"You train them by comparing the output to the original." To the best of my knowledge this isn't correct; can you expand or cite a reference?
They are trained to duplicate, we just hope they do so by abstracting patterns. Various techniques stack the deck to make it difficult to memorize everything but it still happens easily, especially for replicated knowledge.
"You train them by comparing the output to the original." ->
You train neural networks by producing output for known input, comparing that output to the expected output with a cost function, and repeatedly updating the system towards minimizing the cost, until it stops improving or you tire of waiting.
To work mathematically, a cost function must attain its minimum when the output exactly matches the expected output. Engineering-wise you can possibly fudge things, and they probably do so ... now.
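A minimal sketch of the loop described above, under toy assumptions (a one-parameter model y = w * x and mean squared error as the cost function; the names `train` and `pairs` are made up for illustration):

```python
# Produce output for known input, compare to the expected output with a
# cost function (MSE here), and update the parameter to reduce the cost.

def train(pairs, lr=0.01, steps=1000):
    w = 0.0  # single parameter, starting from an arbitrary guess
    for _ in range(steps):
        # gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad  # step towards minimizing the cost
    return w

# Consistent data: every pair satisfies y = 2 * x exactly.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(pairs)
# w converges near 2.0: the cost's minimum sits exactly where
# output matches expected output, as stated above.
```

Real training only differs in scale: more parameters, fancier optimizers, batching, but the same compare-and-update loop.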
I don't agree with your critiques. It isn't an oversimplification, published code literally works as stated.
I disagree with the statement "they are trained to duplicate" because "to" implies a purpose/intent, which is incorrect. I.e. "they are trained with the purpose of duplication". This is, I believe, pretty uncontroversially false. We already have methods to duplicate data. "They are trained with the purpose of learning abstract patterns" is much more correct. One of the biggest _problems_ of training is duplication, aka over-fitting. To say it's the purpose is imo disingenuous.
Ah, I see what they meant by that statement. It is true that supervised learning operates on labelled input/output pairs, and that neural networks generally use gradient descent/backpropagation. (Disclaimer: it's been a few years since I've done any of this myself, so I don't remember it that well, and the field has changed a lot.) Note that since the parameter space of the neural network is usually _significantly_ smaller than the training data set, a network will not tend to drive that cost function near 0 for an individual sample, since doing so would worsen the overall result. There is inherent "fudging", although near-identical output can potentially happen. The statement here is more reasonable and closer to the actual training process than the first.
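The point about parameter count can be shown concretely. A hypothetical sketch (names `fit`, `pairs`, `errors` invented here): one parameter, three inconsistent samples, so minimizing the *overall* cost cannot drive every per-sample error to zero:

```python
# One-parameter model y = w * x fit by gradient descent on MSE.
# The three samples below do not lie on any single line through the
# origin, so no w can reproduce all of them.

def fit(pairs, lr=0.01, steps=5000):
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad
    return w

pairs = [(1.0, 1.0), (2.0, 1.0), (3.0, 5.0)]
w = fit(pairs)
errors = [abs(w * x - y) for x, y in pairs]
# w settles near the least-squares optimum 18/14 ≈ 1.286, and every
# per-sample error stays well above zero: the "inherent fudging".
```

With more parameters than samples the opposite can happen: the model can drive every per-sample error to zero, i.e. memorize, which is the over-fitting/duplication risk discussed upthread.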