Even the most derivative of singer songwriters tend to use their own voices rather than a weighted average of the voices of other singers in their genre...
Using the skills they presumably developed listening to and copying other singers and studying music, with an instrument built from roughly the same instructions as everyone else.
That a person can't sound like the weighted average is a human limitation (although with modern pop people do get quite close!), not a sign that new singers aren't trying to. That of course adds variation that we appreciate, but doesn't change the underlying similarity: acquired skill is mimicry of those who acquired it before us, with very rare exceptions.
No, sounding like the genre-weighted average of Spotify simply isn't what singers try to do. They haven't listened to that much music, they have actual preferences, they have natural qualities to their voice which they're complimented on or asked to mask, and they're trying to hit notes based on their aural perception of harmony and related theoretical principles, not based on the waveforms of other songs involving singer songwriters. The fact that they literally couldn't do what NNs do even if they wanted to also seems quite relevant to the fact that they don't do what NNs do.
What next, are we going to argue that what programmers creating new programs are really trying to do is generate a prompt-weighted average of the bytecode of every program they've ever downloaded, and all that business analysis and functional spec and use of high level programming languages and expressed preferences for coding standards is irrelevant?
Those are the physical limitations I referred to, which aren't something humans tend to be happy about but which can sometimes end up being a differentiating benefit.
> What next, are we going to argue that what programmers creating new programs are really trying to do is generate a prompt-weighted average of the bytecode of every program they've ever downloaded
That's a horrible strawman. Do you as a programmer often read and write bytecode directly?
I'm beginning to assume you're an LLM, because I'm not convinced a human would honestly try to argue that their emotional reaction to their favourite songs is basically equivalent to flipping the values of some bits to ensure that they generate music more similar to those songs.
> That's a horrible strawman. Do you as a programmer often read and write bytecode directly?
As an improvising guitarist (even a very mediocre one) my creative process is even further removed from an LLM parsing and generating sound files directly....
I suspect the issue here is just the assumption that LLMs are "just flipping some bits", while simultaneously putting humanity on some unreachable pedestal.
We are all nothing but a horde of molecular machines. Your "you" is just individual neurons reacting to input in accordance with their current chemical state and active connections. All the experiences, unique personality traits, and creativity you add to the process are solely the result of the current state of your network of neurons in response to a particular input.
But while an LLM is trained once and then has its state fixed in place regardless of input, we "train" continuously, and while an LLM might have experience of an inhuman corpus for a certain subject, we have many "irrelevant" experiences to mix things up.
Your "prompt" is also messy, including the current sound of your own heartbeat, the remaining taste in your mouth from your last meal, the feeling of a breeze through your hair as it tickles your neck, while the LLM has just one, maybe two half-assed sentences. This mix of messy experiences and noisy input fuels "creativity". You don't think "I need to copy XYZ", but neither does the AI. You both just react.
In some regards our chaos is better, in others it is worse. But while the machinery of an LLM still does not even remotely approach a brain, we should not forget that we are nothing more than a cluster of small machines, assembled from roughly 750 MB worth of blueprint.