the techniques i've seen in the past are indistinguishable from noise unless you have the correct key. that is, they rely on the fact that a key is a pseudorandom bitstream and that audio streams often already contain pseudorandom noise, so ciphertext is ideal for hiding in that noise.
i think i presented this paper (itself from about two decades ago) on the topic at a course journal club about a decade ago:
The whole issue seems to be dedicated to watermarking, but relevant to this discussion is also this article:
"The basic principle borrows from spread spectrum communications. It consists of addition of an encrypted, pseudo-noise signal to the video that is invisible, statistically unobtrusive, and robust against manipulations."
https://www.sciencedirect.com/science/article/abs/pii/S01651...
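to make the spread-spectrum idea in that quote concrete, here's a rough sketch of how i understand it, not the method from either paper. a key seeds a pseudo-noise sequence of +1/-1 values, the sequence is added to the host signal at low strength, and detection correlates the signal against the sequence regenerated from the key. the names `chips`, `embed`, `detect` and the `strength` parameter are made up for illustration, and python's `random` stands in for a real keyed cipher, so this is not secure as written.

```python
import random

def chips(key: str, n: int) -> list[int]:
    """Key-seeded pseudo-noise sequence of +1/-1 values (hypothetical helper)."""
    rng = random.Random(key)  # assumption: stand-in for a real keyed cipher stream
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed(host: list[float], key: str, strength: float = 0.5) -> list[float]:
    """Add the keyed pseudo-noise to the host signal at low amplitude."""
    pn = chips(key, len(host))
    return [h + strength * c for h, c in zip(host, pn)]

def detect(signal: list[float], key: str) -> float:
    """Correlate the signal with the key's pseudo-noise; near `strength` if marked."""
    pn = chips(key, len(signal))
    return sum(s * c for s, c in zip(signal, pn)) / len(signal)

# host "audio": plain Gaussian noise, just to have something noisy to hide in
noise = random.Random(0)
host = [noise.gauss(0.0, 1.0) for _ in range(20000)]
marked = embed(host, "secret key")

right = detect(marked, "secret key")   # close to the embedding strength
wrong = detect(marked, "wrong key")    # close to zero
unmarked = detect(host, "secret key")  # close to zero
print(f"right key: {right:.3f}, wrong key: {wrong:.3f}, unmarked: {unmarked:.3f}")
```

with the right key the correlation sits near the embedding strength; with the wrong key, or on the unmarked signal, it hovers around zero, which is the "looks like noise unless you have the key" property. a real scheme would also shape the noise perceptually and survive compression or resampling, which this toy ignores.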