Maybe using Fourier/wavelet/whatsoever transform would be the way to go, just like in digital watermarking techniques. Both high capacity and robustness would seem easier to achieve.
Interesting. I guess we can experiment with how many bits can be compressed to a video frame. Is there a guarantee that YT doesn't change the frame rate?
Another idea is YT as code repo: essentially one makes a movie that shows code files. On retrieval, OCR can be applied to transform the movie back to code in text.
I know that YouTube used to support only up to 30fps video, but IIRC they now support 60fps. This became a thing because people making videos of themselves playing the newest generation of consoles (PS4 / XBOne) want to upload in high quality, and the consoles now do 1080p at 60fps.
If this is a concern for people (recording at 60fps to upload for 60fps), I doubt that Google would downgrade the framerate except for maybe the lower quality versions of the video (does 60fps really matter for 240p video?).