I think this more or less shows that mass-market video is a fair bit further away than mass-market text-based reasoning. With text alone, the mind can interpolate and fill in some of the uncanny parts. With video, that just isn't possible: the uncanny valley is novel at first, but boring in the long run.
Video will have a much much higher bar to pass than text and I think it will take some time still.
>Video will have a much much higher bar to pass than text and I think it will take some time still.
I am not so sure about that. The real hurdle at this point seems to be getting something like Dall-E to produce multiple angles and poses of a scene while maintaining the characteristics of the elements in the scene. Once that works, the rest is an interpolation problem. I admit I am not an expert on these models, but neither problem seems insurmountable. I would not be surprised if we see something that can make very convincing short videos in less than a year. At that point, since a "short video" is really a scene, we are basically at film stage.
> At that point, since a "short video" is really a scene, we are basically at film stage.
But silent films, no? Or where are things at as far as generated audio that syncs to the video? (sound effects, dialogue, music - each of which will have its own challenges)
I'm in the camp of "satisfying long-watch video is a long way off", but I don't think matching audio to video will be a big hurdle.
Conceptually, if you're generating the video from a script, you can feed the script and video into a foley and voice generator that recognizes "door slammed" and inserts the kind of sound associated with a door slam, and that recognizes face changes and generates speech audio keyed to match. Those tasks will be solved independently on the way to long-form video itself getting sorted, because you can exercise them with non-AI video along the way.
I'd be shocked if somebody hasn't already made some good demos of the foley and lip sync stuff.
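To make the idea concrete, here is a rough sketch of the "script in, audio cues out" step, and only that step. Everything in it is made up for illustration: the `FOLEY_LIBRARY` mapping, the `ScriptEvent` and `AudioCue` types, and the `plan_audio` function are hypothetical names, and the timestamps are assumed to be given rather than detected from the video. A real system would need actual sound synthesis and lip-sync models behind it.

```python
from dataclasses import dataclass

# Hypothetical phrase-to-sound table; a real foley model would be far richer.
FOLEY_LIBRARY = {
    "door slammed": "door_slam.wav",
    "glass breaks": "glass_break.wav",
}


@dataclass
class ScriptEvent:
    time_s: float              # when the event occurs in the video (assumed known)
    text: str                  # script line or spoken dialogue
    is_dialogue: bool = False  # True if the text is spoken by a character


@dataclass
class AudioCue:
    time_s: float
    kind: str     # "foley" or "speech"
    payload: str  # sound file name, or dialogue text to hand to a voice generator


def plan_audio(events):
    """Turn timed script events into a time-ordered list of audio cues."""
    cues = []
    for ev in events:
        if ev.is_dialogue:
            # Speech would later be synthesized and keyed to the face motion.
            cues.append(AudioCue(ev.time_s, "speech", ev.text))
            continue
        lowered = ev.text.lower()
        for phrase, sound in FOLEY_LIBRARY.items():
            if phrase in lowered:
                cues.append(AudioCue(ev.time_s, "foley", sound))
    return sorted(cues, key=lambda cue: cue.time_s)


events = [
    ScriptEvent(2.0, "Hello there", is_dialogue=True),
    ScriptEvent(0.5, "The door slammed behind her."),
]
for cue in plan_audio(events):
    print(cue)
```

The point of the sketch is just that this planning layer doesn't care whether the video was generated or filmed, which is why it can be developed against non-AI footage first.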