There are still some issues to be worked out, such as how the head shape distorts in some examples, but overall, this is very, very impressive work.
Back in the old days, Disney and other animation studios rotoscoped actors' performances by drawing over the original footage by hand, frame by frame. It won't be long now before you just have an artist create a few examples of concept art, then simply video the performances of the actors without much, if any, special setup beyond maybe a tracking suit.
How many years away are we from the point where you can just type in a script (or just put in some writing prompts and have an AI generate a script), describe the direction for the actors ("bend over and pick up the bucket", "exit stage left"), and then just churn out a movie?
If you pick up just a little bit of skill with animation, compositing, and the like, you're a one-person movie studio. Crazy times. This is not what I imagined the future was going to look like, but it will be entertaining.
How long until AI can measure my brain response and adjust the script in real time to make the film less predictable, more exciting, relaxing, humorous or engaging based on how I'm feeling?
That's going to get interesting. What makes movies 'great' is not just the movie itself, but the shared experience of others having watched that same movie too. That gets lost when you start making movies completely customized to the viewer.
And once AI is capable of replacing the "social experience" of movie viewing, humanity might not be far away from having rendered itself obsolete.
What if AI could respond to everyone sharing the experience in the room?
A group of friends all contributing in some way, then afterwards discussing the parts of the film that they might have been responsible for willing into existence.
We've got AI-assisted rotoscoping already, and while it looks a bit janky at times it's still a whole lot faster than doing it all by hand. https://www.youtube.com/watch?v=tq_KOmXyVDo
Interesting - is it still faster when the artist endeavors to correct the AI jankiness where it occurs? Or at the present day is it more trouble than it's worth?
What actors? There are a lot of writers who will jump at the opportunity to skip all the translation and re-interpretation by others and directly build the visuals as they go. Some of this will be extremely cringeworthy, but a lot of it will be astonishingly good.
I can see famous actors licensing their image, with multiple shadow actors doing their moves. Now you can make 50 movies per year instead of 5. You heard it here first...
Pretty sure that's already a thing, not for movies per se but for ads, where a celeb can license their avatar (3D model and voice) for use in the ad. Especially useful when they want the celeb's lips to move correctly when the ad is translated into non-native languages.
Yup. As an aspiring indie game dev I eagerly await large-scale emotive text-to-speech. Being able to write a script and have it "acted" for a fraction of the cost, possibly free if I build it myself, is kinda mind blowing.
As a gamer, this is also something I could see having a positive impact, because the requirement for voice acting currently a) limits the amount of text content, since it costs to have it voice acted in all supported languages, and b) locks in content once the voice acting is done, since it would be too costly to redo it for fixed text / quests / etc.
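For what it's worth, the plumbing for this is already simple even with today's off-the-shelf engines. Here's a minimal sketch of batch-voicing a script, using pyttsx3 purely as a stand-in for whatever emotive neural TTS ends up being available (the script dict and file names are made up for illustration):

    # Minimal sketch: voice every line of a game script offline.
    # pyttsx3 is just a placeholder engine; an emotive neural TTS
    # would slot into the same loop.
    import pyttsx3

    script = {
        "guard_01": "Halt! Who goes there?",
        "hero_12": "Just a humble merchant, passing through.",
    }

    engine = pyttsx3.init()
    for line_id, text in script.items():
        # queue one audio clip per line of dialogue
        engine.save_to_file(text, f"{line_id}.wav")
    engine.runAndWait()  # flush the queue and write the files

Re-generating the whole voice track after a text or quest change then becomes a rebuild step rather than a studio booking.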
I can see studios licensing upcoming actors in perpetuity for a pittance. Alternatively, they can synthesize a persona that never ages or complains, and have dozens of nameless grunts be the underpaid talent, like the thousands of people who've donned Goofy suits for Disney.
The next step is obviously for studios to create their own artificial actors. Why let someone get famous enough to demand high licensing fees when you can fully own the likeness that gets famous? Not even sure this is a bad thing, since celebrity culture isn't really good either.
Too much exposure. I'm thinking instead about a dozen more young Sean Connery 007 movies, ten more young Harrison Ford Indiana Jones and Star Wars movies, etc., diluted over the next 30 years. Replace with any other star and repeat for the next century. I guess copyright could be extended to cover faces, with the usual 75 years after the death of the actor. Studios will take care of that.
With the rise of really good neural TTS engines that can re-create voice actors with a high level of accuracy, I've been predicting the same thing will happen in cartoons and anime for years now.
It's often missed that "the rest" includes plausible deniability. As states become ever more infatuated with surveillance, this is going to become extremely important.
I'm inclined to agree. So clever, and yet what's the actual good of it? Feels like there's a crisis in our industry (talking of the wider software engineering world) of finding real, human problems to solve. And so we end up with this, so highly advanced and yet it's not going to improve any lives (and possibly very much the reverse).
..., yet. Mostly because of the tech being decently locked down by license and limited access currently. At the pace these models are developing it's pretty much unthinkable that it will not have a big impact in the next 20 years.
AI compute is increasing at a rate faster than Moore's law. Right now if DALL-E 2 takes an 8xA100 box to run inferences, how long before it's on the workstation? 4-5 years?
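Back of the envelope, and every number here is an assumption rather than a measurement: treat the 8xA100 box vs. a single-GPU workstation as roughly an 8x gap and pick a doubling period for usable per-GPU performance:

    import math

    compute_gap = 8.0  # assumed gap: 8xA100 server vs. one workstation GPU

    # Assumed doubling periods for usable per-GPU inference performance:
    # ~2 years is a Moore's-law-ish pace, ~1.5 reflects "faster than Moore".
    for doubling_years in (2.0, 1.5):
        years = math.log2(compute_gap) * doubling_years
        print(f"doubling every {doubling_years} yr -> ~{years:.1f} years to close the gap")

That lands in the 4-6 year range, before counting model-side tricks like distillation and quantization that shrink the gap from the other side.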
It would be unwise to ignore the trajectory, a.k.a. rate of change.
In general, when looking at some phenomenon, don't only look at the primary measurement (e.g. f(x)). It is often useful to look at the derivative (e.g. df/dx) as well.
P.S. The second derivative may be informative too. Keep going if you want.
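As a toy illustration (the numbers are made up), the same idea in discrete form: look at the differences, not just the values:

    # Toy example: f is a series of yearly measurements; the first and second
    # differences are discrete stand-ins for df/dx and d^2f/dx^2.
    f = [1, 2, 4, 8, 16, 32]                    # hypothetical benchmark scores
    df = [b - a for a, b in zip(f, f[1:])]      # first difference:  [1, 2, 4, 8, 16]
    d2f = [b - a for a, b in zip(df, df[1:])]   # second difference: [1, 2, 4, 8]
    print(f, df, d2f, sep="\n")

A modest-looking f with a growing df is exactly the case where the headline number understates what's coming.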
That's because GPT and DALL-E are famously locked behind a use-case-restricting paywall/invite-wall ("Open"AI, my a...). You can already see things picking up with GPT-J-6B, Midjourney, DALL-E Mini and GPT-4chan, all of which are much less restrictive and therefore easier to use for casual creation.
Just wait until these things start to trickle down to the "plebs" like us and non-technical people. The amount of compute required to re-create and run these systems is the main bottleneck for now, but once the hardware is there and the models shrink to make them easier to run... that's when things will get fascinating.
> I love the technology behind this stuff but the number of applications with negative social benefit seem to outweigh the rest.
Trying to measure the overall social impact is notoriously hard. I doubt that "counting" applications is a useful way to do it.
Let me refine the initial idea (above) into two more detailed ones:
1. What are the specific aspects of why this technology might have negative social implications? How might these be mitigated?
2. Who or what (people, institutions, norms, guidelines, laws, incentives, etc.) can help increase the chances that such technology is (a) well understood; (b) designed conscientiously; and (c) deployed reasonably?
> 1. What are the specific aspects of why this technology might have negative social implications? How might these be mitigated?
Seems pretty obvious to me. People are notoriously hard-to-patch components of any IT/social system, and exploiting them with this kind of tech is possible, because they are hardwired to expect that video faking is scarce.
> 2. Who or what (people, institutions, norms, guidelines, laws, incentives, etc.) can help increase the chances that such technology is (a) well understood; (b) designed conscientiously; and (c) deployed reasonably?
The horses have bolted the barn. You literally cannot stop anyone with enough GPUs/AI accelerators from replicating a free-for-all version of any "conscientious" AI scheme, unless we ban general purpose computation.
This tech is already here. There's no taking it back easily.
> Seems pretty obvious to me. People are notoriously hard-to-patch components of any IT/social system, and exploiting them with this kind of tech is possible, because they are hardwired to expect that video faking is scarce.
People will learn to distrust video content once fake videos become commonplace, so direct exploitation is only really a short term problem - at most for one generation of people growing up with friends making funny videos of things they didn't do. The longer-term problem is the loss of trust in video content used as evidence. OTOH, as others have pointed out, this might actually be a positive thing too, considering the increasing amount of video surveillance, which will likewise become less trustworthy and thereby less of a problem.
> unless we ban general purpose computation.
Now that's a scary thought - it's unfortunately already all too commonly accepted that people can't be trusted to decide what to run on their devices.
> direct exploitation is only really a short term problem - at most for one generation of people growing up with friends making funny videos of things they didn't do
For sure, but there's a window of opportunity in which an attacker can really do a lot of damage. Not much can be done about it, but it's worthy of keeping in mind.
> Now that's a scary thought - it's unfortunately already all too commonly accepted that people can't be trusted to decide what to run on their devices.
Yeah. Not that long ago banning general purpose computation seemed fanciful, but it's not as taboo today (and it has flipped to being the norm on device classes like portables).
> This tech is already here. There's no taking it back easily.
Right, I don't expect it to be easy, but it is important. My emphasis here is to ask and get some perspectives on "How do we do it?" rather than leave the conversation dangling with a vague sense of, e.g., "it is too hard..."
I dunno. A lot of the worst stuff this could be used for kinda pales in comparison to what's going on already by different means.
Like, if you're trying to smear someone, using big media and abusing the courts to fuck them up (say, Assange or Hale or Reality Winner) makes this look like a Fisher-Price toy club next to an Uzi.
If you're trying to sway an election, the big data and subtle ads (and again, big media and the courts) makes far more of a difference than this. Think Brexit, Trump 2016, the count in Florida in 2000, Diebold machines, etc.
If you're trying to topple a government / make taxpayers pay a huge bailout / start an illegal war / etc, etc, etc - this is just another small tool; the leather punching tool on a vast penknife.
**
Conversely, the creative potential may be larger than most suspect. To me, this feels like we're getting close to something like the holodeck in Star Trek, where people can share the fruit of their imagination with anyone for virtually no cost (IP lawyers shudder, like Lionel Hutz imagining a world without lawyers).
There's something almost humorous about the last video being narrated by a text-to-speech system - hearing a system that clones human speech describe a system that clones human motion really adds a surrealist touch to the whole thing.
Really? I thought it failed completely on the Mona Lisa painting. The result looks more like a completely different face colored green, mostly ignoring the style of the painting - more like existing face swap apps than any kind of deepfake.
I'm really curious how well this works on highly stylized sources like anime, where landmarks aren't equivalent and in some cases may not even exist.
Aside, this would be sick for realtime apps. Like, imagine you just get a good professional photo or two done and then drive those with your webcam? It'd be like making a VTuber of yourself.
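The input half of that pipeline is basically off the shelf already. A rough sketch using MediaPipe face landmarks from a webcam loop; animate_portrait() is hypothetical, standing in for a talking-head model like the one in the article:

    # Rough sketch: extract face landmarks from the webcam in real time.
    # The landmark stream is what a talking-head model would consume to
    # animate a still portrait; animate_portrait() below is hypothetical.
    import cv2
    import mediapipe as mp

    face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False)
    cap = cv2.VideoCapture(0)

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            landmarks = results.multi_face_landmarks[0]
            # frame_out = animate_portrait(portrait_photo, landmarks)  # hypothetical
            print(len(landmarks.landmark), "landmarks this frame")
    cap.release()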
This is incredible! When I look at all the advances in computer vision and NLP in the last five years, I can't believe the pace of advancements. I have stopped saying "AI can't do ____ in our lifetime" to my friends.
This is perhaps the most impressive "AI" demo I've seen, and that's saying a lot. Interesting to read about the Moscow-based "Samsung AI Center" that seems to be producing this work: https://research.samsung.com/aicenter_moscow.
I wonder if this type of tech could be used for animating video game characters? Instead of trying to use motion capture or something like that, just record an actor making facial expressions that would drive the 3D model. It seems like they could achieve extremely realistic results.
It's very likely to be used by people who want to become state actors. And I suspect (given the source) that it's also a lowkey warning shot from a country that has a significantly deeper understanding of information and psychological warfare than its opponents.