MegaPortraits: One-Shot Megapixel Neural Head Avatars (samsunglabs.github.io)
280 points by voydik on July 20, 2022 | 86 comments



There are still some issues to be worked out, such as how the head shape distorts in some examples, but overall, this is very, very impressive work.

Back in the old days, Disney and other animation studios rotoscoped actors' performances by drawing over the original footage by hand, frame by frame. It won't be long before you just have an artist create a few pieces of concept art and then film the actors' performances without much special setup beyond maybe a tracking suit.

How many years away are we from the point where you can just type in a script (or just put in some writing prompts and have an AI generate a script), describe the direction for the actors ("bend over and pick up the bucket", "exit stage left"), and then just churn out a movie?

If you pick up just a little bit of skill with animation, compositing, and such, you're a one-person movie studio. Crazy times. This is not what I imagined the future was going to look like, but it will be entertaining.


Why stop there?

How long until AI can measure my brain response and adjust the script in real time to make the film less predictable, more exciting, relaxing, humorous or engaging based on how I'm feeling?

It's not a massive leap from what you described.


No, thank you. I would rather not have the film playback software gaslight me in real-time.

While I think this idea has a lot of potential, it bugs me too.


I can't wait for the new Star Wars edition where YOU decide who shoots first.


Neal Stephenson in The Diamond Age describes precisely this. It’s a lovely read.


Thanks for the recommendation!


That's going to get interesting. What makes movies 'great' is not just the movie itself, but also the shared experience of others having watched that same movie. That gets lost when you start making movies completely customized to the viewer.

And once AI is capable of replacing the "social experience" of movie viewing, humanity might not be far away from having rendered itself obsolete.


What if AI could respond to everyone sharing the experience in the room?

A group of friends all contributing in some way, then afterwards discussing which parts of the film each of them might have been responsible for willing into existence.


A movie controlled by your reactions, with plot development unique to you, is basically a good video game.


We've got AI-assisted rotoscoping already, and while it looks a bit janky at times, it's still a whole lot faster than doing it all by hand. https://www.youtube.com/watch?v=tq_KOmXyVDo


Interesting - is it still faster when the artist endeavors to correct the AI jankiness where it occurs? Or is it, at present, more trouble than it's worth?


> describe the direction for the actors

What actors? There are a lot of writers who will jump at the opportunity to skip all the translation and re-interpretation by others and directly build the visuals as they go. Some of this will be extremely cringeworthy, but a lot of it will be astonishingly good.


Just think of the memes.

You'll be able to convert from video into a screenplay, edit some text, then render to video again.


I love the technology behind this stuff, but the number of applications with negative social impact seems to outweigh the rest.


I can see famous actors licensing their image, with multiple shadow actors doing their moves. Now you can make 50 movies per year instead of 5. You heard it here first...


Pretty sure that's already a thing, not for movies per se but for ads, where a celeb can license their avatar (3D model and voice) for use in ads. It's especially useful when they want the celeb's lips to move correctly once the ad is translated into other languages.


Damn, I didn't realize that was already being done. Are there any articles about this or other references you could point us to about this phenomenon?


Yup. As an aspiring indie game dev, I eagerly await large-scale emotive text-to-speech. Being able to write a script and have it "acted" for a fraction of the cost, possibly free if I build it myself, is kind of mind-blowing.


Microsoft's latest models are very, very good at this.

Try the demos here (and be sure to experiment with the "speaking style" parameter box, and try out the "(Neural) Preview" voices for US English): https://azure.microsoft.com/en-us/services/cognitive-service...
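If you'd rather script it than click through the demo page, here's a rough sketch of how that "speaking style" knob maps to SSML with the Python Speech SDK (azure-cognitiveservices-speech). The key and region are placeholders, and the available styles vary per voice:

    import azure.cognitiveservices.speech as speechsdk

    # Placeholders: substitute your own Azure Speech key and region.
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # The demo's "speaking style" corresponds to <mstts:express-as> in SSML.
    ssml = """
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
      <voice name="en-US-JennyNeural">
        <mstts:express-as style="cheerful">
          Being able to write a script and have it acted is kind of mind-blowing.
        </mstts:express-as>
      </voice>
    </speak>
    """
    result = synthesizer.speak_ssml_async(ssml).get()  # plays through the default speaker
    print(result.reason)                               # SynthesizingAudioCompleted on success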


As a gamer, I can also see this having a positive impact. Currently, the requirement for voice acting a) often limits the amount of text content, because it costs money to have everything voice acted in all supported languages, and b) locks in content once the voice acting is done, because it would be too costly to redo it for fixed text / quests / etc.


Also, shout out to the team at Mycroft for Mimic 3, their open-source TTS model: https://mycroft.ai/blog/introducing-mimic-3/?cn-reloaded=1


I can see studios licensing upcoming actors in perpetuity for a pittance. Alternatively, they can synthesize a persona that never ages or complains, and have dozens of nameless grunts be the underpaid talent, like the thousands of people who've donned Goofy suits for Disney.


You say "licensing their image", I see "pirated likeness". Not that this is necessarily a bad thing.


The next step is obviously for studios to create their own artificial actors. Why let someone get famous enough to demand high licensing fees when you can fully own the likeness that gets famous? Not even sure this is a bad thing, since celebrity culture isn't really good either.


see the film "The Congress" where Robin Wright does exactly this

https://www.youtube.com/watch?v=zkDyKWKNeaE


Too much exposure. I'm thinking instead about a dozen more young Sean Connery 007 movies, ten more young Harrison Ford Indiana Jones and Star Wars movies, etc., spread out over the next 30 years. Replace with any other star and repeat for the next century. I guess that copyright could be extended to faces, lasting the usual 75 years after the death of the actor. Studios will take care of that.


I've been thinking it, but I haven't seen someone else write it - see ya in 2025 Belter.


With the rise of really good neural TTS engines that can re-create voice actors with a high level of accuracy, I've been predicting the same thing will happen in cartoons and anime for years now.


The shadow actor still needs to deliver a convincing performance, though (= $$$). AI can't magically turn bad acting into good acting.


Why not create an AI model that improves acting?


Acting style transfer. And they thought the creatives would have the safe jobs! Ha!


I wonder if The Rolling Stones will be around long enough to license AI-written and performed albums.


That sucks; movies nowadays are getting pretty bad.


the eternal reboot of the MCU movies


you should patent/copyright the idea and profit from it


> the rest.

It's often missed that "the rest" includes plausible deniability. As states become ever more infatuated with surveillance, this is going to become extremely important.


I'm inclined to agree. So clever, and yet what's the actual good of it? Feels like there's a crisis in our industry (talking of the wider software engineering world) of finding real, human problems to solve. And so we end up with this, so highly advanced and yet it's not going to improve any lives (and possibly very much the reverse).


It will improve the lives of these projects' financial backers.

Won't somebody think of the investors?


Similar technology has been around for a while and so far it's only generated a few laughs and some use in the film industry.

GPT-3 hasn't taken over social media.

DALL-E 2 hasn't put all graphical artists out of a job.


Early in the war, "someone" (obviously Russia) released a deepfake of Zelenskyy giving a speech telling his soldiers to surrender: https://www.npr.org/2022/03/16/1087062648/deepfake-video-zel...


European politicians were also fooled by deep fake video calls: https://www.theguardian.com/world/2022/jun/25/european-leade...


..., yet. Mostly because the tech is currently fairly locked down by licensing and limited access. At the pace these models are developing, it's pretty much unthinkable that this won't have a big impact in the next 20 years.


Access is limited by the hardware you have access to. Forget about training, just running these models takes a lot of memory.


AI compute is increasing at a rate faster than Moore's law. Right now, if DALL-E 2 takes an 8xA100 box to run inference, how long before it's on the workstation? 4-5 years?
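Back-of-envelope, with every number below an assumption I'm plugging in rather than a measured figure, that guess is roughly what you get if you also assume the models themselves shrink:

    import math

    server_vram_gb      = 8 * 80   # assumed: 8x A100 80GB inference box
    workstation_vram_gb = 24       # assumed: single high-end consumer GPU
    doubling_years      = 2.5      # assumed: doubling time for affordable GPU memory/compute
    shrink_factor       = 8        # assumed: savings from quantization/distillation/pruning

    def years_to_close(gap, doubling_years):
        # Years of exponential growth needed to close a multiplicative gap.
        return math.log2(gap) * doubling_years

    hw_only  = years_to_close(server_vram_gb / workstation_vram_gb, doubling_years)
    combined = years_to_close(server_vram_gb / shrink_factor / workstation_vram_gb, doubling_years)
    print(f"hardware alone: ~{hw_only:.0f} years; with {shrink_factor}x model shrinkage: ~{combined:.0f} years")
    # -> hardware alone: ~12 years; with 8x model shrinkage: ~4 years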


It's to be expected. Until recently, almost nobody really cared about accelerating AI computation, so I expect that AI chips will be catching up.


It would be unwise to ignore the trajectory, a.k.a. rate of change.

In general, when looking at some phenomenon, don't only look at the primary measurement (e.g. f(x)). It is often useful to look at the derivative (e.g. df/dx) as well.

P.S. The second derivative may be informative as well. Keep going if you want.


> Similar technology has been around for a while and so far it's only generated a few laughs and some use in the film industry.

Humility and less overconfidence, please. You aren't omniscient. You aren't in every room where these technologies have been discussed.


Get back to me in 5 years.


That's because GPT and DALL-E are famously locked behind a use-case-restricting paywall/invite-wall ("Open"AI, my a...). You can already see things picking up with GPT-J-6B, Midjourney, DALL-E Mini and GPT-4chan, all of which are much less restrictive and therefore easier to use for casual creation.

Just wait until these things start to trickle down to the "plebs" like us and non-technical people. The amount of compute required to re-create and run these systems is the main bottleneck for now, but once the hardware is there and the models shrink to make them easier to run... that's when things will get fascinating.


> I love the technology behind this stuff, but the number of applications with negative social impact seems to outweigh the rest.

Trying to measure the overall social impact is notoriously hard. I doubt that "counting" applications is a useful way to do it.

Let me refine the initial idea (above) into two more detailed ones:

1. What are the specific aspects of why this technology might have negative social implications? How might these be mitigated?

2. Who or what (people, institutions, norms, guidelines, laws, incentives, etc.) can help increase the chances that such technology is (a) well understood; (b) designed conscientiously; and (c) deployed reasonably?


> 1. What are the specific aspects of why this technology might have negative social implications? How might these be mitigated?

Seems pretty obvious to me. People are notoriously hard-to-patch components of any IT/social system, and exploiting them with this kind of tech is possible, because they are hardwired to expect that video faking is scarce.

> 2. Who or what (people, institutions, norms, guidelines, laws, incentives, etc.) can help increase the chances that such technology is (a) well understood; (b) designed conscientiously; and (c) deployed reasonably?

The horses have bolted the barn. You literally cannot stop anyone with enough GPUs/AI accelerators from replicating a free-for-all version of any "conscientious" AI scheme, unless we ban general purpose computation.

This tech is already here. There's no taking it back easily.


> Seems pretty obvious to me. People are notoriously hard-to-patch components of any IT/social system, and exploiting them with this kind of tech is possible, because they are hardwired to expect that video faking is scarce.

People will learn to distrust video content once fake videos become commonplace, so direct exploitation is only really a short-term problem, at most for one generation of people growing up with friends making funny videos of things they didn't do. The longer-term problem is the loss of trust in video content used as evidence. OTOH, as others have pointed out, this might actually be a positive thing too, considering the increasing amount of video surveillance, which will also become less trustworthy and thereby less of a problem.

> unless we ban general purpose computation.

Now that's a scary thought: it's unfortunately already all too commonly accepted that people can't be trusted to decide what to run on their devices.


> direct exploitation is only really a short-term problem, at most for one generation of people growing up with friends making funny videos of things they didn't do

For sure, but there's a window of opportunity in which an attacker can really do a lot of damage. Not much can be done about it, but it's worthy of keeping in mind.

> Now that's a scary thought: it's unfortunately already all too commonly accepted that people can't be trusted to decide what to run on their devices.

Yeah. Not that long ago banning general purpose computation seemed fanciful, but it's not as taboo today (and it has flipped to being the norm on device classes like portables).


> This tech is already here. There's no taking it back easily.

Right, I don't expect it to be easy, but it is important. My emphasis here is to ask and get some perspectives on "How do we do it?" rather than leave the conversation dangling with a vague sense of, e.g., "it is too hard..."


I dunno. A lot of the worst stuff this could be used for kinda pales in comparison to what's going on already by different means.

Like, if you're trying to smear someone, using big media and abusing the courts to fuck them up (say, Assange or Hale or Reality Winner) makes this look like a Fisher-Price toy club next to an Uzi.

If you're trying to sway an election, the big data and subtle ads (and again, big media and the courts) makes far more of a difference than this. Think Brexit, Trump 2016, the count in Florida in 2000, Diebold machines, etc.

If you're trying to topple a government / make taxpayers pay a huge bailout / start an illegal war / etc, etc, etc - this is just another small tool; the leather punching tool on a vast penknife.

**

Conversely, the creative potential may be larger than most suspect. To me, this feels like we're getting close to something like the holodeck in Star Trek, where people can share the fruit of their imagination with anyone for virtually no cost (IP lawyers shudder, like Lionel Hutz imagining a world without lawyers).


"Neural Head Avatar" is not a good name, but it sure beats "deepfake".


I wonder which Hollywood actors will come pre-licensed in Unreal Engine 6?


You get all of them in 虛幻引擎 6 (Unreal Engine 6 in Chinese), and probably in Нереальный двигатель 6 (the Russian version) as well...


There's something almost humorous about the last video being narrated by a text-to-speech system - hearing a system that clones human speech describe a system that clones human motion really adds a surrealist touch to the whole thing.


Wait until they reveal that the whole paper was generated by some GPT-4 network.


Was not prepared for the emotional reaction to seeing the Mona Lisa looking around and really smiling.


For me, the most startling one was seeing (a) Frida Kahlo smile.


The paintings were super impressive


Really? I thought it failed completely on the Mona Lisa painting. The result looks more like a completely different face colored green, mostly ignoring the style of the painting - more like existing face-swap apps than any kind of deepfake.


This is so impressive it actually scares me.


This is seriously incredible. Coolest thing I have seen in a very long time. Curious how long it takes to render one of the short example clips shown.


In their video they state that the optimised versions run at 70 FPS.

I'm very tempted to replace my Teams camera with this.


I'm really curious how well this works on highly stylized sources like anime, where landmarks aren't equivalent and in some cases may not even exist.

As an aside, this would be sick for real-time apps. Like, imagine you just get a good professional photo or two taken and then drive those with your webcam? It'd be like making a VTuber of yourself.


This is incredible! When I look at all the advances in computer vision and NLP in the last five years, I can't believe the pace of advancements. I have stopped saying "AI can't do ____ in our lifetime" to my friends.


The deepfake industry just got a lot bigger.


This is perhaps the most impressive "AI" demo I've seen, and that's saying a lot. Interesting to read about the Moscow-based "Samsung AI Center" that seems to be producing this work: https://research.samsung.com/aicenter_moscow.


Now Muggles get animated photos as well.


I wonder if this type of tech could be used for animating video game characters? Instead of trying to use motion capture or something like that, just record an actor making facial expressions that would drive the 3D model. It seems like they could achieve extremely realistic results.



Holy shit


Damn, that's so good. When does it come out as a Zoom AR layer?



Pretty neat. I wonder if the AI can predict the voice of the person in the picture and make it talk.


How does the algorithm decide what the side (and back) of a head looks like?


A LOT of training data.


I wonder how this could be used by a state actor to manipulate.


It's very likely to be used by people who want to become state actors. And I suspect (given the source) that it's also a lowkey warning shot from a country that has a significantly deeper understanding of information and psychological warfare than its opponents.


When will something like this be available to the average user?




The photos at the top of the page are the researchers—the demos (two videos) are a bit further down the page.



