MegaPortraits: One-Shot Megapixel Neural Head Avatars (samsunglabs.github.io)
280 points by voydik on July 20, 2022 | 86 comments



There are still some issues to be worked out, such as how the head shape distorts in some examples, but overall, this is very, very impressive work.

Back in the old days, Disney and other animation studios rotoscoped actors' performances by drawing over the original footage by hand, frame by frame. It won't be long before you just have an artist create a few pieces of concept art and then film the actors' performances without much special setup beyond maybe a tracking suit.

How many years away are we from the point where you can just type in a script (or just put in some writing prompts and have an AI generate a script), describe the direction for the actors ("bend over and pick up the bucket", "exit stage left"), and then just churn out a movie?

If you pick up just a little bit of skill with animation, compositing, and such, you're a one-person movie studio. Crazy times. This is not what I imagined the future was going to look like, but it will be entertaining.


Why stop there?

How long until AI can measure my brain response and adjust the script in real time to make the film less predictable, more exciting, relaxing, humorous or engaging based on how I'm feeling?

It's not a massive leap from what you described.


No, thank you. I would rather not have the film playback software gaslight me in real-time.

While I think this idea has a lot of potential, it bugs me too.


I can't wait for the new Star Wars edition where YOU decide who shoots first.


Neal Stephenson in The Diamond Age describes precisely this. It’s a lovely read.


Thanks for the recommendation!


That's going to get interesting. What makes movies 'great' is not just the movie itself, but also the shared experience of others having watched that same movie. That gets lost when you start making movies completely customized to the viewer.

And once AI is capable of replacing the "social experience" of movie viewing, humanity might not be far away from having rendered itself obsolete.


What if AI could respond to everyone sharing the experience in the room?

A group of friends all contributing in some way, then afterwards discussing which parts of the film each of them might have been responsible for willing into existence.


A movie controlled by your reactions, with plot development unique to you, is basically a good video game.


We've got AI-assisted rotoscoping already, and while it looks a bit janky at times, it's still a whole lot faster than doing it all by hand. https://www.youtube.com/watch?v=tq_KOmXyVDo


Interesting - is it still faster when the artist endeavors to correct the AI jankiness where it occurs? Or is it, at present, more trouble than it's worth?


> describe the direction for the actors

What actors? There are a lot of writers who will jump at the opportunity to skip all the translation and re-interpretation by others and directly build the visuals as they go. Some of this will be extremely cringeworthy, but a lot of it will be astonishingly good.


Just think of the memes.

You'll be able to convert from video into a screenplay, edit some text, then render to video again.


I love the technology behind this stuff, but the number of applications with negative social impact seems to outweigh the rest.


I can see famous actors licensing their image, with multiple shadow actors doing their moves. Now you can make 50 movies per year instead of 5. You heard it here first...


Pretty sure that's already a thing, not for movies per se but for ads, where a celeb can license their avatar (3D model and voice) for use in ads. It's especially useful when they want the celeb's lips to move correctly once the ad is translated into other languages.


Damn, I didn't realize that was already being done. Are there any articles about this or other references you could point us to about this phenomenon?


Yup. As an aspiring indie game dev, I eagerly await large-scale emotive text-to-speech. Being able to write a script and have it "acted" for a fraction of the cost, possibly free if I build it myself, is kind of mind-blowing.


Microsoft's latest models are very, very good at this.

Try the demos here (and be sure to experiment with the "speaking style" parameter box, and try out the "(Neural) Preview" voices for US English): https://azure.microsoft.com/en-us/services/cognitive-service...
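If you'd rather script it than click through the demo page, here's a rough sketch of how that "speaking style" knob maps to SSML with the Python Speech SDK (azure-cognitiveservices-speech). The key and region are placeholders, and the available styles vary per voice:

    import azure.cognitiveservices.speech as speechsdk

    # Placeholders: substitute your own Azure Speech key and region.
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # The demo's "speaking style" corresponds to <mstts:express-as> in SSML.
    ssml = """
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
      <voice name="en-US-JennyNeural">
        <mstts:express-as style="cheerful">
          Being able to write a script and have it acted is kind of mind-blowing.
        </mstts:express-as>
      </voice>
    </speak>
    """
    result = synthesizer.speak_ssml_async(ssml).get()  # plays through the default speaker
    print(result.reason)                               # SynthesizingAudioCompleted on success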


As a gamer, I can also see this having a positive impact. Currently, the requirement for voice acting a) often limits the amount of text content, because it costs money to have everything voice acted in all supported languages, and b) locks in content once the voice acting is done, because it would be too costly to redo it for fixed text / quests / etc.


Also, shout out to the team at Mycroft for Mimic 3, their open-source TTS model: https://mycroft.ai/blog/introducing-mimic-3/?cn-reloaded=1


I can see studios licensing upcoming actors in perpetuity for a pittance. Alternatively, they can synthesize a persona that never ages or complains, and have dozens of nameless grunts be the underpaid talent, like the thousands of people who've donned Goofy suits for Disney.


You say "licensing their image", I see "pirated likeness". Not that this is necessarily a bad thing.


The next step is obviously for studios to create their own artificial actors. Why let someone get famous enough to demand high licensing fees when you can fully own the likeness that gets famous? Not even sure this is a bad thing, since celebrity culture isn't really good either.


see the film "The Congress" where Robin Wright does exactly this

https://www.youtube.com/watch?v=zkDyKWKNeaE


Too much exposure. I'm thinking instead about a dozen more young Sean Connery 007 movies, ten more young Harrison Ford Indiana Jones and Star Wars movies, etc., spread out over the next 30 years. Replace with any other star and repeat for the next century. I guess that copyright could be extended to faces, lasting the usual 75 years after the death of the actor. Studios will take care of that.


I've been thinking it, but I haven't seen someone else write it - see ya in 2025 Belter.


With the rise of really good neural TTS engines that can re-create voice actors with a high level of accuracy, I've been predicting the same thing will happen in cartoons and anime for years now.


The shadow actor still needs to deliver a convincing performance, though (= $$$). AI can't magically turn bad acting into good acting.


Why not create an AI model that improves acting?


Acting style transfer. And they thought the creatives would have the safe jobs! Ha!


I wonder if The Rolling Stones will be around long enough to license AI-written and performed albums.


That sucks; movies nowadays are getting pretty bad.


the eternal reboot of the MCU movies


you should patent/copyright the idea and profit from it


> the rest.

It's often missed that "the rest" includes plausible deniability. As states become ever more infatuated with surveillance, this is going to become extremely important.


I'm inclined to agree. So clever, and yet what's the actual good of it? Feels like there's a crisis in our industry (talking of the wider software engineering world) of finding real, human problems to solve. And so we end up with this, so highly advanced and yet it's not going to improve any lives (and possibly very much the reverse).


It will improve the lives of these projects' financial backers.

Won't somebody think of the investors?


Similar technology has been around for a while and so far it's only generated a few laughs and some use in the film industry.

GPT-3 hasn't taken over social media.

DALL-E 2 hasn't put all graphical artists out of a job.


Early in the war, "someone" (obviously Russia) released a deepfake of Zelenskyy giving a speech telling his soldiers to surrender: https://www.npr.org/2022/03/16/1087062648/deepfake-video-zel...


European politicians were also fooled by deep fake video calls: https://www.theguardian.com/world/2022/jun/25/european-leade...


..., yet. Mostly because the tech is currently fairly locked down by licensing and limited access. At the pace these models are developing, it's pretty much unthinkable that this won't have a big impact in the next 20 years.


Access is limited by the hardware you have access to. Forget about training, just running these models takes a lot of memory.


AI compute is increasing at a rate faster than Moore's law. Right now, if DALL-E 2 takes an 8xA100 box to run inference, how long before it's on the workstation? 4-5 years?
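Back-of-envelope, with every number below an assumption I'm plugging in rather than a measured figure, that guess is roughly what you get if you also assume the models themselves shrink:

    import math

    server_vram_gb      = 8 * 80   # assumed: 8x A100 80GB inference box
    workstation_vram_gb = 24       # assumed: single high-end consumer GPU
    doubling_years      = 2.5      # assumed: doubling time for affordable GPU memory/compute
    shrink_factor       = 8        # assumed: savings from quantization/distillation/pruning

    def years_to_close(gap, doubling_years):
        # Years of exponential growth needed to close a multiplicative gap.
        return math.log2(gap) * doubling_years

    hw_only  = years_to_close(server_vram_gb / workstation_vram_gb, doubling_years)
    combined = years_to_close(server_vram_gb / shrink_factor / workstation_vram_gb, doubling_years)
    print(f"hardware alone: ~{hw_only:.0f} years; with {shrink_factor}x model shrinkage: ~{combined:.0f} years")
    # -> hardware alone: ~12 years; with 8x model shrinkage: ~4 years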


It's to be expected. Until recently, almost nobody really cared about accelerating AI computation, so I expect that AI chips will be catching up.


It would be unwise to ignore the trajectory, a.k.a. rate of change.

In general, when looking at some phenomenon, don't only look at the primary measurement (e.g. f(x)). It is often useful to look at the derivative (e.g. df/dx) as well.

P.S. The second derivative may be informative as well. Keep going if you want.


> Similar technology has been around for a while and so far it's only generated a few laughs and some use in the film industry.

Humility and less overconfidence, please. You aren't omniscient. You aren't in every room where these technologies have been discussed.


Get back to me in 5 years.


That's because GPT and DALL-E are famously locked behind a use-case-restricting paywall/invite-wall ("Open"AI, my a...). You can already see things picking up with GPT-J-6B, Midjourney, DALL-E Mini and GPT-4chan, all of which are much less restrictive and therefore easier to use for casual creation.

Just wait until these things start to trickle down to the "plebs" like us and non-technical people. The amount of compute required to re-create and run these systems is the main bottleneck for now, but once the hardware is there and the models shrink to make them easier to run... that's when things will get fascinating.


> I love the technology behind this stuff, but the number of applications with negative social impact seems to outweigh the rest.

Trying to measure the overall social impact is notoriously hard. I doubt that "counting" applications is a useful way to do it.

Let me refine the initial idea (above) into two more detailed ones:

1. What are the specific aspects of why this technology might have negative social implications? How might these be mitigated?

2. Who or what (people, institutions, norms, guidelines, laws, incentives, etc.) can help increase the chances that such technology is (a) well understood; (b) designed conscientiously; and (c) deployed reasonably?


> 1. What are the specific aspects of why this technology might have negative social implications? How might these be mitigated?

Seems pretty obvious to me. People are notoriously hard-to-patch components of any IT/social system, and exploiting them with this kind of tech is possible, because they are hardwired to expect that video faking is scarce.

> 2. Who or what (people, institutions, norms, guidelines, laws, incentives, etc.) can help increase the chances that such technology is (a) well understood; (b) designed conscientiously; and (c) deployed reasonably?

The horses have bolted the barn. You literally cannot stop anyone with enough GPUs/AI accelerators from replicating a free-for-all version of any "conscientious" AI scheme, unless we ban general purpose computation.

This tech is already here. There's no taking it back easily.


> Seems pretty obvious to me. People are notoriously hard-to-patch components of any IT/social system, and exploiting them with this kind of tech is possible, because they are hardwired to expect that video faking is scarce.

People will learn to distrust video content once fake videos become commonplace, so direct exploitation is only really a short-term problem, at most for one generation of people growing up with friends making funny videos of things they didn't do. The longer-term problem is the loss of trust in video content used as evidence. OTOH, as others have pointed out, this might actually be a positive thing too, considering the increasing amount of video surveillance, which will also become less trustworthy and thereby less of a problem.

> unless we ban general purpose computation.

Now that's a scary thought: it's unfortunately already all too commonly accepted that people can't be trusted to decide what to run on their devices.


> direct exploitation is only really a short-term problem, at most for one generation of people growing up with friends making funny videos of things they didn't do

For sure, but there's a window of opportunity in which an attacker can really do a lot of damage. Not much can be done about it, but it's worthy of keeping in mind.

> Now that's a scary thought: it's unfortunately already all too commonly accepted that people can't be trusted to decide what to run on their devices.

Yeah. Not that long ago banning general purpose computation seemed fanciful, but it's not as taboo today (and it has flipped to being the norm on device classes like portables).


> This tech is already here. There's no taking it back easily.

Right, I don't expect it to be easy, but it is important. My emphasis here is to ask and get some perspectives on "How do we do it?" rather than leave the conversation dangling with a vague sense of, e.g., "it is too hard..."


I dunno. A lot of the worst stuff this could be used for kinda pales in comparison to what's going on already by different means.

Like, if you're trying to smear someone, using big media and abusing the courts to fuck them up (say, Assange or Hale or Reality Winner) makes this look like a Fisher-Price toy club next to an Uzi.

If you're trying to sway an election, the big data and subtle ads (and again, big media and the courts) makes far more of a difference than this. Think Brexit, Trump 2016, the count in Florida in 2000, Diebold machines, etc.

If you're trying to topple a government / make taxpayers pay a huge bailout / start an illegal war / etc, etc, etc - this is just another small tool; the leather punching tool on a vast penknife.

**

Conversely, the creative potential may be larger than most suspect. To me, this feels like we're getting close to something like the holodeck in Star Trek, where people can share the fruit of their imagination with anyone for virtually no cost (IP lawyers shudder, like Lionel Hutz imagining a world without lawyers).


"Neural Head Avatar" is not a good name, but it sure beats "deepfake".


I wonder which Hollywood actors will come pre-licensed in Unreal Engine 6?


You get all of them in 虛幻引擎 6 (Unreal Engine 6 in Chinese), and probably in Нереальный двигатель 6 (the Russian version) as well...


There's something almost humorous about the last video being narrated by a text-to-speech system - hearing a system that clones human speech describe a system that clones human motion really adds a surrealist touch to the whole thing.


Wait until they reveal that the whole paper was generated by some GPT-4 network.


Was not prepared for the emotional reaction to seeing the Mona Lisa looking around and really smiling.


For me, the most startling one was seeing (a) Frida Kahlo smile.


The paintings were super impressive


Really? I thought it failed completely on the Mona Lisa painting. The result looks more like a completely different face colored green, mostly ignoring the style of the painting - more like existing face-swap apps than any kind of deepfake.


This is so impressive it actually scares me.


This is seriously incredible. Coolest thing I have seen in a very long time. Curious how long it takes to render one of the short example clips shown.


In their video they state that the optimised versions run at 70 FPS.

I'm very tempted to replace my Teams camera with this.


I'm really curious how well this works on highly stylized sources like anime, where landmarks aren't equivalent and in some cases may not even exist.

As an aside, this would be sick for real-time apps. Like, imagine you just get a good professional photo or two taken and then drive those with your webcam? It'd be like making a VTuber of yourself.


This is incredible! When I look at all the advances in computer vision and NLP in the last five years, I can't believe the pace of advancements. I have stopped saying "AI can't do ____ in our lifetime" to my friends.


The deepfake industry just got a lot bigger.


This is perhaps the most impressive "AI" demo I've seen, and that's saying a lot. Interesting to read about the Moscow-based "Samsung AI Center" that seems to be producing this work: https://research.samsung.com/aicenter_moscow.


Now Muggles get animated photos as well.


I wonder if this type of tech could be used for animating video game characters? Instead of trying to use motion capture or something like that, just record an actor making facial expressions that would drive the 3D model. It seems like they could achieve extremely realistic results.



Holy shit


Damn, that's so good. When does it come out as a Zoom AR layer?



Pretty neat. I wonder if the AI can predict the voice of the person in the picture and make it talk.


How does the algorithm decide what the side (and back) of a head looks like?


A LOT of training data.


I wonder how this could be used by a state actor to manipulate.


It's very likely to be used by people who want to become state actors. And I suspect (given the source) that it's also a lowkey warning shot from a country that has a significantly deeper understanding of information and psychological warfare than its opponents.


When will something like this be available to the average user?




The photos at the top of the page are the researchers—the demos (two videos) are a bit further down the page.



