Sakuga-42M Dataset: Scaling Up Cartoon Research (arxiv.org)
85 points by snats 11 months ago | 31 comments



Wow, this looks like the beginning of the end for in-between animation.

A large chunk of an anime's budget goes to in-betweening: essentially, human interpolation of graphics between two key frames (it's usually something like 6-12 in-betweens per key frame). People hate this job, and it is highly unproductive, so it is generally outsourced by Japan to other countries.
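
As a rough illustration of what "interpolation between two key frames" means mechanically, here is a minimal Python sketch that linearly blends 2D point positions between two key poses. The function name `inbetween` and the point-list representation of a pose are assumptions made up for this example. Real in-betweens are drawn by hand, so plain linear blending is only the most naive baseline.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def inbetween(key_a: List[Point], key_b: List[Point], count: int = 6) -> List[List[Point]]:
    """Return `count` linearly interpolated poses strictly between key_a and key_b."""
    if len(key_a) != len(key_b):
        raise ValueError("both key poses need the same number of points")
    frames = []
    for i in range(1, count + 1):
        t = i / (count + 1)  # fraction of the way from key_a to key_b
        frames.append([
            (ax + (bx - ax) * t, ay + (by - ay) * t)
            for (ax, ay), (bx, by) in zip(key_a, key_b)
        ])
    return frames


# Hypothetical one-point "pose" moving from (0, 0) to (4, 8) with 3 in-betweens.
print(inbetween([(0.0, 0.0)], [(4.0, 8.0)], count=3))
```

The key poses themselves are not included in the result, only the frames between them.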

Western animation decided to abandon it altogether, moving first to Flash and then to 3D animation. But in retrospect that was a mistake, as it lost so much of the creative flexibility of 2D animation. Anime today is substantially bigger than western animation as a result. Crunchyroll has 13 million subscribers.

AI will solve the problems with 2D animation. Something like Sora, fine-tuned on anime keyframe data like this, can probably solve in-betweening easily. Then the 2D animation workflow will dominate 3D. It's so much easier to just draw a beginning and an end key frame and have the AI fill in the rest than to model, rig, and render an entire scene.


This is one job I am actually hopeful will be fully automated. I've heard before that the people doing these 'tween' frames make less than minimum wage on contract.

That being said, I'm sure this is going to take a while to fully replace the manual efforts. There will probably be an awkward phase in between the outset and the perfect modelling efforts, and I'm sure lower budget shows (or ones looking to cut corners) will be the early adopters.


Anime already cuts corners by using cheap 3d models in place of 2d hand drawn objects.

The more rigid the object, the better this works. 3d cars look better than 2d cars, even in an otherwise 2d show. Mechs look a bit worse. And human characters look horrible.

Yet anime studios still do it, including for critical highlight scenes like dance scenes (check out the Love Live dance scenes), because it is so, so hard to draw humans dancing.

So if anime studios are willing to do something that looks obviously bad as a widespread practice, there'll be zero barriers to adopting AI in-betweening, which would likely look BETTER than human in-betweening within a year of release.

AI anime art has already wiped out the lower end of Patreon artists and is heavily impacting the mid-tier, because AI has gotten more technically proficient than the average mid-tier artist. Pretty much only the higher end can keep their heads above water. Otherwise, artists have to transition to drawing comics with storylines instead of just simple images.


A lot of the dancing stuff is about the ability to spin the camera around a moving subject to the music, which is quite difficult otherwise.

There's a lot of impressive work in 3D animation that looks quite good. Outside of Bandai Namco's work on idol anime, Studio Orange has made some of the best-looking 3D modeled anime lately, and a few other studios have been getting into it. I'm more familiar with video game animation, where Arc System Works has made great strides too, by using animation on threes, manual tweening, stretch and squish bones, and carefully UV mapped textures for crisp color boundaries.
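
For anyone unfamiliar with "animation on threes": each drawing (or here, each sampled 3D pose) is held for three frames instead of changing every frame, which gives 3D motion the stepped cadence of hand-drawn animation. A minimal sketch, assuming a hypothetical list with one pose per output frame. The name `hold_frames` is invented for this example and says nothing about how any studio's pipeline is actually implemented.

```python
from typing import List, TypeVar

T = TypeVar("T")


def hold_frames(poses: List[T], hold: int = 3) -> List[T]:
    """Resample a per-frame pose list so each kept pose is held for `hold` frames."""
    if hold < 1:
        raise ValueError("hold must be at least 1")
    # Frame i shows the pose sampled at the start of its group of `hold` frames.
    return [poses[(i // hold) * hold] for i in range(len(poses))]


# Seven smooth per-frame poses (stand-in integers) resampled on threes.
print(hold_frames(list(range(7)), hold=3))  # [0, 0, 0, 3, 3, 3, 6]
```

With `hold=1` the input comes back unchanged, which corresponds to animating "on ones".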


>3d cars look better than 2d cars, even in an otherwise 2d show.

This is quite debatable. If you notice that the car is a 3D object, then something is already wrong.


It was a major consideration for Initial D, which is pretty much the definitive car anime. Animating movement in tandem with dynamic camera movement is very difficult (also why shows like Love Live use 3DCGI for dance scenes, and why Disney was using 3D elements in films like Beauty and the Beast) and modeling accurate vehicle physics in hand drawn animation is also difficult. It simply wouldn't have been viable without 3D animation.

https://youtube.com/watch?v=YDqKsQu9el4


Usually you can tell by the fact that nothing is wrong, since the 3D model is very consistent and on model. A hand-drawn car is usually not that.


The same could be said for a 3D human character, it's very consistent because it's a 3D model but it's horrible to look at.


Do you notice when Bluey characters are animated from 3D or 2D? The software they use allows you to produce 2D drawings from 3D animated models.

https://www.celaction.com/en/celaction2d/


I have never seen Bluey, but from the software you linked it is clear that it looks 2D because of how inconsistent the character looks at different angles. For example, when you rotate the character the mouth changes position and the hands jump from one sprite to another. It's cel shading with a lot of 2D elements on top. It works for simple animation, but for anime I'm not optimistic.


His point, I believe: artistically interpreting the motion and shape of humans or objects with larger moving parts makes animation look more on-style.

But for "boring" rigid objects, there's less of this advantage; hence, the consistency benefits often are more important.


Unless it's a Miyazaki film, in which case most humans are intended to be lifeless rigid objects and all machines are lifelike animations (/s)


I've seen 3D animation where the people are still quite fluid and not awful to look at. Not as good as 2D animation but still pretty good and more than watchable.

https://www.youtube.com/watch?v=OO9zNw_uHg4

https://www.youtube.com/watch?v=eCc4md8Cuy8


Friend, anime was never about quality of animation. In fact, it has always been a prime example of how to cut corners to get animation made. That doesn't reflect on the quality of the character designs, environments, storytelling, camera work, directing, etc.; motion was just never one of its strengths. Anime is the first place I'd expect to see new ground broken, just like it was with all the tools from the '90s onwards (Toonz, anyone?).


Anime is much better at action animation than anyone else, simply because no one else remembers how to do it; they've stopped trying.

I'm not sure if I'll be able to find this, but there's an episode of Steven Universe with an extended reference to a scene from Kill la Kill of a transformation sequence.

It looks maybe 1% as good; not only that, but the character turns into a pure white silhouette for the entire transformation because it doesn't have a character design suitable for being transformed. (Instead it has one designed to make the animators' lives easier.)


Most of that could be dealt with via proper compositing of the shots and managing the layers/lights. When it's good, you only recognize the 3d computer drawn effect because it's so good that you realize no human could have ever done this.

When it's bad, you recognize that it's janky crap tier 3d animation from a company that either didn't care or was put under such a tight timeline that they simply couldn't care.


> I've heard before that the people doing this 'tween' frames make less than minimum wage on contract.

An employee of an animation company describes in a comic book his experience of working with people drawing the in-between frames. They were literally paid in bags of rice [1].

[1]: https://en.wikipedia.org/wiki/Pyongyang:_A_Journey_in_North_...


Can't have a conversation about in-between frames without sharing this [1] Noodle video. Now granted, the interpolation there is pretty rigid, and the goal of this dataset will be to train a model to replicate hand tweening, but I can't help but feel there's going to be a massive uncanny valley.

[1] https://youtu.be/_KRb_qV9P4g?si=vtDvfLU6XYz5QwWV&t=355


I wondered about that—they stated that they processed videos to pull out keyframes. It makes me wonder how much that will or won't help with in-betweening.


Keyframes in digital video mean a different thing from keyframes in animation; they could be related but usually wouldn't be. You can hide animation keyframes with a number of techniques, for instance by having different "keyframes" for different parts of a scene so the video doesn't show them all at the same instant.

(Rough generalization, but anime is more likely to do this with foreground vs background elements, while Western animation would do this with different characters. In anime the keyframe artists tend to draw every character in the frame, and put more of a personal touch on them, so it's easier to see their individual styles.)
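
The "different keyframes for different parts of a scene" point can be sketched in a few lines of Python. Assuming hypothetical layer names and key-frame numbers (nothing here comes from the dataset), the snippet below lists the frames on which every layer has a key at once. When the keys are staggered, that list is empty, so no single frame of the final video is a "keyframe" for the whole picture.

```python
from typing import Dict, List, Set


def shared_key_frames(layer_keys: Dict[str, Set[int]]) -> List[int]:
    """Return the frame numbers on which every layer has an animation key."""
    if not layer_keys:
        return []
    common = set.intersection(*layer_keys.values())
    return sorted(common)


# Hypothetical shot: each layer is keyed on its own schedule.
staggered = {
    "character_a": {1, 9, 17},
    "character_b": {5, 13, 21},
    "background": {1, 25},
}
print(shared_key_frames(staggered))  # []
```

A video codec's keyframes, by contrast, are placed by the encoder and apply to the whole picture at once, which is why the two notions rarely line up.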

Since this is a "sakuga" dataset, Mitsuo Iso is one of the most famous examples of really natural looking movement here.

https://www.youtube.com/watch?v=NTMJ8dGFUkc

When they're bad at it you get this effect where characters constantly seem to "settle" into a keyframe pose that looks realistic but too static, and then immediately go back into inbetweening that moves a lot but isn't physically possible. I feel like B-grade Disney stuff is the worst here but don't have an example on hand.


Doesn't this dataset very obviously violate copyright?


The idol dancing example is clearly from "Oshi no Ko", and the Nyarlathotep example is clearly from "Haiyore! Nyaruko-san". They seem to have adopted an "ask for forgiveness" instead of "ask for permission" approach with respect to copyrights, and I don't think that's the right way to go.


It's a list of links, and they're not rehosting the media, so it's as legal as any search engine or other collection of links.

Of course, for a user to use the dataset, they'd have to download it. Whether or not that's a copyright violation depends on their local laws.


> It's a list of links, and they're not rehosting the media, so it's as legal as any search engine or other collection of links.

That's probably still illegal. The implied purpose is copyright violation and piracy. Judges aren't computers - they're capable of discerning when someone is trying to skirt the law by saying "here's something you could use to do something illegal...wink wink nudge nudge".


It's not a list of links to piracy websites and torrents. If the destination is legal, then there's nothing to worry about.

Those other sites that got in trouble are links to all pirated content, for the purpose of pirating content.

And we still haven't decided this isn't fair use (it's non-commercial and used only for research, so it can't really be said it harms the copyright holders' interests), and fair use is by definition not a violation of copyright.


From the GitHub repo of the “dataset”:

> [May 16, 2024 Update] Due to the recently added anti-bot measures by the data holders, our downloading pipeline is no longer working. The video links are still accessible through a browser but not via our python crawler. We are working on a workaround and will make an update once we find one. At this time, we are still providing the parquet files to researchers, but researchers will need to find a way obtain the video data. Thank you for your understanding.

Ah, so on top of making a “dataset” of a category of specific works, they are also making people hammer the servers of other parties who never agreed to pay the bandwidth for a bunch of “researchers” wanting to download all of these files. Classy.


It's used as research. How valid is this usage as fair use?


Explicitly legal in some countries, and the same way LAION worked for Stable Diffusion.


What kind of question is this? Why would it and why is it obvious? Could you state your reasons and/or qualifications for this statement? Rather than having random people speculate as to what you mean? This question seems designed to fish for low quality replies.


No, it's well within the fair use exemption.


Did anyone save it before they took it down?



