
>All the vids have that instantly recognizable GenAI "sheen"

That's something that can be fixed in a future release, or something you can fix right now with a few filters in the post-processing step of your pipeline.
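
(Not from the thread, just a hedged sketch of what such a post filter could look like: assuming ffmpeg is installed, a pass of mild desaturation plus light grain is one common way to knock back that glossy look. The filenames and filter values below are made up; tune to taste.)

    # Illustrative only: run a desaturate + grain pass over a generated clip with ffmpeg.
    import subprocess

    def degloss(src: str, dst: str) -> None:
        vf = (
            "eq=saturation=0.92:contrast=0.97,"   # pull back the over-saturated, over-contrasty look
            "noise=alls=6:allf=t+u,"              # add light temporal film grain
            "format=yuv420p"                      # keep a widely compatible pixel format
        )
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-vf", vf, "-c:a", "copy", dst],
            check=True,
        )

    degloss("genai_clip.mp4", "genai_clip_graded.mp4")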






I think the big blind spot people have with these models is that the release pages only show the raw AI output. But anyone competently using these AI tools will be using them as step X of a hundred-step creative process. And that blind spot is only going to grow as the AI tools improve and people find better ways to integrate them into their workflows.

Yeah, exactly. The video pipelines behind productions we only ever see the end results of have a lot of steps beyond just the raw video output/capture. Even Netflix/Hollywood productions without VFX involve a lot of retouching and post-processing.

Not even filters; every text2image model created thus far can be very easily nudged with a few keywords into generating outputs in a specific visual style (e.g. artwork matching the signature style of any artist it has seen some works from).

This isn't an intentional "feature" of these models; rather, it's kind of an inherent part of how such models work — they learn associations between tokens and structural details of images. Artists' names are tokens like any other, and artists' styles are structural details like any other.

So, unless the architecture and training of this model are very unusual, it's gonna at least be able to give you something that looks like e.g. a "pencil illustration."
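
(Again just an illustrative sketch, not from the thread: with an open text2image model via the diffusers library, swapping a single style keyword in the prompt is all the nudging takes. The model id and prompts here are only examples.)

    # Same subject, two style tokens: the style words steer the structural
    # details of the output just like any other token would.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    for style in ["pencil illustration", "oil painting"]:
        image = pipe(f"a lighthouse on a cliff, {style}").images[0]
        image.save(f"lighthouse_{style.replace(' ', '_')}.png")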


> "That's something that can be easily fixed in a future release (...)"

This has been the default excuse for the last 5+ years. I won't hold my breath.


5 years ago there were no AI videos. A bit over a year ago the best AI videos were hilarious hallucinations of Will Smith eating spaghetti.

Today we have these realistic videos that are still in the uncanny valley. That's insane progress in the span of a year. Who knows what it will be like in another year.

Let 'em cook.


Disco Diffusion was a (bad) thing in 2021 that led to the spaghetti-video / weird-Burger-King-ad level of quality. But it ran on consumer GPUs / in a Jupyter notebook.

2 years ago we had decent video generation for clips.

7 months ago we got Sora https://news.ycombinator.com/item?id=39393252 (still silence since then).

With these things, like DALL-E 1 and GPT-3, the original release of the game changer often comes ca. 2 years before people can actually use it. I think that's what we're looking at.

I.e. it's not as fast as you think.


What video generation was decent 2 years ago? Will Smith eating spaghetti was barely coherent and clearly broken, and that was March 2023 (https://knowyourmeme.com/memes/ai-will-smith-eating-spaghett...).

And isn’t this model open source…? So we get access to it, like, momentarily? Or did I miss something?


So you're right to be excited, I agree. And I don't know: Meta, like OpenAI, seems to release conditionally, though yes, more often. I doubt we'll see it before the election.

When the Will Smith one was released, it was kind of a parody though. The tech had already been able to produce that level of "quality" for about 2 years at the time of its publishing. The Will Smith one is honestly something you could have created with Disco Diffusion in early 2021; I used to do this back then...

2022 saw: https://makeavideo.studio/ (coherent, but low res - it was possible to upscale at extreme expense) https://sites.research.google/phenaki/ https://lumiere-video.github.io/

Sorry, it was more like 18-20 months ago, so early 2023, but https://runwayml.com/research/gen-1 was getting there, as was https://pika.art/home. Sora obviously changed the game, but I would say these two were great.


The subtle "errors" are all low hanging fruit. It reminds me of going to SIGGRAPH years back and realizing most of the presentations were covering things which were almost imperceptible when looking at the slides in front. The math and the tech was impressively, but qualitatively it might have not even mattered.

The only interesting questions now have nothing to do with capability but with economics and raw resources.

In a few years, or less, clearly we'll be able to take our favorite books and watch unabridged, word-for-word copies. The quality, acting, and cinematography will rival the biggest budget Hollywood films. The "special effects" won't look remotely CG like all of the newest Disney/Marvel movies -- unless you want them to. If publishers put up some sort of legal firewall to prevent it, their authors, characters, and stories will all be forgotten.

And if we can spend $100 of compute and get something I described above, why wouldn't Disney et al throw $500m at something to get even more out of it, and charge everyone $50? Or maybe we'll all just be zoo animals soon (Or the zoo animals will have neuralink implants and human level intelligence, then what?)


> In a few years, or less, clearly we'll be able to take our favorite books and watch unabridged, word-for-word copies. The quality, acting, and cinematography will rival the biggest budget Hollywood films. The "special effects" won't look remotely CG like all of the newest Disney/Marvel movies -- unless you want them to. If publishers put up some sort of legal firewall to prevent it, their authors, characters, and stories will all be forgotten.

I'm also expecting, before 2030, that video game pipelines will be replaced entirely. No more polygons and textures, not as we understand the concepts now, just directly rendering any style you want, perfectly, on top of whatever the gameplay logic provided.

I might even get that photorealistic re-imagining of Marathon 2 that I've been wanting since 1997 or so.


> In a few years, or less, clearly we'll be able to take our favorite books and watch unabridged, word-for-word copies. The quality, acting, and cinematography will rival the biggest budget Hollywood films. The "special effects" won't look remotely CG like all of the newest Disney/Marvel movies -- unless you want them to. If publishers put up some sort of legal firewall to prevent it, their authors, characters, and stories will all be forgotten.

I don't think so at all. You're thinking a movie is just the end result that we watch in theaters. Good directing is not a text prompt, good editing is not a text prompt, good acting is not a text prompt. What you'll see in a few years is more ads. Lots of ads. People who make movies aren't salivating at this stuff but advertising agencies are because it's just bullshit content meant to distract and be replaced by more distractions.


Indeed, adverts come first.

But at the same time, while it is indeed true that the end result is far more than just making good images, LLMs are weird interns at everything, with all the negatives that implies as well as the positives. They're not likely to produce genuinely award-winning content all by themselves, even though they do better if you ask them for something "award winning". Still, it's certainly conceivable that we'll see AI do all these things competently at some point.


> "In a few years, or less, clearly we'll be able to take our favorite books and watch unabridged, word-for-word copies."

That would be a boring movie.


You had AI videos 5 years ago?

AI in general.

…I mean, it was advancing slowly for linguistic tasks until late 2022, that’s fair. That’s why we’re in such a crazy unexpected rollercoaster of an era - we accidentally cracked intuitive computing while trying to build the best text autocomplete.

AI in general is from 1950, or more generally from whenever the abacus was invented. This very website runs on AI, and always has. I would implore us to speak more exactly if we're criticizing stuff; "LLMs" came around (in force) in 2023, both for coherent language use (ChatGPT 3.5) and image use (DALL-E 2). The predecessors were an order of magnitude less capable, and going back 5 years puts us back in the era of "chatbots", aka dumb toys that can barely string together a Reddit comment on /r/subredditsimulator.


AI so far has given us the ability to mass-produce shit content of no use to anybody, plus the next iteration of customer support phone menu trees that sound more convincing yet remain just as useless. That, and another round of IP theft and mass surveillance in the name of progress.

This is a consequence of a type of cognitive bias: bad examples of AI are more easily detectable than good examples of AI. Subsequently, when we recall examples of AI content, bad examples are more easily accessible. This leads to the faulty conclusion that:

> AI so far has given us ability to mass produce shit content of no use to anybody

Good AI goes largely undetected, for the simple reason that it closely matches the distribution of non-AI content.

Controversial aside: This is same bias that results in non-passing trans people being representative of the whole. Passing trans folk simply blend in.


This basic concept can be applied in many places. Do you ever wonder why social movements seem to never work out well and demands are never met? That’s because when they do work out, and demands are met, those people quickly become the “oppressor” or the powerful class from which others are fighting to receive more rights or money.

All criminals seem so incredibly stupid that you can’t understand why anyone would ever try since they all are caught? The smart ones don’t get caught and no one ever hears about them.


You're making an unverifiable claim. How are we supposed to know that the undetected good AI exists at all? Everything I've seen explicitly produced by any of these models is in uncanny valley territory still, even the "good" stuff.

Don't care. Every request for verification will eventually reach the Münchhausen trilemma.

Okay. So you are a person who does not care if what they are saying is true. Got it!

Verificationism[1] is a failed epistemology because it breaks under the Münchhausen trilemma. It's pseudo-scientific like astrology, four humors, and palm reading. Use better epistemologies.

https://en.wikipedia.org/wiki/Verificationism


The core use case is as a small part of larger programs. It’s just computer vision but for words :)

We don't have AI in general today



