Astounding, but I can't help but be underwhelmed by the results. Look at the Shelby Cobra dropped in at 1:01 [0] - it's a vaguely grey car-shaped thing, like all of the rest of the models, that captures little of the source image.
It seems like an April Fool's joke to me. The dramatic narration overlaid on lumpy terrible-looking cars made me laugh out loud several times. I mean, I'm sure what they're doing is technically impressive but the results are downright awful. All the cars look like they're beaten to hell and back and I honestly wouldn't be able to pick "KITT" out of a lineup of any other black car if it weren't for the cylon light on the front.
I was thinking that the results looked pretty bad, until the video mentioned storyboarding and it clicked for me. Even as low-quality as the models are, they can still work great as placeholders. So this would allow very quick iteration on designing an overall scene, pulling items from reference images from Google searches etc., and when you are happy with the result you can send it to an army of 3D modelers (or designers or whatever) to be built.
So the comparison point should probably be the grey boxes or whatever people have been using as placeholders/throwaways so far, and compared to that, these generated models actually look pretty great.
Definitely not great... but I bet that in a few years it will be shockingly good. Experts have probably been working on it for years, and now that they've chosen a sufficient model structure with effective convolutions and whatnot, it's just a matter of tweaking the parameters and feeding it more and more data (3D scanning tools are making this easier as time goes on, too).
That’s just the “texture” view. There’s also the parts and materials view that helpfully segmented the various pieces of the car.
I’m not sure why they chose to show off the “texture” view without even explaining what it is.
Also, while it may look iffy right now, it seems they’ve done the heavy lifting. A round of polish, and in a couple of years nobody will be laughing at this. (I’m already impressed, personally.)
The heavy lifting is the fine detail in the 3D models, which isn't done yet. Vaguely car-shaped blobs that don't even respect the original general shape of the car (e.g. the Cobra or the Toyota SUV) but have the correct texture extrapolated onto them are nifty, but nothing more.
Yeah...that looks...frankly worse than the kind of thing I did in my undergrad 3d graphics class, where we did some edge detection, tessellated it, and projected it into 3D space.
>This gives creators of any skill level the ability to easily generate models for diverse uses - storyboarding, pre-vis, game content, architectural models, and more.
It sounds like they expect you to use it when you just need a vaguely correct shape anyway, at least as a starting point for further refinement.
They turned KITT into a blurry 3d blob, poorly lit to hide the flaws. Barely qualifying as a full 3d model. "Black car" is the appropriate name for the result, nothing more.
They disabled comments on the article, and on the youtube video. Combined with YouTube's newly hidden dislike ratings, we can't know how many viewers are unimpressed. Hype has more breathing room than ever before.
I don't get the negative comments. The results look alright, but they lack detail at this point. This will improve over time, and there are probably a lot of applications that don't need this level of detail anyway. I'd guess this will put some 3D modelers out of a job x years into the future - classic disruptive tech stuff!
I like to take a line from Two Minute Papers[1] - if this is what one "paper" was able to produce, imagine what the improvements will be two "papers" down the line.
Same. These results are far better than I (with little 3D modeling experience) could generate.
A pro could do better, but a) do you have a pro available? and b) can they afford the time to model all of the items you want to throw into your mockup?
Please let me know if this already exists, but: video games are reaching huge file sizes, in part due to HD graphics. Why not just generate, say, the faces of pedestrians on the fly? Is it important that some character in my copy of the game looks identical in all copies of the game? I would argue it just needs a unique face to go with the name.
I've seen some work on, and expect some games have used, a similar method where they'll have a bidirectional process to generate something from a seed, AND create a seed after having crafted something, allowing them to mix and match, and still achieve a super high compression rate (as, at the end, all that is stored for each face or other element is the seed that generates it).
Certainly, there are similar things in user space where you can share your creations from some games using a code that was generated or similar.
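A minimal sketch of that seed idea, assuming a hypothetical parametric face model (the parameter names here are made up): the only thing stored or shared per NPC is an integer seed, and every client deterministically regenerates the same parameter set from it.

```python
import random

# Hypothetical parameters of a parametric face model.
FACE_PARAMS = ["jaw_width", "nose_length", "eye_spacing", "skin_tone", "hair_style"]

def face_from_seed(seed: int) -> dict:
    """Deterministically regenerate a face's parameters from a stored seed."""
    rng = random.Random(seed)
    return {name: rng.random() for name in FACE_PARAMS}

# Only the seed needs to ship with the game or be shared between players;
# each client reconstructs an identical face from it.
npc_seed = 123456789
face = face_from_seed(npc_seed)
print(face)
```

Going the other way (recovering a compact seed or code from something a player has hand-crafted) is the harder half, which is presumably why those share codes tend to encode the parameters directly rather than a true seed.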
Well, there's Minecraft. Its worlds are huge and procedurally generated.
As to faces, Skyrim does that. There aren't unique face meshes for every NPC, there's some base meshes and various parameters to adjust texture and shape.
I mean is file size really a concern? I would rather have a 300 GB game with something like quixel megascanned assets than procedurally generated crap.
The latter sounds nice, but in practice game execs just use that to cut corners and you end up with an unrewarding grind fest of a game.
Aesthetics and gameplay are completely separate topics. Procedural generation can increase the visual quality without having anything to do with how you play.
For example, increasing background details (more pedestrians, cars, traffic, sounds, etc) in a city scene can greatly add to the atmosphere regardless of your specific gameplay loop.
Spoken dialog is a killer. Studios have to contract voice actors and then it's very difficult to make tweaks later. Why not fancy text to speech with variables for intonation, smoker/nonsmoker, age, gender, accent, etc.
Going even further, things like GPT can generate text given an input. It's beyond me, but couldn't something similar be used to describe environments to be generated on the fly, or to generate NPC stories on the fly given inputs?
I think a lot of this tech is coming together and games are about to get a whole lot larger, more immersive, and cheaper to produce. Even just replacing voice actors would be a huge leap.
There are games that completely waste their space, but we can’t lump them all together. Some create vast detailed and beautiful worlds from their storage budget. Some of them, however, are large without any compelling reason to be, with many gigabytes of wasted space on assets that are barely used.
There's a trade-off between processing time and storage space in pretty much everything to do with games. For example, the lighting in a scene is often precomputed offline and then baked into a texture which ships with the game. This increases the speed at which a frame can be rendered, but also means that your game just gained another few MB in weight. A similar thing happens with animations of clothing: these are often physically based simulations done offline which are then baked in (again, textures are often used to store the information, mapping RGB to XYZ).
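As a rough illustration of that RGB-to-XYZ baking (a sketch, not any particular engine's format; the bounds and layout are assumptions), suppose per-vertex offsets were normalized into [0, 1] and written into a texture's colour channels offline. The runtime then just reverses the mapping:

```python
import numpy as np

# Assumed world-space bounds used when the offsets were baked offline;
# real pipelines would store these alongside the texture.
BOUNDS_MIN = np.array([-1.0, -1.0, -1.0])
BOUNDS_MAX = np.array([ 1.0,  1.0,  1.0])

def decode_baked_offsets(texture_rgb: np.ndarray) -> np.ndarray:
    """Map normalized RGB values (0..1) back to XYZ offsets in world units."""
    return BOUNDS_MIN + texture_rgb * (BOUNDS_MAX - BOUNDS_MIN)

# One texel per vertex per animation frame; here, a single frame of 3 vertices.
frame = np.array([[0.5, 0.5, 0.5],   # decodes to ( 0,  0, 0)
                  [1.0, 0.5, 0.5],   # decodes to ( 1,  0, 0)
                  [0.5, 0.0, 0.5]])  # decodes to ( 0, -1, 0)
print(decode_baked_offsets(frame))
```

The real decode usually happens in a vertex shader, but the arithmetic is the same: the texture is just a cheap, GPU-friendly container for precomputed XYZ data.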
If you were to generate faces "on the fly" as a level streamed off disk, that would likely eat too much of your frame budget and your game would grind to a stuttering halt. The other way of doing it (and this is generally how games which use procgen work) would be to have a precompute step when you first install or launch the game. This might give you a long install or launch time and would still mean a huge install size, but you would avoid taking up a chunk of your frame budget. Or you could do some clever scheduling and level design where you force the player through "tunnels" between areas, which gives you enough time to generate a few new faces and load in assets before they enter an area (see the sketch below). This is a pretty common technique where you need to load in a bunch of assets or do some heavy calculations. The Witcher 2 had an interesting and slightly buggy version of this: whenever Geralt opened a door to go from outside to inside, the camera would swing around so you couldn't see inside the building, a second-long animation would play, and then the camera would turn back so you could see into the now fully loaded interior of the building. It didn't work properly, but it was a good idea.
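A toy sketch of the frame-budget idea (the budget value and function names are made up): queued faces are generated only while there is time left in the current frame's slice, so the work is spread across the "tunnel" section instead of causing a hitch.

```python
import time

FRAME_BUDGET_MS = 2.0  # assumed slice of each frame reserved for procgen work

def generate_face(npc_id: int) -> dict:
    """Stand-in for the expensive face-generation step."""
    return {"npc": npc_id}

def process_generation_queue(queue: list, budget_ms: float = FRAME_BUDGET_MS) -> list:
    """Generate as many queued faces as fit within the per-frame budget."""
    done = []
    start = time.perf_counter()
    while queue and (time.perf_counter() - start) * 1000.0 < budget_ms:
        done.append(generate_face(queue.pop(0)))
    return done

# Called once per frame while the player walks through the loading corridor.
pending = list(range(100))
finished_this_frame = process_generation_queue(pending)
```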
You could do what ms flight does and have a locally installed low resolution world and a high resolution streamed world. Adapting this you could stream in procgen'd faces and other items generated offline. I think this would work rather nicely.
Lastly, I know it was just an example, but faces are probably the last thing you want to procgen without passing the results by a human filter, as humans are incredibly sensitive to them looking "off". If all your characters end up looking grotesque and off-putting, you might be shooting yourself in the foot. See the furore about eFootball's terrifying crowd models.
I think most open world AAA games have done this for over a decade. They either vary a base face model randomly, or mix from hundreds of variants, hairstyles, clothing options, accessories etc. Literally no one is making 400 unique models for a crowd of 400.
Yes! The tech mentioned in the article solves this problem: no more "we have to remove/compress all of these models/textures because our game can't be 300GB". Just use 10 high-res pictures of the object from all angles instead of a model. This could save orders of magnitude of disk space
It also solves the problem of "we have to choose what objects to model because our 3D artists don't have infinite time". Now the artists review GAN output and tweak what the algorithm gets wrong, creating tons of content without needing to create every vertex from scratch. Very exciting
Not sure if "10 high-res pictures of the object from all angles instead of a model" saves any disk space at all, considering models in games are usually just one high-res picture UV-wrapped onto vertices.
Also, creating new content from a GAN might not be super appealing for the artistic qualities of the game. Why not do what Escape from Tarkov seems to do with photogrammetry on real objects? Many props like soup cans, couches, whole environments, and weapons can be acquired in the real world reasonably easily.
It’s great for Indie game devs who just want quick assets and don’t want to go through the hassle of finding some artist and paying obscene money for a couple assets. My inability to get good consistent artwork is what has always demotivated me from actually finishing any kind of game.
I remember as a kid messing around with early desktop-computer photogrammetry software: the kind where you have to print out some black-and-white marker circles and place them around some trinket you want to capture in 3D.
...it worked, but the output *.3ds files were low-resolution, the textures only looked good from one direction, and the material's lighting was way off.
So the concept is nothing new, but it's never as simple as taking even a bunch of photos, let alone 1 photo - there's so much information necessary for future renders that cannot be captured from photos alone.
Curious: does anyone know of software that could generate body sizing for an individual based on image stills or video taken from a normal mobile phone? Essentially creating a 3D human model, but with a high degree of accuracy.
I don't know of one yet. Lots of people are working on it, using various strategies. The holy grail would be providing a virtual fitting-room experience, or monitoring individuals' fitness.
A picture always has scale uncertainty (a 2-meter human viewed from 1 meter away looks the same as a 1-meter human viewed from 0.5 meters away), so that is an additional problem that must be taken care of.
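A quick sanity check of that ambiguity under the standard pinhole model (the focal length here is an arbitrary made-up number; only the ratio matters):

```python
def projected_size(height_m: float, distance_m: float, focal_px: float = 1000.0) -> float:
    """Pinhole projection: image height in pixels is focal * (height / distance)."""
    return focal_px * height_m / distance_m

# A 2 m person at 1 m and a 1 m person at 0.5 m project to the same image size,
# so a single photo cannot tell them apart without extra scale information.
print(projected_size(2.0, 1.0))   # 2000.0
print(projected_size(1.0, 0.5))   # 2000.0
```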
But recent phones now have 3D sensors that provide information that could be useful.
To generate 3D human models there is also MakeHuman. In the old days there was software called FaceGen that could generate 3D face models and automatically fit their parameters to two pictures using an iterative refinement procedure.
Deep-learning approaches usually estimate everything jointly in a single step, so they are faster but often less accurate. But there are models that learn to refine a previously generated model, so you can apply them repeatedly and get improved quality (Denoising Diffusion Probabilistic Models are one generic class of models that does this).
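Sketching the "apply the refiner repeatedly" idea; `refine_model` is hypothetical, standing in for a single trained denoising/refinement step:

```python
def refine_model(mesh, step: int):
    """Hypothetical single refinement step, e.g. one denoising pass of a trained network."""
    ...  # placeholder: returns an improved estimate of the mesh
    return mesh

def iterative_refine(initial_mesh, num_steps: int = 10):
    """Start from a coarse single-shot estimate and refine it repeatedly."""
    mesh = initial_mesh
    for step in range(num_steps):
        mesh = refine_model(mesh, step)
    return mesh
```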
A commercial (and therefore far more expensive) version of this tool is https://www.capturingreality.com. Cool that Nvidia is starting to make this more accessible, even if the results aren't quite usable yet.
Wasn't there just a tool posted here to do this on a Show HN recently? I can't remember the name of it now. I believe you had to upload more than one image for it to work but they had an api.
Edit: after looking, this is what I was thinking of.
Yeah, much of human intelligence is also just statistics. We do a lot of unsupervised learning and reasoning based on correlation.
The opposite of this would be causal reasoning, but that is hard for humans as well. Not everyone can use causal reasoning; it requires many years of training. The discovery of causal mechanisms (science) is slow, and it takes many people to do it.
For example the president of Turkey believes interest rates should be lowered in a hyperinflation. What can we do? Causal reasoning is hard.
I think we do something else, slightly different from causal reasoning. We're generating hypotheses and testing them out, in an iterative process - like a GAN (generator + discriminator) or Actor-Critic (policy + value function). More famously, AlphaGo generated many rollouts for each move: one model to generate moves, another to evaluate their consequences.
[0]: https://youtu.be/gz5E9wszZSI?t=61