If anyone is interested in this field, we need your help with real-time reconstruction.
Currently a huge challenge is real-time reconstruction. Some approaches estimate point clouds from images and then optimize splats on those points. Others use SLAM with sensors like LiDAR to get the distance of points from the camera, but still run optimization on top of that.
Optimization produces good results, but it takes many iterations, which isn't suitable for real time.
Pixel-wise splat estimation with iPhone LiDAR could produce good results, but it needs help and expertise.
If anyone is up to the challenge, Apple provides an excellent sample project for an iPhone app, available for download here, with low-level access to the LiDAR points, operations on them, and shading:
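To give a sense of the per-pixel seeding step, here is a rough sketch of lifting a LiDAR depth map into camera-space points, one candidate splat per pixel. It's illustrative only: the intrinsics and depth array are placeholder inputs I made up, not the actual ARKit API, and the hard part (refining scale/opacity/color fast enough for real time) is exactly where help is needed.

    # Rough sketch: lift each LiDAR depth pixel to a camera-space 3D point,
    # giving one candidate splat per pixel. Placeholder inputs, not ARKit API.
    import numpy as np

    def unproject_depth(depth, fx, fy, cx, cy):
        """Back-project a depth map (meters) using pinhole intrinsics."""
        h, w = depth.shape
        us, vs = np.meshgrid(np.arange(w), np.arange(h))     # pixel coords
        z = depth
        x = (us - cx) * z / fx
        y = (vs - cy) * z / fy
        return np.stack([x, y, z], axis=-1).reshape(-1, 3)   # (H*W, 3) points

    # Hypothetical numbers for a 256x192 depth map; real values would come
    # from the camera calibration data delivered with each frame.
    depth = np.full((192, 256), 2.0)
    points = unproject_depth(depth, fx=212.0, fy=212.0, cx=128.0, cy=96.0)
    # Each point becomes a splat seed; color, scale and opacity still need
    # per-frame refinement, which is the iterative (non-real-time) part.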
You could probably already make something compelling, like a small Myst-style game that uses largely static assets, with existing tools, taking advantage of being able to render real-world environments with high fidelity.
However, for more typical games there's still a long way to go. Most of the research focus has been on applications where existing mesh-based approaches fall short (e.g. photogrammetry), but this isn't really the case for modern game development. The existing rendering approaches and hardware have largely been built FOR games and leveraged elsewhere.
Rebuilding the rendering stack around an entirely new technology is a tall order that will take a long time to pay off. That being said, the technology is promising in a number of ways. You even have games like Dreams (2020), which was built using custom splat rendering to great effect (https://www.youtube.com/watch?v=u9KNtnCZDMI).
It will require major changes to the art pipeline. New tools and tech that haven't existed before. But that tech is being made right now at a surprisingly fast rate. In a couple of years at most it will be possible to make at least an indie-level game with splats. Maybe even with models, animations, and audio made to the 80% level with generative AI.
I personally think it's a distraction that we're using so many real-world sources as the source data. Arbitrarily detailed synthetic data, e.g. Nanite scenes, fractals, etc., can provide much more interesting spaces. I'm kind of surprised that nobody is just using Mandelbulbs or whatever as standard data sets for these techniques.
On the whole though, this is really interesting and definitely is an improvement over the much older splat techniques.
Could someone explain what splatting is suitable for? I see lots of recreations of photographs. Is this a kind of compression technique, or is there some other usage?
Traditionally, if you have a real scene that you want to be able to reproduce as 3D graphics on the computer, you either:
1. Have an artist model it by hand. This is obviously expensive. And it will be stylized by the artist, have a quality level that depends on artist skill, accidental inaccuracies, etc...
2. Use photogrammetry to convert a collection of photos to 3D meshes and textures. Still a fair chunk of work. Highly accurate, but quality varies wildly. Meshes and textures tend to be heavyweight yet low-detail. Reflections, and shininess in general, don't work. Glass, mirrors and translucent objects don't work. Only solid, hard surfaces work. Nothing fuzzy.
Splatting is an alternative to photogrammetry that also takes photos as input and produces visually similar, often superior results. Shiny/reflective/fuzzy stuff all works. I've even seen an example with a large lens.
However the representation is different. Instead of a mesh and textures, the scene is represented as fuzzy blobs that may have view-angle-dependent color and transparency. This is actually an old idea, but it was difficult to render quickly until recently.
The big innovation though is to take advantage of the mathematical properties of "fuzzy blobs" defined by equations that are differentiable, such as 3D gaussians. That makes them suitable to be manipulated by many of the same techniques used under the hood in training deep learning AIs. Mainly, back-propagation.
So, the idea of rendering scenes with various kinds of splats has been around for 20+ years. What's new is using back-propagation to fit splats to a collection of photos in order to model a scene automatically. Before recently, splats were largely modeled by artists or by brute force algorithms.
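To make the fitting idea concrete, here's a minimal 2D toy sketch of my own (not from the paper): a handful of isotropic Gaussians optimized to match a target image with PyTorch and plain back-propagation. Real 3D Gaussian Splatting uses anisotropic 3D Gaussians, view-dependent color via spherical harmonics, and a custom differentiable rasterizer with proper alpha compositing, so treat this purely as an illustration of "differentiable blobs plus gradient descent."

    # Toy 2D analogue of splat fitting via back-propagation (PyTorch).
    # Uses a simple normalized blend instead of real front-to-back alpha
    # compositing; illustrative only, not the actual 3DGS pipeline.
    import torch

    H, W, N = 64, 64, 100                    # image size, number of splats
    target = torch.rand(H, W, 3)             # stand-in for a photo

    # Learnable splat parameters: position, log-scale, color, opacity logit.
    pos   = torch.rand(N, 2, requires_grad=True)       # centers in [0,1]^2
    log_s = torch.full((N,), -3.0, requires_grad=True)
    color = torch.rand(N, 3, requires_grad=True)
    alpha = torch.zeros(N, requires_grad=True)

    ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                            torch.linspace(0, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                # (H, W, 2)

    opt = torch.optim.Adam([pos, log_s, color, alpha], lr=1e-2)
    for step in range(2000):                            # iterative => not real-time
        d2 = ((grid[None] - pos[:, None, None]) ** 2).sum(-1)       # (N, H, W)
        s2 = torch.exp(log_s)[:, None, None] ** 2
        w  = torch.sigmoid(alpha)[:, None, None] * torch.exp(-d2 / (2 * s2))
        img = (w[..., None] * color[:, None, None]).sum(0) / (w.sum(0)[..., None] + 1e-6)
        loss = ((img - target) ** 2).mean()             # photometric loss
        opt.zero_grad(); loss.backward(); opt.step()    # back-propagation

The point is simply that every rendered pixel is a differentiable function of the splat parameters, so the same loss-and-optimizer machinery used for training neural networks can move, scale, recolor, and fade the blobs until the rendering matches the photos.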
Because this idea fits so well into the current AI research hot topic, a lot of AI researchers are having tons of fun expanding on the idea. New enhancements to the technique are being published daily.
Thanks for the explanation. Reading the paper, it seems it still takes 2-4 hours to train (one scene?). I imagine that's still faster than any manual method.
One big reason why people are excited is that it's finally a practical way to synthesize and render complete photorealistic 3D scenes without any of the traditional structural elements like triangle meshes and texture maps.
Think of the difference between vector graphics (like SVG) and bitmap graphics (like JPEG or PNG). While vectors are very useful for many things, it would be quite limiting if they were the only form of 2D computer graphics, and digital photos and videos simply didn't exist. That's where we have been in 3D until now.
Probably the biggest real life application is 3D walkthroughs of houses for sale & rental. This already exists, but the quality isn't as good as shown here.
Other examples are things like walking through archeological sites, 3D virtual backgrounds (e.g. for newsrooms), maybe crime scene reconstruction?
It's basically perfect 3D capture, except the big limitations are that you can't change the geometry or lighting. The inability to relight it is probably the most severe restriction.
It is likely the future of compositing and post-processing. Instead of blue screens, you can capture actors and real life items and blend them seamlessly with CGI stages and props (relighting is likely coming soon too). Additionally, you can reframe, add or remove elements from the scene in post-production essentially for free.
Novel view synthesis, so based on some images of a scene, rendering views from positions and angles that were not originally recorded. 3D Gaussian Splatting, besides beating the state-of-the-art in terms of visual quality at its time of release, also has some nice physical properties like having the splats associated with actual points in 3D (obtained through feature extraction).
It's useful for help in planning just about any work that is done outdoors, since it gives you a digital version of reality, just like photogrammetry. Think agriculture, construction, architecture, etc. Many environments can not be photographed in their entirety because of the laws of physics. Here, a digital recreation of reality is the only way to get an accurate picture. For example underwater environments and caves.
3D navigation: LiDAR-like maps of a space based just on pics, to allow a drone to have LiDAR-like spatial awareness from 2D imagery? (Aside from the cool photography bits it offers.)
As an academic, nothing infuriates me more than the promise of code and/or data. There are valid reasons this happens, but the reality is such promise is quickly forgotten.
Even provided code is poorly documented and rarely works on a machine other than the author's.
One common phrase is:
"Data is available from the authors upon reasonable request"
Try doing that for any paper older than 18 months.
All it does is contribute to the replication crisis.
Which is kind of the point I was implying. Everything now is just a stake in the ground, house pending… I think a paper about an algorithm or a novel way of doing something should be required to include the source code (not necessarily for release, but at least as proof to the journal reviewer). I’ve waited countless times for source code promised in a paper that never arrives, only to see it flipped into some commercial offering only the ultra-wealthy can afford. So I’m taking the stance of “I don’t believe you unless there’s a runtime or source code,” because it’s too easy with AI to fake the claim.