My grandmother's farm had to be sold in 2012 after she died. My family moved around when I was a kid but always visited there for holidays, so it felt more like home than any other place I lived. I have extensive videos I recorded in 2006. It'd be wonderful to walk through there again using reconstructions from material I already have.
Or maybe not. There's a reason I haven't watched those videos in years. Who wants to remember the garden of Eden when you know you can't go back?
I think it depends on where you are in life. If your life is good and you feel safe and happy, then looking back can be a nice way to remember your childhood, and possibly rediscover things about yourself you had forgotten. If your life is difficult and you’re not feeling great, looking back at better times can be a painful reminder of things not going the way you’d like. But our lives ebb and flow, and there may come a time when you feel like looking back.
Or, to quote Watchmen, "I'm 65 years old. Every day the future looks a little bit darker. But the past, even the grimy parts of it, well, it just keeps on getting brighter all the time."
When you're young there's often nothing to look forward to but the future, so why revisit the past?
When you're old, your health is failing, and there's no future to look forward to, you're gonna want to look back in time -- maybe remember the better days of your youth.
Structure from motion does not produce good visual fidelity on plants. I’m designing a farming robot and I want remote farmers to be able to view a 3D image of the plants to check for issues, so fidelity is very important. I’ve done a lot of experiments with photogrammetry, and NeRF, while it still presents a lot of technical challenges, seems far superior for this.
I get the sense that they are mostly using the smoothed views as an example of good results on long scenes. Ultimately the point of NeRF is novel/arbitrary view synthesis, which you’re not going to get with Hyperlapse.
And NeRF on long tracks is exactly what we need to capture a long row of plants at the farm.
This algorithm constructs a 3D environment from the video data - they're just showcasing it with stabilization. Classical methods require better cameras and more metadata. Deep learning is an opportunity for more robust methods toward the same end, which can also do things like estimate lighting and capture large scenes.
The renderings look a lot prettier, but the 3D structure doesn't seem very good.
Most NeRFs use classical bundle adjustment (e.g., COLMAP) as an initialization, but this one does not; the authors mention that they leave bundle adjustment for future work.
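For anyone curious what that initialization step usually looks like, here's a minimal sketch of reading the bundle-adjusted camera poses from a standard COLMAP text export. The file path and what you do with the poses afterwards are assumptions for illustration, not anything from this paper:

    # Sketch: load camera poses from a COLMAP text export (images.txt),
    # the classical bundle-adjustment output most NeRF pipelines start from.
    # The path "sparse/0/images.txt" is the conventional COLMAP layout and
    # is an assumption here.
    import numpy as np

    def quat_to_rotmat(qw, qx, qy, qz):
        # Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix.
        return np.array([
            [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
            [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
            [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
        ])

    def read_colmap_poses(images_txt="sparse/0/images.txt"):
        poses = {}
        with open(images_txt) as f:
            lines = [l.strip() for l in f if l.strip() and not l.startswith("#")]
        # images.txt stores two lines per image: the pose line, then the 2D points line.
        for pose_line in lines[0::2]:
            elems = pose_line.split()
            qw, qx, qy, qz, tx, ty, tz = map(float, elems[1:8])
            name = elems[9]
            R = quat_to_rotmat(qw, qx, qy, qz)   # world-to-camera rotation
            t = np.array([tx, ty, tz])
            cam_center = -R.T @ t                # camera position in world coordinates
            poses[name] = (R, t, cam_center)
        return poses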
It's interesting how the level of detail improves as the camera gets closer to objects. Specifically, the transparency of foliage gets more detailed when getting closer. That makes one wonder whether a multipass version of this thing could use the details it learned later in the timeline to improve the detail level earlier.
Let's say you want to slightly alter the spline the camera of the 3D scene moves along, but the positions/angles would change in such a way that there is missing data in the new rendering. How feasible would it be to use current inpainting technologies to fill in the gaps directly in the scene? Would it be better to try to inpaint the rendered frames instead?
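For the first half of that, here's a minimal sketch of altering the camera path: fit a smoothing spline through the recovered camera centers and slerp the orientations along it. The pose layout, smoothing factor, and function name are assumptions for illustration; whatever regions the new path reveals are exactly the gaps you'd then have to inpaint, either in the scene representation or in the rendered frames:

    # Sketch: smooth / slightly alter a recovered camera trajectory before
    # re-rendering. positions is (N, 3) camera centers, quats_xyzw is (N, 4)
    # orientations in scalar-last order; both are assumed inputs.
    import numpy as np
    from scipy.interpolate import splprep, splev
    from scipy.spatial.transform import Rotation, Slerp

    def smooth_camera_path(positions, quats_xyzw, n_out=300, smoothing=1e-2):
        # Smoothing spline through the camera centers; s > 0 lets the new
        # path deviate from the original shaky one, which is what creates
        # the missing regions in the first place.
        tck, u = splprep(positions.T, s=smoothing)
        u_new = np.linspace(0.0, 1.0, n_out)
        new_positions = np.stack(splev(u_new, tck), axis=1)

        # Spherical interpolation of the orientations along the same parameter.
        slerp = Slerp(u, Rotation.from_quat(quats_xyzw))
        new_rotations = slerp(u_new)
        return new_positions, new_rotations.as_quat()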
I think what you're getting a demo of there is the bit of our depth perception apparatus that doesn't need stereo pairs to work. Out past a certain distance our brains reconstruct geometry from visual depth cues, and I suspect that the smoothed camera path means that, in contrast to the shaky original video, your eyes are seeing something your inner ears aren't contradicting too much.
Never thought of it that way, but you might very well be right. Something like "my ear isn't shaking, the camera isn't shaking, must be real" ;-)
I wonder if maybe their implementation "fills in the blanks" in the 3D space in the same way our brain does, so it looks "properly 3D" because it's what we're expecting it to look like already?
Check out Luma; it's an iPhone app that makes it really easy to create NeRFs and render out videos from them. Although I don't think it handles long paths like this method yet.
What a weird paper. They compare 3D rendering a synthetic camera path against a stock 2D image stabilisation algorithm. Of course, true 3D algorithms will win.
And their main takeaway seems to be that one should do global bundle adjustment for recovering the camera poses ... which I thought has been common knowledge for years and is what pretty much every SfM tool implements.
My TLDR would be: stuff that works well continues to work well even if you use a neural radiance field instead of a point cloud for representing geometry.
Those results look eerily similar to Microsoft's 2016 Hyperlapse paper & software.
It looks like the paper and code for both the original NeRF and this new Localrf method are free and open source, so that bodes well for integration into Kdenlive. In any case, I'm available for contracts ;)