Localrf – Nerf from casual shaky videos (localrf.github.io)
267 points by smusamashah on June 15, 2023 | 30 comments



My grandmother's farm had to be sold in 2012 after she died. Since my family moved around when I was a kid, but always visited there for holidays, it felt more like home than any other place I lived in. I have extensive videos I recorded in 2006. It'd be wonderful to walk through there again using reconstructions from material I already have.

Or maybe not. There's a reason I haven't watched those videos in years. Who wants to remember the garden of Eden when you know you can't go back?


I think it depends on where you are in life. If your life is good and you feel safe and happy, then looking back can be a nice way to remember your childhood, and possibly rediscover things about yourself you had forgotten. If your life is difficult and you’re not feeling great, looking back at better times can be a painful reminder of things not going the way you’d like. But our lives ebb and flow, and there may be a time where you feel like looking back.


Or, to quote Watchmen, "I'm 65 years old. Every day the future looks a little bit darker. But the past, even the grimy parts of it, well, it just keeps on getting brighter all the time."


When you're young there's often nothing to look forward to but the future, so why revisit the past?

When you're old, your health is failing, and there's no future to look forward to, you're gonna want to look back in time -- maybe remember the better days of your youth.


FYI: Under the "Video Comparisons" section, those two little "Sequence" and "Method" buttons are actually dropdowns.


Oh thank god. I wondered what on earth it was compared to.


For those like me who didn't know what NeRF means - Neural Radiance Fields.

[1] https://www.matthewtancik.com/nerf

[2] https://datagen.tech/guides/synthetic-data/neural-radiance-f...
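
For a more concrete picture: the core idea, very roughly, is a small neural network that maps a 3D point plus a viewing direction to a color and a density, and pixel colors come from alpha-compositing those values along each camera ray. A toy PyTorch sketch of that idea (nothing like a real implementation, just the shape of it):

    import torch
    import torch.nn as nn

    # Toy sketch of the NeRF idea, not any real architecture:
    # an MLP maps (3D point, view direction) -> (RGB color, density).
    class TinyNeRF(nn.Module):
        def __init__(self, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3 + 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),  # (r, g, b, sigma)
            )

        def forward(self, xyz, viewdir):
            out = self.net(torch.cat([xyz, viewdir], dim=-1))
            return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])

    def render_ray(model, origin, direction, n_samples=64, near=0.1, far=4.0):
        # Classic volume rendering: sample points along the ray and
        # alpha-composite their colors front to back.
        t = torch.linspace(near, far, n_samples)
        pts = origin + t[:, None] * direction            # (n_samples, 3)
        rgb, sigma = model(pts, direction.expand(n_samples, 3))
        alpha = 1.0 - torch.exp(-sigma * (t[1] - t[0]))  # opacity per sample
        trans = torch.cat([torch.ones(1),                # transmittance so far
                           torch.cumprod(1.0 - alpha + 1e-10, dim=0)[:-1]])
        weights = alpha * trans                          # per-sample contribution
        return (weights[:, None] * rgb).sum(dim=0)       # final pixel color

Everything interesting (positional encoding, hierarchical sampling, and in Localrf's case the progressively allocated local radiance fields and pose estimation) is layered on top of that basic recipe.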


dang, might the title be corrected to NeRF instead of the toy? ;)


Is this better than what was previously achievable using classical structure from motion? It seems worse because of extreme detail loss?

And if they're just plotting a smoothed but similar path through the scenery, didn't Microsoft do that in 2014 with Hyperlapse?


Structure from motion does not produce good visual fidelity on plants. I’m designing a farming robot and I want remote farmers to be able to view a 3D image of the plants to check for issues, so fidelity is very important. I’ve done a lot of experiments with photogrammetry, and NeRF, while still presenting a lot of technical challenges, seems far superior for this.

I get the sense that they are mostly using the smoothed views as an example of good results on long scenes. Ultimately the point of NeRF is novel/arbitrary view synthesis, which you’re not going to get with Hyperlapse.

And NeRF of long tracks is exactly what we need to capture a long row of plants at the farm.


This algorithm constructs a 3D environment from the video data - they're just showcasing it with stabilization. Classical methods require better cameras and more metadata. Deep learning is an opportunity for more robust methods for the same end, but also to do things like estimating lighting and capturing large scenes.


The renderings look a lot prettier, but the 3D structure doesn't seem very good.

Most NeRFs use classical bundle adjustment (e.g. COLMAP) as an initialization, but this one does not, and the authors mention that they leave bundle adjustment for future work.
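
To make the contrast concrete, the pose-free alternative boils down to treating each frame's camera pose as a learnable parameter and optimizing it jointly with the radiance field, driven only by photometric error. A rough PyTorch sketch of that general idea (not the authors' code; get_ray_batch and rays_to_world are hypothetical helpers, and TinyNeRF/render_ray refer to the toy sketch a few comments up):

    import torch

    n_frames, n_steps = 100, 10_000

    # One learnable 6-DoF pose per frame (3 rotation + 3 translation params),
    # optimized jointly with the radiance field instead of coming from COLMAP.
    poses = torch.nn.Parameter(torch.zeros(n_frames, 6))
    field = TinyNeRF()  # the toy model from the sketch a few comments up

    opt = torch.optim.Adam([
        {"params": field.parameters(), "lr": 1e-3},
        {"params": [poses], "lr": 1e-4},  # poses typically get a smaller learning rate
    ])

    for step in range(n_steps):
        # Hypothetical loader: a frame index, ground-truth pixel colors, and
        # the corresponding rays in that frame's camera coordinates.
        idx, gt_rgb, rays_cam = get_ray_batch()

        # Hypothetical helper: transform camera-space rays into world space
        # using the *current* pose estimate for this frame.
        rays_o, rays_d = rays_to_world(rays_cam, poses[idx])

        pred = torch.stack([render_ray(field, o, d) for o, d in zip(rays_o, rays_d)])
        loss = ((pred - gt_rgb) ** 2).mean()  # photometric error
        opt.zero_grad()
        loss.backward()  # gradients flow into the field *and* the pose parameters
        opt.step()

As I understand it, Localrf's contribution is making this kind of joint optimization actually work on long, shaky trajectories by allocating new local radiance fields progressively along the path instead of fitting one global model.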


Incredible future VR use-case for all the hours of useless video footage in my archives.


It's interesting how the level of detail improves as the camera gets closer to objects. Specifically, the transparency of foliage gets more detailed when getting closer. That makes one wonder whether a multipass version of this thing could use the details it learned later in the timeline to improve the detail level earlier.


Let's say you want to slightly alter the spline the camera of the 3D scene moves along, but the positions/angles would change in such a way there is missing data in the new rendering. How feasible would it be to use current inpainting technologies to fill in the gaps straight in the scene? Would it be better to try to inpaint the rendered frames instead?


You may be interested in GeNVS (https://nvlabs.github.io/genvs/) which combines NeRFs and generative diffusion models.


Related: "Nerfies: Deformable Neural Radiance Fields"

AKA: Selfie nerfs from hand-held phone camera selfie vids.


I wonder if this could be implemented in https://github.com/gyroflow/gyroflow to further enhance the result from gyro based stabilization.


When I've got my glasses on, the "Forest" sequences look *wildly* 3D to me.

Without my glasses it looks less so but that might be a function of not really being able to focus on stuff closer than about ten metres ;-)


I think what you're getting a demo of there is the bit of our depth perception apparatus that doesn't need stereo pairs to work. Out past a certain distance our brains reconstruct geometry from visual depth cues, and I suspect that the smoothed camera path means that, in contrast to the shaky original video, your eyes are seeing something your inner ears aren't contradicting too much.


Never thought of it that way, but you might very well be right. Something like "my ear isn't shaking, the camera isn't shaking, must be real" ;-)

I wonder if maybe their implementation "fills in the blanks" in the 3D space in the same way our brain does, so it looks "properly 3D" because it's what we're expecting it to look like already?


It's interesting how people who are clearly visible in the "input" disappear in the processed output and/or look like ghosts. It's a bit scary looking.


Can't wait for this stuff to become so good and polished behind a UI, so that I can just point it to a folder of videos and see what comes out.


Check out Luma; it's an iPhone app that makes it really easy to create NeRFs and render out videos from them. Although I don't think it handles long paths like this method yet.

https://lumalabs.ai/



What a weird paper. They compare 3D rendering a synthetic camera path against a stock 2D image stabilisation algorithm. Of course, true 3D algorithms will win.

And their main takeaway seems to be that one should do global bundle adjustment for recovering the camera poses ... which I thought has been common knowledge for years and is what pretty much every SfM tool implements.

My TLDR would be: stuff that works well continues to work well even if you use a neural radiance field instead of a point cloud for representing geometry.

Those results look eerily similar to Microsoft's 2016 Hyperlapse paper and software.


Hey, that kinda looks like the trail from Upper Yosemite Falls to El Capitan.


nice

edit: is this process public? how can i test and try this? on kdenlive?

edit 2: https://github.com/facebookresearch/localrf

got it but when will it be implemented by kdenlive? any dev here?


> got it but when will it be implemented by kdenlive? any dev here?

I hate to be that person but https://kdenlive.org/en/developers-welcome/

This is research work. I don't suspect anyone currently has any plans to implement something like this into kdenlive.


It looks like the paper and code for both the original NeRF and this new Localrf method are free and open source, so that bodes well for integration into Kdenlive. In any case, I'm available for contracts ;)



