Could this be used for completely automated extraction of objects based on the focus range? I think, with a clever algorithm that analyzes the sharpness of all these layers, it might be possible.
I don't know if this technique also extends to moving images, but if so, maybe it could also be used to composite those automatically, without the need for a green screen at all. Basically, you would be separating the image into layers based on distance instead of chroma.
So for sports photography it seems very useful.
Replacing the green screen doesn't seem to make sense.
For macroscopic objects depth reconstruction with two cameras like in the Kinect seems the better alternative.
Object extraction seems like a nice idea. I can't remember having read anything about this. In principle it should be possible to recover an unsectioned stack of images.
One can then use an iterative algorithm to subtract the in-focus information of one slice of the stack from each of the other slices and end up with a deconvolved stack of sectioned images.
Then one could delete one object in the stack and recalculate a superposition of blurred sectioned images to recover a reconstruction of the scene without that object.
This is quite complicated, though. Just imagine removing a wine glass from a scene: one would need to delete all the rays that went through the wine glass and bend them as though the wine glass weren't there.
One can argue that polarization and absorption effects will be very hard or even impossible to handle correctly.
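To make the iterative subtraction step above more concrete, here is a minimal sketch in Python (assuming a crude Gaussian defocus model and a focal stack already computed from the light field; the blur parameter and iteration count are illustrative, not taken from any paper):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def section_focal_stack(stack, blur_per_plane=2.0, iterations=10):
        """Iteratively estimate sectioned (in-focus-only) planes from an
        unsectioned focal stack by subtracting the blurred contribution
        of every other plane from each slice."""
        n = stack.shape[0]
        est = stack.astype(float)
        for _ in range(iterations):
            new_est = np.empty_like(est)
            for i in range(n):
                # Out-of-focus haze in slice i: every other plane, blurred
                # by a kernel that grows with its distance from plane i.
                haze = sum(
                    gaussian_filter(est[j], sigma=blur_per_plane * abs(i - j))
                    for j in range(n) if j != i
                )
                new_est[i] = np.clip(stack[i] - haze, 0, None)
            est = new_est
        return est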
Certainly light fields contain A LOT of potential.
The initial applications of this are interesting, but it's what comes AFTER that will be really cool.
This is the capture part of the capture and display of true 3D images.
What do I mean by 'true'? Imagine a screen that works like a window.
If you think about a window or a mirror as a display screen, you can imagine that every point on the screen is a tiny hemispherical lens; light exits the screen in all directions through these lenses. By producing light in every direction (as opposed to just perpendicular to the screen plus diffusion), you could let your eye decide what to focus on. Additionally, such a system would be view-angle agnostic, so you could look at it from the side and see a wider 'view' into the scene (again noting this works for n viewers).
Such a display would be complex to implement, but even if you had one you'd need image capture such as Lytro is providing to make it work.
Hacked a Python script together for parsing this and generated an HTML file with the data from it.
Still not sure when to change the frame; the whole thing is off by some pixels.
Should I post it? I'm not sure about the legal side. I'm from Germany, so reverse engineering is allowed in some cases, but I'm not sure whether that covers this.
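For reference, the HTML-generation half can be as simple as the sketch below (assuming the per-focus JPEG frames have already been extracted to files like frame_00.jpg; the filenames and the slider viewer are my own illustration, not Lytro's actual format):

    import glob
    import json

    # Hypothetical filenames: one JPEG per focus setting, already extracted.
    frames = sorted(glob.glob("frame_*.jpg"))

    page = f"""<!doctype html>
    <html><body>
    <script>var frames = {json.dumps(frames)};</script>
    <img id="view" src="{frames[0]}">
    <br>
    <input type="range" min="0" max="{len(frames) - 1}" value="0"
           oninput="document.getElementById('view').src = frames[this.value]">
    </body></html>"""

    with open("refocus.html", "w") as f:
        f.write(page)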
Surely those are just standard depth-maps as used every day in VFX for compositing? Each pixel has a z-depth as to how far it is from the camera, and using standard compositing software (Nuke) you can blur just a slice (depth-wise) of the image based on the z ranges?
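For what it's worth, that depth-map version of the effect is easy to sketch (a rough illustration assuming a normalised per-pixel z-depth; the linear blend and single blur kernel are arbitrary simplifications, not what Nuke actually does):

    import numpy as np
    import cv2

    def fake_refocus(image, depth, focus_z, dof=0.1, max_blur=15):
        """Depth-map based 'software focus': blur each pixel more the
        further its z-depth lies from the chosen focal depth.
        image: BGR uint8 array; depth: float array in [0, 1];
        focus_z: depth to keep sharp; dof: half-width of the in-focus band."""
        k = max_blur | 1                       # kernel size must be odd
        blurred = cv2.GaussianBlur(image, (k, k), 0)
        # 0 inside the in-focus band, ramping up to 1 well outside it
        weight = np.clip((np.abs(depth - focus_z) - dof) / dof, 0.0, 1.0)
        weight = weight[..., None]             # broadcast over colour channels
        return (image * (1.0 - weight) + blurred * weight).astype(image.dtype)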
Anyone know anything else about the company? Founders, investors, etc? The only thing I could dig up is that Manu Kumar has Lytro's Twitter account on his "portfolio" list [1] and that the domain was, interestingly, created in 2003. Formerly known as "Refocus Imaging".
Ren Ng, a former student of Marc Levoy's (at Stanford), started the company. Ng was widely believed to be one of the rising superstars in computer graphics/computational imaging when he decided to leave academia and start Refocus Imaging.
The company seems to be doing well, and recently changed names to Lytro in order to not be pigeonholed into refocusing applications only.
A bit off-topic, but do you think that's the right approach? A niche field like that can be really good for a company, and if you want to broach another field, don't you think it's a good idea to start a separate brand and image for that field instead of generalizing away from your niche?
McDonald's bought Chipotle but didn't rename it "McDonald's", nor did they rename McDonald's "Chipotle". McDonald's is for hamburgers and Chipotle is for burritos, even if the money ends up in the hands of the same people.
It is almost never beneficial to merge existing brands (unless one of the brands has a horrible reputation), so why should someone generalize away from a successful niche application and lose the associated branding instead of just starting a new brand for the new field?
Couldn't you do something like this with Kinect? Depth information is all one needs to get a plausible "software focus" effect. Not sure how effective it would be outdoors though.
You could hack up something very roughly like it with Kinect but it wouldn't work as well, it'd be a neat hack but not actually usable for anything serious.
Kinect's image capture is very low resolution (even by today's cellphone-camera standards), and it doesn't give you depth information at per-pixel resolution. Even ignoring those issues, in addition to depth information you also need a source image that is critically sharp across the entire viewing range (you can't selectively focus in software what was captured out of focus with a standard digital sensor), which means using a very small aperture (large f-stop value). So it'll be very difficult to capture anything but still-life images, because the small aperture means a long exposure time, and thus motion blur if anything moves. Granted, this is already less of an issue with Kinect because its sensor is so tiny that out-of-focus areas aren't much of a concern, but the cost of that is that the image resolution is also atrocious.
Once you get up to usable sensor resolutions, if you're already limited to taking long exposures of still-life images on a tripod, you might as well skip the IR depth perception and just take a series of wider-aperture pictures at different focus planes, focus-stack the results, and preprocess the image series for blur levels to work out the relative depths of the in-focus bits of each source image. At least that way you can use a DSLR and get quality photos.
Neither of these is a true replacement for what they are doing here, though.
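A rough sketch of that focus-stack route (my own illustration; the Laplacian-energy focus measure and the frame-index-as-depth shortcut are simplifying assumptions):

    import numpy as np
    import cv2

    def depth_from_focus(stack):
        """Crude depth-from-focus: for each pixel, pick the frame of a
        focus stack where local sharpness peaks, and build an all-in-focus
        composite from those pixels. stack: list of BGR frames taken at
        increasing focus distance."""
        sharpness = []
        for frame in stack:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            lap = cv2.Laplacian(gray, cv2.CV_64F)
            # Local energy of the Laplacian as a per-pixel focus measure
            sharpness.append(cv2.GaussianBlur(lap * lap, (9, 9), 0))
        sharpness = np.stack(sharpness)              # (n_frames, h, w)
        depth_index = np.argmax(sharpness, axis=0)   # per-pixel frame index ~ depth
        frames = np.stack(stack)                     # (n_frames, h, w, 3)
        h, w = depth_index.shape
        rows = np.arange(h)[:, None]
        cols = np.arange(w)[None, :]
        all_in_focus = frames[depth_index, rows, cols]
        return depth_index, all_in_focus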
For the Kinect to work you need a surface where the infrared spot pattern is imaged. For most of the things we look at the Kinect will be sufficient.
However, one captures a lot more information with the light field camera. For example, transparent things like smoke, fog and glass, and things with weird optical properties like polished steel or tiger's eye (a mineral with chatoyancy), will be captured by this camera.
This gives the photographer tremendously more artistic space. Just imagine photographing a close up of an eye with the Kinect technology.
One can argue that the light field camera will maintain a quality in the image one could never achieve with Kinect based systems (without a lot of photoshopping).
Their method captures a light field instantaneously at the expense of spatial resolution. They place a microlens array where the film would be, followed by a sensor. Each microlens forms a disk on the sensor containing the angular light distribution. Developed in Marc Levoy's group: http://graphics.stanford.edu/papers/lfcamera/
This method works nicely for photography, where all the dimensions involved are much bigger than the wavelength. I work on something like this in fluorescence microscopes, and I can tell you, it is much harder when you have to consider wave optics.
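For anyone curious what "computing" a refocused photograph from such data looks like: once the sensor pixels are regrouped into sub-aperture views, refocusing is essentially a shift-and-add over those views. A minimal sketch (assuming the light field has already been resampled into a (U, V, H, W) array of grayscale views; the parameterisation is a simplification, not the exact one from the paper):

    import numpy as np
    from scipy.ndimage import shift

    def refocus(light_field, alpha):
        """Shift-and-add refocusing of a 4D light field.
        light_field: array of shape (U, V, H, W) of sub-aperture images,
        where (u, v) indexes the pixel position under each microlens.
        alpha controls the synthetic focal plane; alpha = 0 reproduces
        the photograph focused at the original plane."""
        U, V, H, W = light_field.shape
        out = np.zeros((H, W))
        for u in range(U):
            for v in range(V):
                # Shift each view in proportion to its offset from the
                # aperture centre, then average all views.
                du = alpha * (u - (U - 1) / 2)
                dv = alpha * (v - (V - 1) / 2)
                out += shift(light_field[u, v], (du, dv), order=1)
        return out / (U * V)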
The idea has been around since the turn of the century. Lippmann proposed it [1]. But the first guy to BUILD a light field camera was a Russian in 1911 [2]. He built a light field camera AND display using a copper plate drilled with 1200 pinholes, and reconstructed the light field of a lamp.
BTW, your comment illustrates that much of "computational photography" is just re-application of tricky imaging tech from other fields. Nothing wrong with that, but something to keep in mind.
Of course they're not really holograms - they deal at the "pencil of rays" level and not the wave-optics level - but even so... seeing the first 3D reconstruction of a light field (the apparent image of the lamp) must have been totally thrilling.
Film still has many advantages for plenoptic stuff. It's a large, single-use sensor.
My coworker went to Google I/O and got a Galaxy Tab 10.1 with an Android app that matches that description. I can't take another look or get its name over the weekend, though.
It also had a feature that automatically saved the frames of a video that had a face with a smile.
Tried it just now and it is merely a "technology demo" rather than a proper app. Having played with it for 20 minutes I wasn't able to make a single image that was worth saving, but still it was a buck well spent.
What they are doing is simply grabbing frames from DSLR video; a short 1-2 second video recording with manual focus shifting from one subject to the other, and just saving a number of frames ripped out of that short video clip.
On the other hand, it does look like they're interpolating between discrete frames. In their fourth example (http://lytro.com/gallery/content/lytro_50_00090.php), take a look at the piece of confetti to the top left of the middle head when the focus slider is about 1/3 of the way up from the bottom. It has a sharp edge, but there's also a halo around it from a frame where it's out of focus.
Kind of interesting. Somebody has to do it, I guess. But it's not as flashy as they think. Refocus the picture? Or just focus right the first time, I guess. Zoom? Enough megapixels and what else? Nothing, I suppose.
We've seen some really interesting stuff on HN about tracking thru crowds, reconstructing images from fragments etc. If these folks can do anything like that, they aren't showing it.
It's extremely interesting, for two groups of reasons: (1) creative possibilities (playing with depth of field is my favorite part of photography); (2) market applications.
Regarding (2), if I understand the paper properly, this should allow for a massive increase in lens quality while also allowing for lenses to be much smaller. Both are worth tons of money... together, it's massive. As a guy who carries around a $1900 lens that weighs 2.5 lbs, because of the creative options it gives me that no other lens can, this appeals to me greatly!
Big lenses are always better as they collect more light.
If one uses a 10-megapixel sensor in a light field camera, one will have to reduce the resolution of the output image by a factor of about 10 in each direction, depending on what microlenses one chooses. For example, with 10x10 sensor pixels behind each microlens, a 10-megapixel sensor yields only about a 0.1-megapixel output image.
Before I read the paper, I was adamant about "more light is better." But read the paper:
"We show that a linear increase in the resolution of images under each microlens results in a linear increase in the sharpness of the refocused photographs. This property allows us to extend the depth of field of the camera without reducing the aperture, enabling shorter exposures and lower image noise."
You're right that you still need good, small sensors to enable good, small lenses, but my ultimate point is that digital camera sensors scale with advances in silicon. Lens technology is much, much slower to advance. The more of this we can do in software (and thus, silicon) the better.
Related to this and presumably not common knowledge is that there was recently a breakthrough in camera CMOS sensor technology.
A few companies now offer scientific cameras (price tag around 10,000 USD, for example http://www.andor.com/neo_scmos) that allow readout at 560 MHz with 1 electron per pixel of readout noise, as opposed to 6 electrons per pixel in the best CCD chips at 10 MHz.
This means one can use the CMOS in low-light conditions and at an extremely fast frame rate (the above camera delivers 2560 x 2160 at 100 fps). You will actually see the Poisson noise of the photons.
Unfortunately representatives (the few I spoke with) of those companies don't seem too eager to bring these sensors to mobile phones.
Refocusing the picture after the fact isn't just about being able to focus "right" after the fact, so it's not fair to just compare it with "focusing right the first time". With normal cameras, when you focus, you lose information, and it is simply not possible to do what they do in their demos, namely capture a continuous range of focal planes. (With several cameras or more than one lens, you can capture multiple discrete focal planes, but not a continuous range like this.) This is only possible because they're capturing 3-dimensional information about where each object is.
Granted, their demo isn't impressive, but they're underutilizing their technology, and honestly I can't think of a better demo either, so don't be misled: this light field camera is capturing far more information. Meaningful information. I wonder if it's possible to, say, create 3D models of objects in these images? That would probably be more "computational camerawork". What's impressive is that it could be done after the fact.
How about macro photography? When you get really close to something, your depth of field shrinks. This led to the invention of focus stacking. If instead of having to take 4 pictures you now only need to take 1, you can capture incredible things.
Marc Levoy (and his group) do all of these things.
However, if one wants to track through crowds, one needs a much bigger aperture. So one would combine many small cameras into an array, as opposed to putting a microlens array in front of the sensor.