You are right on the loss: it's purposefully introduced as a quantization step after performing the DCT, and before losslessly compressing the resulting coefficients with Huffman and encoding to the final JPEG bitstream.
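Roughly speaking, that lossy step is just a divide-and-round of each 8x8 block of DCT coefficients by the chosen quantization table; a minimal C sketch of the idea (illustrative only, not SnappyCam's actual code):

```c
#include <math.h>
#include <stdint.h>

/* Quantize one 8x8 block of DCT coefficients.
 * coeffs: raw DCT output; qtable: the chosen quantization matrix.
 * This divide-and-round is the only lossy step; everything after it
 * (Huffman/entropy coding) is lossless. */
static void quantize_block(const float coeffs[64], const uint16_t qtable[64],
                           int16_t out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)lrintf(coeffs[i] / (float)qtable[i]);
}
```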
Despite all of that, JPEG has now become computationally tractable. I remember the days when it took tens of seconds to encode a JPEG on a commodity machine. Now, with the help of SIMD, we can encode a high-quality image in milliseconds on a mobile device.
Fortunately you can choose the quantization matrix that determines the amount of loss. Even with a unity (all-ones) matrix, where the only loss is rounding, no human, not even Superman with his laser eyes, can "detect" the quantization noise.
For SnappyCam, I chose to invest in JPEG a little more because it's a ubiquitous standard for still image compression.... and with the right hardware and algorithms, quite tractable.
I'll consider adding a JPEG "quality setting" so you can choose the amount of loss introduced... sounds like a great idea to me.
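(For reference, the conventional libjpeg/IJG way to expose a quality setting is to scale the base quantization tables by a quality factor; here's a sketch of that convention, assuming the standard IJG formula rather than anything SnappyCam-specific:)

```c
#include <stdint.h>

/* IJG-style quality scaling: quality 50 uses the base table as-is,
 * higher quality shrinks the divisors (less loss), lower quality
 * grows them (more loss). Entries are clamped to [1, 255]. */
static void scale_qtable(const uint16_t base[64], int quality, uint16_t out[64])
{
    if (quality < 1)   quality = 1;
    if (quality > 100) quality = 100;
    int scale = (quality < 50) ? 5000 / quality : 200 - quality * 2;

    for (int i = 0; i < 64; i++) {
        int q = (base[i] * scale + 50) / 100;
        if (q < 1)   q = 1;
        if (q > 255) q = 255;
        out[i] = (uint16_t)q;
    }
}
```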
The idea behind SnappyCam was also to code each picture independently, and not rely on motion prediction or video codecs. If you try and pull a single frame from a HD video you might be disappointed: they compress the YUV dynamic range (studio swing) and it looks washed out, even if you land on an i-frame.
Lastly, as far as I can tell, the image sensor is yielding complete scans with each frame. I'd hazard a guess to say that any motion prediction or frame deltas might actually slow the whole chain down.
Man, I hope I'll know as much as you do one day... Congrats on the app! I just bought it and I'm really liking it! One suggestion: would outputting to a GIF file be too difficult? I could really see people using this to create GIFs of their lives that they could share to Tumblr or Facebook.
Thanks! :-) To be honest, none of my formal training prepared me for what went into SnappyCam, only for how to go about learning it. I'm sure you could pick up a few tricks in the same way by working on a cool pet project or two.
With so many recent requests for AGIF, I'm absolutely going to add it to the app. (It's been on my list for a while, but lower priority than getting up a solid core product and the social sharing that is in development at the moment.)
I'm Aussie, so I did my schooling in Melbourne, Australia.
I don't recommend the same path for everyone, but my ugrad was at RMIT University: dual bachelors in EE and CS. I then went on to do a PhD at the University of Melbourne in EE. My dissertation was on mathematical optimization of wireless and wireline DSL. (Prof Jamie Evans is an awesome guy if you're on the lookout for an advisor!)
I've been in SFO for just over 5.5 years, and started SnappyLabs after winning the "greencard lottery".
SnappyCam, you might say, is the embodiment of both a very practical ugrad and a somewhat applied but very theoretical pgrad.
I run a two-person web development business and have worked out of Adelaide for myself for 15 years. Always seems to be enough work around and a decent amount of variety.
The other SnappyCam purchaser I mentioned is an iOS/Android developer who is a sub-tenant in my office. We actually have a cheap, spare desk in this room at the moment if you wanted to visit and work for a while. Email me if you want to ask any questions.
Adelaide is good. Plenty of gov & defence work if that is what you are after, but I've worked here nearly 15 years and never done either (ok, my current place is quasi-government, but still..)
Motion estimation/prediction is used in video coding because it minimises the compressed size. However, it is incredibly expensive to perform the motion search; a typical video encoder spends well over half its CPU time in this stage. After motion estimation, the residual image still has to be encoded in the usual way, so speed-wise the motion search is pure added cost.
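For a sense of where that time goes: a brute-force search evaluates something like the sum of absolute differences below for every 16x16 block, against a whole window of candidate offsets in the reference frame (a toy C sketch, not any particular encoder's code):

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between a 16x16 block in the current
 * frame and a candidate block in the reference frame. A full search
 * evaluates this for every offset in a +/- search window, which is why
 * motion estimation dominates video-encoder CPU time. */
static unsigned sad16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}
```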
One thing I've gotten very into recently is multishot techniques, where I take multiple shots in burst mode, align them, and then average them to reduce sensor noise. It's similar to what http://www.photoacute.com/ does, though they do more advanced superresolution stuff that your invisible noise might preclude. Even a simple average, or a median in areas where there isn't too much motion, really improves the quality quite dramatically in some cases, particularly in low-light situations. If your frame rate is that high it probably wouldn't be hard to get a really good alignment between frames, so I thought I'd bring this up as a thought for a future feature: you select the frame you like, then the program grabs a few frames immediately before and after and uses them to increase the image quality of the final output.
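The averaging half of that is the easy part once frames are registered; roughly something like this toy C sketch (alignment and motion masking, the hard parts, are omitted):

```c
#include <stddef.h>
#include <stdint.h>

/* Average N already-aligned 8-bit frames into one output frame to
 * reduce sensor noise (roughly sqrt(N) improvement in static areas).
 * Real multishot pipelines also align frames and mask moving regions. */
static void average_frames(const uint8_t *const frames[], size_t n_frames,
                           size_t n_pixels, uint8_t *out)
{
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t sum = 0;
        for (size_t f = 0; f < n_frames; f++)
            sum += frames[f][i];
        out[i] = (uint8_t)(sum / n_frames);
    }
}
```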
I first tried using the GPU, using old school GPGPU textures and OpenGL ES 2.0 shaders, but unfortunately the performance wasn't there for a variety of reasons given in [1].
SnappyCam has since been making extensive use of ARM NEON for the JPEG codec and a bunch of other image signal processing operations, like digital zoom. It's a great instruction set!
Just curious; I know nothing about low-level ARM stuff. I was wondering: is this iPhone/Apple-specific tech, or is the work you've done portable to other mobile platforms? Congrats on what you've done. I couldn't quite work out whether you've optimized the hell out of the standard DCT algorithms or whether you've come up with new algorithms. If it's the latter, would you be able to publish them, or would that give away too much secret sauce? ;-)
No, it's not Apple-specific: NEON is an ARM technology, and is widely available and used, e.g. on Android smartphones.
It's like MMX/SSE in the x86 world: a set of extra instructions that process many small integers in parallel in one instruction. Since image data are usually independent 3-byte pixels (or 3 planes of 1-byte subpixels, one per color channel), NEON is great for many image-processing tasks.
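As a tiny illustration (not SnappyCam's actual code), here's what averaging two rows of 8-bit pixels looks like with NEON intrinsics, 16 pixels per instruction:

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Average two rows of 8-bit pixels, 16 pixels per instruction, using
 * NEON's rounding halving add. The scalar equivalent touches one byte
 * at a time. Assumes n is a multiple of 16. */
static void average_rows_neon(const uint8_t *a, const uint8_t *b,
                              uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vrhaddq_u8(va, vb));
    }
}
```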
Re video's "studio swing" dynamic range: the YUV components do have a different encoding range to those in JPEG (nominally 16-235 for luma), but if you expand them back out to 0-255 the image is in fact the same; you lose a little fraction of your bit depth but no dynamic range.
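The expansion is just a linear remap of that nominal 16-235 luma range (16-240 for chroma) back out to 0-255, roughly like this C sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Expand "studio swing" (video range) luma, nominally 16..235, back to
 * full range 0..255. Chroma uses 16..240. You lose a little bit-depth
 * granularity but recover the full dynamic range. */
static void expand_luma_range(const uint8_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int v = ((in[i] - 16) * 255 + 109) / 219;  /* rounded scale by 255/219 */
        if (v < 0)   v = 0;
        if (v > 255) v = 255;
        out[i] = (uint8_t)v;
    }
}
```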
I think you definitely made the right choice though - it's interesting that the obvious delta-coding and motion compensation tricks to reduce bandwidth are rarely used for video acquisition apart from the most limited devices like phones, stills cameras and the GoPro. Everything that can afford to uses per-frame coding like ProRes, REDCODE, AVC-Intra, DNxHD, Cineform... being able to seek quickly is important!
In fact Canon's 1DC 4k camera uses (dun dun duhhhhh...) motion JPEG :)
It's just bizarre that you would be doing the complete JPEG process at the instant you get the image from the sensor. As you note, there are a plethora of steps that JPEG performs: color space conversion, the DCT transform (essentially a gigantic matrix multiplication), quantization, entropy coding (Huffman or arithmetic), and encoding as the JPEG bitstream.
The only reason would be that you are pressed for memory or bandwidth, but certainly you have the resources to store one full frame and produce deltas, or just apply part of the JPEG chain, enough to remedy memory pressure. You can always encode it to an actual JPEG after the process.
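(The delta idea here would amount to something like the toy C sketch below: subtract the stored frame per pixel and encode the residual. Whether that extra pass actually pays for itself on this hardware is exactly what's being debated.)

```c
#include <stddef.h>
#include <stdint.h>

/* Naive frame delta: subtract the previous frame from the current one.
 * The residual is cheaper to entropy-code when frames are similar, but
 * computing and later undoing it adds a full extra pass over the data. */
static void frame_delta(const uint8_t *cur, const uint8_t *prev,
                        int16_t *residual, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++)
        residual[i] = (int16_t)cur[i] - (int16_t)prev[i];
}
```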
And yes, pulling single frames from a completely encoded video isn't helpful, because they can get away with more compression. But there are very sophisticated algorithms for eliminating the redundancy between frames, which would have been my first avenue in attempting to do something like this.
> "which would have been my first avenue in attempting to do something like this."
Have you attempted to do something like this? Because he not only has attempted, he's done it. Therefore I think you should stop talking down to him ("completely pointless", "it's just bizarre", "my first avenue"). It comes across as wanting to prove how smart you are instead of seeking to learn from someone who has done incredible work and—lucky for us—is bursting at the seams with enthusiasm to share it.
Oh, and congratulations jpap on what's looking like the most successful and most technically solid HN launch in quite some time! I hope your hard work pays off.
Why is this comment getting downvoted? It adds insight to the subject and makes some good points, which the OP even acknowledges. The written complaint is that revelation is not being deferential enough, which is bullshit in an in-depth technical discussion.
There's a difference between "not being deferential" (which I don't think the complaint is) and talking down to someone who's clearly built something very cool. Put another way, there's a difference between dismissing someone's approach and asking questions to try to understand it better.
"Deferential" has nothing to do with it. In a technical discussion, the focus should be purely on the content. Inserting one's self into it (such as by being supercilious) detracts from that.