Surely some of the work could be reused? For most pixels beyond a certain depth, the incident eye vector direction is identical for all practical purposes, so one could fudge it: use the same calculated pixel color for both eyes, offset slightly, without calculating it twice. No one would notice if the reflections or specular lobe for the right eye were calculated with the incident camera vector of the left eye.
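To put rough numbers on "beyond a certain depth", here is a minimal sketch assuming two parallel pinhole cameras. The 63 mm eye separation and the 1000 px focal length are illustrative guesses, not figures from any particular headset, and the function names are made up for this example.

```python
import math


def view_angle_deg(depth_m, ipd_m=0.063):
    """Angle between the two eyes' view vectors to the same point.

    Assumes the point sits straight ahead of one eye; ipd_m is a guess
    at a typical eye separation.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return math.degrees(math.atan2(ipd_m, depth_m))


def disparity_px(depth_m, ipd_m=0.063, focal_px=1000.0):
    """Horizontal pixel offset of a point between the two eye images.

    Standard parallel-camera relation: disparity = focal * baseline / depth.
    focal_px is a hypothetical focal length expressed in pixels.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * ipd_m / depth_m


for depth in (0.5, 2.0, 10.0, 100.0):
    print(f"{depth:6.1f} m: {view_angle_deg(depth):6.3f} deg, "
          f"{disparity_px(depth):7.2f} px")
```

Under these assumptions the view vectors differ by well under half a degree at 10 m, which is the sort of error a specular highlight could plausibly absorb.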
Once you have calculated the pixels for the left eye, it should be possible to reuse them for the right eye with some mapping. Pixels that are visible only to the right eye would still have to be computed. I'm not sure if it's possible, or if it even has a chance of being a performance gain (or indeed if this is how it already works). But doing the full job of 2x4k pixels for two eyes seems wasteful when a) the two views are almost identical for objects beyond a certain distance and b) quality is almost irrelevant for most pixels, where the user isn't looking.
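The mapping I have in mind could look something like this for a single scanline. It is only a sketch: it assumes parallel cameras, a per-pixel depth value for the left eye, and whole-pixel shifts, and `reproject_scanline` is a made-up name rather than anything from a real engine.

```python
def reproject_scanline(colors, depths, ipd_m=0.063, focal_px=1000.0):
    """Warp one left-eye scanline into the right eye's view.

    Returns (row, holes): row holds the reused colors (None where nothing
    landed), holes lists the x positions that still need real shading.
    ipd_m and focal_px are illustrative assumptions.
    """
    width = len(colors)
    row = [None] * width
    row_depth = [float("inf")] * width
    for x in range(width):
        # Nearer points shift further; disparity = focal * baseline / depth.
        shift = round(focal_px * ipd_m / depths[x])
        tx = x - shift  # points move left in the right eye's image
        # Depth test so a near surface wins over a far one at the same pixel.
        if 0 <= tx < width and depths[x] < row_depth[tx]:
            row[tx] = colors[x]
            row_depth[tx] = depths[x]
    holes = [x for x in range(width) if row[x] is None]
    return row, holes


# A near object ("d", "e") in front of a far background.
row, holes = reproject_scanline(
    list("abcdefgh"),
    [1000, 1000, 1000, 5, 5, 1000, 1000, 1000],
    ipd_m=0.1,
    focal_px=100.0,
)
print(row, holes)
```

The holes are the pixels only the right eye can see (the background revealed beside the near object), and those are the ones that would still need to be shaded from scratch.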
With foveated rendering and some shortcuts, it should be possible to go faster with a 2x4k VR setup than with a regular 4k screen, where you need to render every pixel perfectly because you don't know what's important or where the user is looking. Obviously one needs working eye tracking etc. first too...
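As a back-of-the-envelope check, here is a small sketch that counts shaded samples. The region sizes and shading rates are numbers I made up purely for illustration; real headsets and eye trackers will differ, and `shaded_samples` is a hypothetical helper.

```python
def shaded_samples(width, height, eyes, regions):
    """Count shaded samples per frame.

    regions is a list of (area_fraction, shading_rate) pairs, where
    shading_rate 1.0 means every pixel is shaded and 0.25 means one
    sample per four pixels. The area fractions must sum to 1.
    """
    total_area = sum(area for area, _ in regions)
    if abs(total_area - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    per_eye = width * height * sum(area * rate for area, rate in regions)
    return eyes * per_eye


# Regular 4k monitor: every pixel at full rate.
flat_4k = shaded_samples(3840, 2160, eyes=1, regions=[(1.0, 1.0)])

# Hypothetical foveated split: 5% of the image at full rate, 20% at
# quarter rate, the remaining 75% at one-sixteenth rate.
foveated = shaded_samples(
    3840, 2160, eyes=2,
    regions=[(0.05, 1.0), (0.20, 0.25), (0.75, 1.0 / 16.0)],
)

print(f"flat 4k:       {flat_4k:,.0f} samples")
print(f"foveated 2x4k: {foveated:,.0f} samples")
```

With these made-up numbers, both eyes together need fewer shaded samples than the single flat screen, but the result depends entirely on how small the full-rate region can be kept.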
I agree with you. There's no reason to run every pixel shader twice in full.
It seems logical that each surface/polygon could be rendered once, for the eye that can see the most of it (a left-facing surface for the left eye, a right-facing surface for the right eye), then squashed to fit the correct view for the other eye. Then, fill in all the blanks. Of course, a real algorithm would be more complicated than this, but it seems like at least some rendering could be saved this way.
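Picking the eye could be as simple as the sketch below. It assumes the eyes sit on the x axis looking down -z, and it uses the cosine between the surface normal and the direction to each eye as a stand-in for "can see the most of it". The `facing` and `pick_eye` names are invented for this example.

```python
import math


def facing(normal, point, eye):
    """Cosine of the angle between the surface normal and the direction
    from the surface point to the eye (1.0 means seen head-on)."""
    to_eye = [e - p for e, p in zip(eye, point)]
    to_eye_len = math.sqrt(sum(c * c for c in to_eye))
    normal_len = math.sqrt(sum(c * c for c in normal))
    dot = sum(n * c for n, c in zip(normal, to_eye))
    return dot / (to_eye_len * normal_len)


def pick_eye(normal, point, ipd_m=0.063):
    """Return which eye should shade the surface; ties go to the left eye.

    ipd_m is an assumed eye separation.
    """
    left_eye = (-ipd_m / 2.0, 0.0, 0.0)
    right_eye = (ipd_m / 2.0, 0.0, 0.0)
    if facing(normal, point, left_eye) >= facing(normal, point, right_eye):
        return "left"
    return "right"


# A surface 2 m ahead, tilted toward the left and then toward the right.
print(pick_eye((-0.5, 0.0, 1.0), (0.0, 0.0, -2.0)))
print(pick_eye((0.5, 0.0, 1.0), (0.0, 0.0, -2.0)))
```

The cosine ignores distance and occlusion, so it is only a rough proxy for visible area.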
Technically the lighting won't be right, but you don't have to use this trick for every polygon, and real-time 3D rendering is already all about making it 'good enough' to trick the human visual system, not about being mathematically accurate. If technical accuracy were what we insisted on, games would be 100x100 pixels at 15FPS, since we'd insist on using photon mapping.