If we do eye tracking we can probably lower that to 1024x768-equivalent rendering, by using high resolution where the eye is looking and tapering off to a blurry mess further away. You can even completely leave out the pixels at the optic nerve's blind spot. The person wearing the headset won't be able to tell they aren't getting full 4k or even higher resolution. And we can run better effects, more anti-aliasing, maybe even raytracing in real time.
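To sketch the idea: a foveated renderer assigns each pixel a shading rate based on its angular distance from the gaze point, full rate in the fovea, tapering with eccentricity, and zero inside the blind spot. The function below is a 1D toy model; the fovea size, blind-spot position, and taper curve are illustrative assumptions, not tuned values.

```python
def shading_rate(pixel_deg, gaze_deg,
                 blind_spot_center=15.0, blind_spot_radius=2.5,
                 fovea_deg=5.0, floor=0.1):
    """Relative shading rate (1.0 = full resolution) for a pixel at
    `pixel_deg` degrees across the display, given the gaze at `gaze_deg`.
    All angles and thresholds here are illustrative assumptions."""
    offset = pixel_deg - gaze_deg
    # Skip pixels entirely inside the (approximate) optic-nerve blind spot,
    # located roughly 15 degrees temporal to fixation.
    if abs(offset - blind_spot_center) < blind_spot_radius:
        return 0.0
    ecc = abs(offset)
    # Full resolution inside the fovea, then taper roughly as 1/eccentricity,
    # clamped to a minimum rate so the periphery still gets some pixels.
    if ecc <= fovea_deg:
        return 1.0
    return max(floor, fovea_deg / ecc)
```

In a real renderer this would feed something like a variable-rate-shading map per tile rather than a per-pixel callback, but the falloff logic is the same.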
If this is the nvidia/smi research you are referring to, it seems nice, but without details (specifically on dynamic performance) there is reason to be sceptical of how good it is.
The field of view of current consumer HMDs is too narrow for there to be a big saving compared to the downside. As you move to larger-FOV displays the eye makes more saccades (rapid step changes in gaze direction[1]), and the combined response time of the eye tracker and image generator is too long to put the extra pixels at the right spot before the eye lands. It's much more effective to just render the whole thing at the maximum possible resolution. There has been promising research on rendering at a reduced update rate or with reduced geometry in low-interest areas of the scene[2].
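To put a rough number on that trade-off, here is a back-of-the-envelope sketch (with made-up acuity figures, not measured data) of the fraction of full-resolution pixel work left after foveation, as a function of FOV with the gaze at the centre. The point it illustrates is the one above: on a narrow-FOV display most of the screen is near the fovea anyway, so the saving is modest.

```python
def foveated_fraction(fov_deg, fovea_deg=5.0, floor=0.1, steps=1000):
    """Approximate fraction of full-resolution shading work remaining
    after foveation, for a display spanning `fov_deg` with gaze centred.
    Acuity model (an illustrative assumption): full rate inside the
    fovea, then falling as fovea/eccentricity, clamped at `floor`.
    Numerically integrates the rate over 1D eccentricity as a proxy."""
    half = fov_deg / 2.0
    total = 0.0
    for i in range(steps):
        ecc = (i + 0.5) * half / steps  # midpoint of each strip
        rate = 1.0 if ecc <= fovea_deg else max(floor, fovea_deg / ecc)
        total += rate
    return total / steps
```

Under these assumptions a 40-degree FOV still does roughly 60% of the full-resolution work, while a 90-degree FOV drops closer to a third, i.e. the savings only become compelling at wide FOVs, which is exactly where tracker-plus-renderer latency versus saccades becomes the problem.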