That's one approach - just add a camera, mix the camera image with the graphics, and feed it to a VR headset. Works, but it's more intrusive for the user than the AR enthusiasts want.
The main issue with this approach is that the video pipeline adds dozens of milliseconds of latency, which makes it awkward to interact with the physical environment. You couldn't play AR pong, for example.
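
To put rough numbers on the latency point, here's a back-of-the-envelope sketch of how far a real object moves while its camera frame is still working through the pipeline. The 2 m/s ball speed and the specific latency values are my own assumptions for illustration, not measurements of any particular headset, and `apparent_lag_mm` is just a hypothetical helper name.

```python
def apparent_lag_mm(speed_m_per_s: float, latency_ms: float) -> float:
    """Distance a real object moves while its video frame is still in the pipeline.

    (m/s) * (ms) works out to millimetres, so no unit conversion is needed.
    """
    return speed_m_per_s * latency_ms


if __name__ == "__main__":
    ball_speed = 2.0  # m/s, assumed speed for a casual rally (hypothetical)
    for latency_ms in (20, 50, 100):  # assumed camera-to-display latencies
        lag = apparent_lag_mm(ball_speed, latency_ms)
        print(f"{latency_ms:>3} ms latency -> ball is {lag:.0f} mm ahead of its image")
```

Under those assumptions, at 50 ms the real ball is already 100 mm past where the headset shows it, which is more than the width of a typical paddle, so you'd be swinging at a ghost.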