It's clocked at a higher frequency, and the video codecs seem to be implemented in software that runs on a part of the GPU that isn't directly user-accessible (early builds of the firmware supported fewer codecs). So the question is whether it's a hardware limitation, or just a case that hasn't been implemented in the firmware blob yet. Maybe it wasn't implemented before because the hardware wasn't fast enough to do it at the lower clock.
That may be. That portion of the chip is completely closed (no public specs or compiler), so I'm just speculating based on the inputs and outputs of a black-box system.
I do know that I've seen people trick the thing (the VC4 on a Pi 2) into pushing out 4K video at about 20 fps and decoding three h.264 videos at once. Playing a fourth video caused a lot of visual artifacts. So it has potential for more than it's usually used for. There could be a low-level assumption of 8 bits per sample, or some other constraint we don't know about. Or it could be as simple as Broadcom not being interested in marketing the chip toward higher-end uses like the 10-bit profiles. Without signing an NDA and ordering six or seven figures' worth of chips, I think we can only speculate.
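For what it's worth, that kind of test is easy to reproduce yourself: just spawn several hardware-decoded playbacks at once and count how many run clean before artifacts show up. A rough Python sketch along those lines (assuming omxplayer is installed and the test_*.mp4 file names exist; both are placeholders, adjust to whatever player and clips you have):

```python
# Probe how many simultaneous hardware-decoded h.264 streams the VC4 handles
# by launching N players at once. Purely a back-of-the-envelope test harness.
import subprocess
import sys

def spawn_decodes(n):
    """Start n concurrent playbacks; each instance uses the hardware decoder."""
    procs = []
    for i in range(n):
        procs.append(subprocess.Popen(
            ["omxplayer", f"test_{i}.mp4"],          # hypothetical clip names
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        ))
    return procs

if __name__ == "__main__":
    # Default to three streams, the point where the Pi 2 reportedly still coped.
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    for p in spawn_decodes(count):
        p.wait()
```

Bumping the count to four and watching for corruption is about as scientific as you can get without documentation for the decoder block.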