Unfortunately, this would only apply to one-on-one, low-latency video chats. Streaming to an audience generally puts a distribution network between the user and the video source to help handle load and geographic distribution, and the CDN itself has no influence on video encoding. The CDN would need to step in and handle this back-and-forth negotiation and the delivery of lower-quality frames, which it is not currently suited for. I'd love to see it come about, but codecs aren't the only thing we need to look at for adoption beyond point-to-point video calls.
The other major limitation is that forking the encoder state significantly inflates the number of reference buffers you need to keep, which greatly increases memory requirements. That's not much of an issue for software encoders, but it can be a significant problem for hardware ones. That said, a lot of real-time interactive encoding is still done purely in software.
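To put rough numbers on the reference-buffer point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: 8-bit 4:2:0 frames, no alignment padding, 8 reference frames per encoder state, and the worst case where each fork keeps its own full set of references instead of sharing older ones. The function names are made up, not taken from any real encoder.

```python
def frame_bytes(width: int, height: int, bit_depth: int = 8) -> int:
    """Bytes for one 4:2:0 frame: a full-size luma plane plus two
    half-width, half-height chroma planes. Ignores alignment padding,
    which real encoders usually add."""
    bytes_per_sample = (bit_depth + 7) // 8
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * bytes_per_sample


def reference_memory(width: int, height: int, refs_per_state: int,
                     forks: int, bit_depth: int = 8) -> int:
    """Worst-case reference memory, assuming (hypothetically) that
    every forked encoder state keeps its own full set of references."""
    return frame_bytes(width, height, bit_depth) * refs_per_state * forks


if __name__ == "__main__":
    single = reference_memory(1920, 1080, refs_per_state=8, forks=1)
    forked = reference_memory(1920, 1080, refs_per_state=8, forks=4)
    print(f"1 state : {single / 1e6:.1f} MB")
    print(f"4 forks : {forked / 1e6:.1f} MB")
```

Under those assumptions, a single 1080p state needs about 24.9 MB of reference frames and four forks need about 99.5 MB. That is trivial in system RAM, but it is a lot to ask of the fixed on-chip or dedicated memory a hardware encoder is typically designed around.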