Yep, this is achieved using slices, which can be arbitrary regions of the frame. Each slice can have its own quantization parameters (ranging from highly lossy to perceptually lossless). Each slice can also switch between intraframe prediction (more like still image encoding) and interframe prediction (relative to prior frames).
So, with this, you can have high-quality static text in one region of the frame while there is lossy motion encoding (e.g. for an animating UI element) in another region of the frame.
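To make the idea concrete, here is a toy sketch of the kind of per-slice decision an encoder might make. This is not any real codec's API; the slice layout (fixed horizontal bands), the QP values, and the motion threshold are all made up for illustration, and real encoders use far more sophisticated mode decisions.

```python
# Illustrative sketch only: pick per-slice quantization and prediction
# mode based on how much of a region changed since the previous frame.
# The QP numbers and the 5% motion threshold are invented for the demo.

def plan_slices(prev_frame, cur_frame, slice_height=2, motion_threshold=0.05):
    """Split a frame (a list of rows of pixel values) into horizontal
    slices and choose encoding parameters for each slice."""
    plans = []
    for top in range(0, len(cur_frame), slice_height):
        prev_rows = prev_frame[top:top + slice_height]
        cur_rows = cur_frame[top:top + slice_height]
        # Fraction of pixels that changed relative to the prior frame.
        total = sum(len(row) for row in cur_rows)
        changed = sum(
            1
            for pr, cr in zip(prev_rows, cur_rows)
            for a, b in zip(pr, cr)
            if a != b
        )
        if changed / total <= motion_threshold:
            # Static region (e.g. text): fine quantization, coded
            # like a still image for maximum sharpness.
            plans.append({"top": top, "qp": 10, "prediction": "intra"})
        else:
            # Moving region (e.g. an animating UI element): coarser
            # quantization, predicted from the prior frame.
            plans.append({"top": top, "qp": 32, "prediction": "inter"})
    return plans
```

Feeding it two tiny 4x4 "frames" where only the bottom half changed yields one high-quality slice for the static top half and one lossy, motion-predicted slice for the bottom half.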