
Seems like preventing data persistence (replace, delete) was chosen over minimizing bandwidth (no optimization).

But you could easily do both if you wanted to, though I’m not sure it’s worth the hassle. I agree that this might struggle if used at scale on the same IP.




Not only that. JPEG works best on natural-looking images, with gradients, curves, continuous and wide color variation, etc. Computer screens very often show entirely different kinds of images, dominated by a few flat colors, small details (like text), and sharp edges. That is, exactly the kind of "high-frequency noise" JPEG is built to throw away.

JPEG thus either produces "smeared" screenshots or barely compressed ones. PNG often works better.
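
You can see this for yourself with a quick test (a sketch assuming ImageMagick and an X11 session; the filenames are placeholders):

    # Grab a text-heavy screenshot and compare encodings.
    import -window root screenshot.png
    convert screenshot.png -quality 85 screenshot.jpg
    ls -l screenshot.png screenshot.jpg
    # For flat, text-heavy content the PNG is often both smaller and sharper.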

A proper video codec mostly sends the small changes between frames (including shifts, like scrolling), plus relatively rare key frames. It could give both better visual quality and better bandwidth usage.
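
As a rough illustration (hedged: the host, port, and GOP size are arbitrary), ffmpeg can stream an X11 display this way, sending mostly delta frames with only occasional keyframes:

    # Interframe compression: -g 300 means one keyframe per 300 frames;
    # everything in between is encoded as changes relative to prior frames.
    ffmpeg -f x11grab -framerate 30 -i :0.0 \
           -c:v libx264 -tune zerolatency -g 300 \
           -f mpegts udp://example-host:9000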

What's interesting in the "screenshot per second" solution is that it can be hacked together from common existing pieces, like ImageMagick, netcat, and bash; no need to install anything. (Imagine you've got privilege-limited access to a remote box, and maybe cannot even write to disk! Oh wait...)
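
Something like this, as a sketch (host and port are made up; netcat flags vary by flavor):

    # Receiver: collect one PNG per connection.
    # (GNU netcat; BSD netcat wants just `nc -l 9000`.)
    while true; do nc -l -p 9000 > "shot-$(date +%s).png"; done

    # Sender: one screenshot per second, piped straight from X11 to the
    # network; nothing ever touches the sender's disk.
    while true; do
      import -window root png:- | nc example-host 9000
      sleep 1
    done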


The problem with the JPEG vs. PNG debate for screenshots is that screenshots can contain anything from photos to text to UI elements to frames of video.

Just open any website and you'll see text right beside photos, or text against a photographic backdrop, often in the middle of being moved around with hardware-accelerated CSS animations.

I think we need an image container format that can use different compression algorithms for different regions or "layers" of the image, and an encoder that quickly detects how to slice up a screenshot into arbitrary layers. Both should be possible with modern tech. I just hope the resulting format isn't patent-encumbered.
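
The detection step can even be prototyped crudely with ImageMagick (a toy sketch, not a container format: tile the screenshot, then keep whichever of lossless PNG or lossy JPEG is smaller per tile):

    #!/usr/bin/env bash
    set -euo pipefail
    src="${1:-screenshot.png}"   # placeholder input
    mkdir -p tiles
    convert "$src" -crop 256x256 +repage tiles/tile_%04d.png
    for png in tiles/tile_*.png; do
      jpg="${png%.png}.jpg"
      convert "$png" -quality 85 "$jpg"
      if [ "$(wc -c <"$png")" -le "$(wc -c <"$jpg")" ]; then
        rm "$jpg"   # flat UI/text tiles usually win as PNG
      else
        rm "$png"   # photographic tiles usually win as JPEG
      fi
    done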


Completely agree. JPEG-only is insufficient. PNG-only is insufficient. An adaptive codec would apply the right algorithm to each area depending on its properties.

I suppose that the more modern video compression algorithms already apply such image analysis, to an extent. I don't know how e.g. VNC or RDP work, but it would be natural for them to have provisions like that to save bandwidth / latency, which is often in shorter supply than computing power.

Of existing still-image codecs, JPEG XL seems to have the right properties[1]: the ability to split an image into areas and / or layers, and the ability to encode different areas either with DCT or losslessly. But these are capabilities of the format; I don't know how well existing encoder implementations can use them.

[1]: https://en.wikipedia.org/wiki/JPEG_XL#Technical_details
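
For what it's worth, the reference encoder already exposes the lossy/lossless choice per image (a sketch using the real cjxl tool; whether encoders apply this per region automatically is exactly the open question):

    # JPEG XL reference encoder: distance 0 is mathematically lossless,
    # distance 1 is roughly "visually lossless" lossy (DCT-based) mode.
    cjxl screenshot.png shot-lossless.jxl -d 0
    cjxl photo.png photo-lossy.jxl -d 1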


> how RDP work

It uses a combination of different tech [0]. MS-RDPBCGR is at the base of it all, sort of like the main event loop [1]. MS-RDPEGDI looks at the actual drawing commands and optimizes them on the fly [2]. Then there's MS-RDPEDC for desktop-composition optimizations [3]. Plus a bunch of other bits and pieces, like MS-RDPRFX, which adds lossy compression [4].

In RDP you don't play with just the bitmap or image-stream data, but with the actual interactions happening on the screen. You could say, for example, that the user right-clicked a desktop item. Now send and render only the pop-up menu for this, and track and draw the mouse actions inside that "region" only.

[0] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [1] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [2] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [3] https://learn.microsoft.com/en-us/openspecs/windows_protocol... [4] https://learn.microsoft.com/en-us/openspecs/windows_protocol...


The state of the art here is really Parsec, Moonlight, and Apple's "High Performance Screen Sharing" [0]. All three use hardware-accelerated HEVC in some UDP encapsulation. Under the right network conditions, they achieve very crisp text: 4K60 4:4:4 with low latency.

[0]: https://support.apple.com/guide/mac-help/screen-sharing-type...
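
The ingredients are all off the shelf; here's a hedged software stand-in for that pipeline (the real products use hardware encoders like NVENC rather than libx265, and their own transports rather than raw MPEG-TS):

    # HEVC, 4:4:4 chroma (no subsampled text fringing), low-latency tune,
    # UDP out. Host and port are placeholders.
    ffmpeg -f x11grab -framerate 60 -i :0.0 \
           -c:v libx265 -pix_fmt yuv444p -tune zerolatency \
           -f mpegts udp://example-host:9000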


Are you suggesting that HEVC can adapt its compression for different regions of the same frame, similar to JPEG XL? I don't think this is possible, but I would love to be proven wrong.


Yep, this is achieved using slices, which can be arbitrary regions of the frame. Each slice can have its own quantization parameters (ranging from highly lossy to perceptually lossless). Each slice can also switch between intraframe prediction (more like still image encoding) and interframe prediction (relative to prior frames).

So, with this, you can have high-quality static text in one region of the frame while there is lossy motion encoding (e.g. for an animating UI element) in another region of the frame.
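
CLI encoders don't expose per-slice QP directly (rate control picks it), but the related knobs are visible in x265, as a sketch:

    # slices=4 splits each frame into 4 independently coded slices;
    # aq-mode=2 lets the quantizer vary across regions of a frame;
    # keyint=300 keeps everything between keyframes interframe-predicted.
    ffmpeg -i capture.mkv -c:v libx265 \
           -x265-params "slices=4:aq-mode=2:keyint=300" out.mkv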


OK, that's cool, I didn't know that. Does AVC/H.264 have this?


You are reinventing PDF.



