Some instruction sets are variable length; there are also things like Thumb and the 'compressed' RISC-V extension.
If you compress each instruction one at a time into a variable number of bits, the ability to jump to any instruction is preserved, but compression suffers in two ways: you can't exploit cross-instruction conditional probabilities, and each instruction must be packed into a whole number of bits.
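The whole-number-of-bits loss can be made concrete by comparing the entropy of a toy opcode distribution against the best per-instruction prefix code (the opcodes and frequencies here are made up for illustration):

```python
import math

# How much does one-instruction-at-a-time coding cost? Compare the stream's
# entropy (the fractional-bit ideal) against the best whole-bit prefix code.
# Hypothetical opcode frequencies for a toy ISA:
freqs = {"add": 0.6, "mov": 0.2, "ld": 0.1, "st": 0.05, "jmp": 0.05}

# Ideal: ~1.67 bits/instruction if fractional bits were allowed.
entropy = -sum(p * math.log2(p) for p in freqs.values())

# A per-instruction prefix (Huffman) code must round each symbol up to a
# whole number of bits: here lengths 1,2,3,4,4 give 1.7 bits/instruction.
code_len = {"add": 1, "mov": 2, "ld": 3, "st": 4, "jmp": 4}
expected = sum(p * code_len[op] for op, p in freqs.items())
```

The gap here is small because the distribution is nearly dyadic; more skewed distributions can lose up to nearly a bit per instruction, and this comparison still ignores the cross-instruction correlations (e.g. a compare often followed by a branch) that stream-wide arithmetic coding could exploit.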
One could imagine a compression format with 'jump points': compiler-selected locations where it is possible to jump to and begin decoding, so that you'd only take the above losses at potential jump targets.
You could go further and imagine the instruction set having some kind of constrained short jump: one that goes forward a limited distance (by simply letting the decoder decode its way there), or backward a limited distance without using a jump point, so long as there is no way for control flow to change out from under it.
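A minimal sketch of the jump-point idea: instructions get variable-length prefix codes, and at compiler-designated jump targets the stream is padded to a byte boundary so a decoder can start there cold. The opcodes, code table, and padding scheme are all invented for illustration.

```python
# Prefix-free toy code table (hypothetical opcodes and code assignments).
CODES = {"add": "0", "mov": "10", "ld": "110", "st": "1110", "jmp": "1111"}
DECODE = {v: k for k, v in CODES.items()}

def encode(instrs, jump_points):
    """instrs: opcode names; jump_points: indices that must be decodable
    without any preceding bits. Returns the bitstring and a map from
    jump-point index to its (byte-aligned) bit offset."""
    bits, offsets = "", {}
    for i, op in enumerate(instrs):
        if i in jump_points:
            bits += "0" * (-len(bits) % 8)   # alignment cost of a jump target
            offsets[i] = len(bits)
        bits += CODES[op]
    return bits, offsets

def decode_region(bits, start, count):
    """Decode `count` instructions starting at bit offset `start`."""
    out, pos = [], start
    while len(out) < count:
        code = ""
        while code not in DECODE:
            code += bits[pos]
            pos += 1
        out.append(DECODE[code])
    return out

instrs = ["add", "mov", "add", "ld", "jmp", "add", "st"]
bits, offsets = encode(instrs, jump_points={4})
# A decoder can enter cold at the jump point and stream forward from there:
resumed = decode_region(bits, offsets[4], 3)
```

In this toy run only instruction 4 is a jump target, so a single alignment penalty is paid: the seven instructions fit in 17 bits, versus 56 for a one-byte-per-instruction baseline.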
I wonder what percentage of instructions would need to be jump points in such a system?
But I think this is mostly of academic interest: except in very small embedded devices, code size is not a huge performance driver, and like anything else you get diminishing returns from further optimization. Variable-length, multiple-of-bytes instruction sets like x86 probably capture a significant fraction of the potential compression gains, and do so without a lot of complexity.