It's exactly Harvard. Instructions can only be loaded from the I cache, and data operands from the D cache. If you JIT something you have to flush the relevant D cache entries and invalidate the relevant I cache and then it will get reloaded.
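For the JIT case, the write / flush / execute sequence looks roughly like the sketch below. It assumes a POSIX system, GCC or Clang (for `__builtin___clear_cache`), and an x86-64 or AArch64 CPU. `jit_return_42` is a made-up name, and the byte sequences are just hand-assembled "return 42" stubs.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hand-assembled "return 42" for the two CPUs this sketch assumes. */
#if defined(__x86_64__)
static const uint8_t code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 }; /* mov eax, 42; ret */
#define HAVE_CODE 1
#elif defined(__aarch64__)
static const uint8_t code[] = { 0x40, 0x05, 0x80, 0x52,   /* mov w0, #42 */
                                0xC0, 0x03, 0x5F, 0xD6 }; /* ret */
#define HAVE_CODE 1
#else
static const uint8_t code[] = { 0 };
#define HAVE_CODE 0
#endif

/* Hypothetical helper: emit the stub, publish it to the instruction side,
 * call it. Returns 42 on success, -1 on failure or an unsupported CPU. */
int jit_return_42(void)
{
    if (!HAVE_CODE)
        return -1;

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t len = (size_t)ps;

    /* 1. Get a writable (not executable) page and store the code.
     *    These stores go through the data side. */
    uint8_t *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memcpy(buf, code, sizeof code);

    /* 2. Flip the page to read + execute. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* 3. Make the new bytes visible to instruction fetch. On AArch64 this
     *    emits the D-cache clean / I-cache invalidate sequence; on x86 it
     *    typically compiles to nothing because the hardware keeps the
     *    I-cache coherent with stores. */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof code);

    /* 4. Call the freshly written code. */
    int (*fn)(void) = (int (*)(void))(uintptr_t)buf;
    int result = fn();

    munmap(buf, len);
    return result;
}
```

Hardened systems may refuse the `mprotect` to executable (or need extra flags when mapping), in which case the helper just returns -1.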
Yes. Linear address spaces are an abstraction to hide this, because everything is in pages (4 KiB minimum on most machines, up to much larger huge pages), and it is the pages that are controlled in terms of W^X.
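To make the page granularity concrete, here is a minimal sketch assuming POSIX `mprotect`. `page_floor` and `set_wx` are hypothetical helper names, not anything from a real library. The point is that a permission change requested for a few bytes ends up applying to every page those bytes touch.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of the page containing it.
 * Assumes the page size is a power of two. */
static uintptr_t page_floor(uintptr_t addr, uintptr_t page)
{
    return addr & ~(page - 1);
}

/* Hypothetical helper: make the page(s) covering [p, p + len) either
 * writable or executable, never both. Returns 0 on success, -1 on error. */
int set_wx(void *p, size_t len, int executable)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || len == 0)
        return -1;

    uintptr_t page  = (uintptr_t)ps;
    uintptr_t start = page_floor((uintptr_t)p, page); /* mprotect wants a page-aligned start */
    uintptr_t end   = (uintptr_t)p + len;

    int prot = PROT_READ | (executable ? PROT_EXEC : PROT_WRITE);

    /* The kernel rounds the length up to whole pages, so everything that
     * shares a page with the requested range changes with it. */
    return mprotect((void *)start, end - start, prot);
}
```

Calling `set_wx(ptr, 8, 1)` on 8 bytes in the middle of a page makes the whole page read + execute, which is why JITs keep code and mutable data on separate pages.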
In the era of ROP and gadgets (control flow determined by data, used to implement strange virtual machines and interpreters) it seems somewhat quaint, but it has made exploits a lot more complicated. The mixing of JMP/RET addresses and stack data is why stack overflows and ROP are so easy; CFG, CET and shadow stacks are all trying to achieve separate I and D stacks.
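A toy model of the shadow-stack idea, as a sketch only: real shadow stacks (CET, for example) are enforced by hardware on actual call and return instructions, whereas here the "frame" is a plain struct and every name (`shadow_call`, `shadow_ret_ok`, `simulate_return`) is made up for illustration. The return address is stored twice, once next to the data buffer and once on a separate stack that the data copy cannot reach, and the two are compared on return.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_DEPTH 64

/* The separate "I" stack: return addresses only, no data. */
static uintptr_t shadow[SHADOW_DEPTH];
static size_t shadow_top;

/* On "call": record the return address. Returns 0, or -1 if the stack is full. */
int shadow_call(uintptr_t ret)
{
    if (shadow_top == SHADOW_DEPTH)
        return -1;
    shadow[shadow_top++] = ret;
    return 0;
}

/* On "return": pop and compare with the address found on the data stack.
 * Returns 1 if they match, 0 on mismatch or an empty shadow stack. */
int shadow_ret_ok(uintptr_t ret_from_data_stack)
{
    if (shadow_top == 0)
        return 0;
    return shadow[--shadow_top] == ret_from_data_stack;
}

/* Simulated classic frame: a data buffer sitting right below the return address. */
struct frame {
    char buf[8];
    uintptr_t ret;
};

/* Copy n bytes of input into the frame the way an unchecked copy would,
 * then "return". Yields 1 if the return address survived, 0 if it was clobbered. */
int simulate_return(const void *input, size_t n)
{
    struct frame f = { {0}, 0x1000 };   /* 0x1000 stands in for a real return address */

    if (n > sizeof f)
        n = sizeof f;                   /* keep the toy itself memory-safe */
    if (shadow_call(f.ret) != 0)
        return 0;
    memcpy(&f, input, n);               /* anything past 8 bytes spills toward f.ret */
    return shadow_ret_ok(f.ret);
}
```

An 8-byte input fits in `buf` and the check passes; an input the size of the whole frame overwrites `ret`, and the comparison against the shadow copy catches it.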
Arduinos and the like can jump to RAM and execute code from it, at least the ARM-based ones (on the classic AVR boards the program counter only addresses flash). They simply also have a read-only portion of memory where the code is stored. You can also treat the ROM as memory and use it to store tables, saving you from having to use RAM for them.
https://en.wikipedia.org/wiki/Harvard_architecture
Perhaps most familiar to the typical HN reader via Arduinos, and contrasted with von Neumann:
https://en.wikipedia.org/wiki/Von_Neumann_architecture
And anyone who has tried to make any kind of interesting general-purpose system on a Harvard design will tell you that it's not really practical.