Much more limited capability wise, but much tighter guarantees about latency. More like a 80Mhz AVR 8bit than an ARM coprocessor.
The real shame of modern microcontrollers is the decoupling of peripherals and GPIO from the main processor. It prevents these sorts of hacks as all access has to be through the memory bus, effectively capping bitbanging at sub 10MHz speeds.
By that I mean how IO is coupled to the processor. In AVR8 (and other older processors I presume) GPIO access was via a CPU register, thus a single instruction (one clock RISC) can directly modify the GPIO state.
Every GPIO implementation I’ve seen on modern processors accesses it via a memory mapped peripheral. The difference being accessing a memory bus is not a single cycle operation. You have to wait for the bus to be free, then wait to fetch or write the data.
The most extreme analogous example of this is modern cpus are effectively infinitely fast but bounded by cache misses that necessitate memory access.
This is fundamentally why every toggle a GPIO pin benchmark is flawed. What is really being measured is memory bus latency.
This misunderstanding is why people have trouble reconciling why a multi ghz processor cannot also bitbang GPIO at ghz speeds, although if such a processor existed it would be amazing.
You could compare this to a GPU (special purpose coprocessor for vector calculation) or a very complicated IO peripheral you are configuring.
An FPGA allows you to build up actual hardware gates and wire them together. For pio you configure existing hardware.
There are some similarities in the domain you're using them, though. FPGA are often used for timing critical IO as well.