I'm not sure what the performance/persistence implications of this (FRAM) actually are...
But to your point, simply copying the processor state to a known location in FRAM (0xFFFFFFF0) and having the start routine read state from that location seem like a very low overhead solution to the problem.
How long would it really take to do something your computer does as part of preemptive multi-tasking? Nanoseconds? Milliseconds? We are talking about $order(hundred) of instructions
This may work on something with the complexity of a microcontroller or SoC, but becomes tricky beyond that. Any peripherals, especially removable ones would still need to be rediscovered and reinitialized on boot. Network connections may have died/reset while the system was off.
Essentially, this scheme has all the major complications of resuming from sleep/hibernation in practice.