Handling two different writes to memory is not really a concern - existing speculative/out of order processors already solve this issue by completing (perform architectural side effects) instructions in-order. So even if two writes are made, one in each branch, by the time the write is meant to be completed, the prior branch is resolved and we know which write is actually meant to be made and the bad one can be discarded.
Doubling the execution units also isn't strictly needed - you can use the existing out-of-order core to send two sets of instructions through the same functional units. There will be more contention for the resources, possibly causing stalls, but you don't need to fully double everything.
Things similar to this idea are already done in processors - simultaneous multithreading, early branch resolution, conditional instructions, are all ideas that have similar implementation difficulties. So the reason this specific idea is not done is more in line with your last two points rather than the first two.
Doubling the execution units also isn't strictly needed - you can use the existing out-of-order core to send two sets of instructions through the same functional units. There will be more contention for the resources, possibly causing stalls, but you don't need to fully double everything.
Things similar to this idea are already done in processors - simultaneous multithreading, early branch resolution, conditional instructions, are all ideas that have similar implementation difficulties. So the reason this specific idea is not done is more in line with your last two points rather than the first two.