3 or 4 people had bits of demo code up on Twitter earlier today.
I implemented it myself simply based on the clues in the press release from AMD explaining why they weren't vulnerable. I don't even have a computer security background.
So the vulnerability likely isn't something nobody thought of, it's just that nobody seriously expected the CPU vendors to make the mistake of speculating across multiple loads and actually leaving observable modifications in the caches.
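To make the "observable modifications in the caches" part concrete, here is a toy Python model of the two dependent loads. It is only a sketch: `ToyCache`, the latency constants, the 4096-byte stride and the function names are all made up for illustration, and nothing here touches real hardware or real timing.

```python
# Toy model only: a simulated cache, not real hardware timing.
STRIDE = 4096            # assumed: one probe slot per page
CACHE_HIT_CYCLES = 40    # assumed latencies, for illustration only
CACHE_MISS_CYCLES = 300


class ToyCache:
    """Tracks which addresses are cached; nothing else is modelled."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        """Evict everything, so every later load starts as a miss."""
        self.lines.clear()

    def load(self, address):
        """Return a simulated latency and leave the line cached."""
        latency = CACHE_HIT_CYCLES if address in self.lines else CACHE_MISS_CYCLES
        self.lines.add(address)
        return latency


def speculative_double_load(cache, secret_byte):
    """Model the second, dependent load: the first load produced
    secret_byte, and the second touches probe slot secret_byte * STRIDE.
    The architectural result is thrown away, but in this model the
    cache state is not rolled back."""
    cache.load(secret_byte * STRIDE)


def recover_byte(cache):
    """Time every probe slot and return the index of the fastest one."""
    timings = [cache.load(value * STRIDE) for value in range(256)]
    return min(range(256), key=timings.__getitem__)


def leak(secret_byte):
    """Flush, run the modelled speculative loads, then probe."""
    cache = ToyCache()
    cache.flush()
    speculative_double_load(cache, secret_byte)
    return recover_byte(cache)
```

In this model `leak(42)` returns 42, because slot 42 is the only one that comes back as a hit. Real code has to deal with noise, prefetchers and actual timers, none of which are modelled here.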
Note that even speculating across multiple loads could lead to observable side-effects by measuring memory bandwidth to differentiate between loads of accessible and silent page fault addresses. [1]
An interesting question is whether the CPU would also speculate on loads from mapped PCI device regions, as that could also be detectable in many different ways.
> Both hardware thread systems (SMT and TMT) expose contention within the execution core. In SMT, the threads effectively compete in real time for access to functional units, the L1 cache, and speculation resources (such as the BTB). This is similar to the real-time sharing that occurs between separate cores, but includes all levels of the architecture. [...] SMT has been exploited in known attacks (Sections 4.2.1 and 4.3.1)
100% agree with the post. AWS, and Amazon in general, prefer to release bare, unpolished code that just works. Integrating that into your environment is awful.