Hacker News

The major thing that MPI did right, and that almost all other models have done wrong, is library support. Things like attribute caching on communicators are essential to me as a parallel library developer, but look superfluous in the simple examples and for most applications.
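Attribute caching is what lets a library stash its own per-communicator state on a user's communicator and find it again on later calls, without the user threading that state through every call. A minimal sketch of the pattern, using the standard keyval API; the `lib_state_t` struct and `lib_` names are illustrative, not from any real library:

```c
/* Sketch: a library caches per-communicator state via MPI attribute
 * caching, so repeated calls on the same communicator reuse (e.g.) a
 * duplicated communicator. lib_state_t and lib_* are made-up names. */
#include <mpi.h>
#include <stdlib.h>

typedef struct {
  MPI_Comm private_comm;  /* library-private dup, isolates our traffic */
  int      setup_done;    /* stands in for cached topology, plans, etc. */
} lib_state_t;

static int lib_keyval = MPI_KEYVAL_INVALID;

/* Destructor: MPI calls this when the communicator is freed. */
static int lib_state_free(MPI_Comm comm, int keyval, void *attr, void *extra)
{
  lib_state_t *st = attr;
  MPI_Comm_free(&st->private_comm);
  free(st);
  return MPI_SUCCESS;
}

static lib_state_t *lib_state_get(MPI_Comm comm)
{
  int flag;
  lib_state_t *st;
  if (lib_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, lib_state_free,
                           &lib_keyval, NULL);
  MPI_Comm_get_attr(comm, lib_keyval, &st, &flag);
  if (!flag) {                       /* first library call on this comm */
    st = malloc(sizeof(*st));
    MPI_Comm_dup(comm, &st->private_comm);
    st->setup_done = 1;
    MPI_Comm_set_attr(comm, lib_keyval, st);
  }
  return st;
}
```

The payoff is exactly what simple examples hide: two independent libraries can each cache their own state on the same user communicator, with cleanup tied to the communicator's lifetime rather than to any explicit teardown call.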

The other thing that is increasingly important in the multicore CPU space is memory locality. It's vastly more common to be limited by memory bandwidth and latency than by the execution unit. When we start analyzing approaches with a parallel complexity model based on memory movement instead of flops, the separate address space in the MPI model doesn't look so bad. The main thing that it doesn't support is cooperative cache sharing (e.g. weakly synchronized using buddy prefetch), which is becoming especially important as we get multiple threads per core.

As for fault tolerance, the MPI Forum was not happy with any of the deeper proposals for MPI-3. They recognize that it's an important issue, and many people think it will be a large enough change that the next standard will be MPI-4. From my perspective, the main thing I want is a checkpointing system by which I can perform a partial restart and reattach communicators; everything else can be handled by other libraries. My colleagues in the MPI-FT working group expect something like this to be supported in the next round, likely with preliminary implementations in the next couple of years. For now, there are MPIX_Comm_group_failed(), MPIX_Comm_remote_group_failed(), and MPIX_Comm_reenable_anysource().
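For illustration, here is a sketch of how those MPIX_ calls might be used to survive a rank failure. The signatures follow the experimental MPICH prototype of that era; they are extensions, not standardized MPI, so treat the exact interfaces as assumptions:

```c
/* Sketch using the experimental MPIX_ fault-tolerance extensions named
 * above. Signatures assume the MPICH prototype; not standard MPI. */
#include <mpi.h>
#include <stdio.h>

static void handle_failures(MPI_Comm comm)
{
  MPI_Group failed, acked;
  int nfailed;

  /* Which ranks of this communicator are known to have failed? */
  MPIX_Comm_group_failed(comm, &failed);
  MPI_Group_size(failed, &nfailed);

  if (nfailed > 0) {
    printf("%d rank(s) failed\n", nfailed);
    /* Wildcard receives are disabled after a failure until the
     * application acknowledges it; re-enable MPI_ANY_SOURCE to
     * continue with the survivors. */
    MPIX_Comm_reenable_anysource(comm, &acked);
    MPI_Group_free(&acked);
  }
  if (failed != MPI_GROUP_EMPTY)
    MPI_Group_free(&failed);
}
```

Note this is detection and acknowledgment only; actually recovering (shrinking or rebuilding the communicator, restoring state) is exactly the part the comment argues should come from a checkpoint/restart facility plus other libraries.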



