In short, and without going into too much detail: the load balancer failed to cope when one server in the cluster stopped responding. That set off a cascade of errors which crashed all of the other servers one after another (including, interestingly, the RDS database server, a failure that even Amazon was unable to explain).