I don't think there is much rationale behind these models. It's more that P(would buy a vacuum | bought a vacuum) > P(would buy X | bought a vacuum), where X is any single other product. Granted, P(would buy a vacuum | bought a vacuum) < sum(P(would buy X | bought a vacuum)) over all X that are not vacuums, but what would the recommendation be? "Hey, you bought a vacuum, come back and buy some non-vacuum stuff"?
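To make the inequality concrete, here is a minimal sketch in Python. The product names and probabilities are entirely made up for illustration; nothing here reflects real purchase data or any actual recommender.

```python
# Hypothetical P(would buy item | bought a vacuum). All numbers are invented.
p_next_purchase = {
    "vacuum": 0.04,
    "vacuum bags": 0.03,
    "mop": 0.02,
    "air filter": 0.02,
    "phone case": 0.01,
}


def best_single_item(probs):
    """Return the one item with the highest conditional purchase probability."""
    return max(probs, key=probs.get)


def other_items_mass(probs, excluded_item):
    """Sum the probabilities of every item except the excluded one."""
    return sum(p for item, p in probs.items() if item != excluded_item)


top_item = best_single_item(p_next_purchase)
others_total = other_items_mass(p_next_purchase, "vacuum")

print(top_item)                                    # the single best slot filler
print(others_total > p_next_purchase["vacuum"])    # combined mass of everything else is larger
```

With these toy numbers the vacuum wins any single slot even though the rest of the catalog, taken together, is the more likely next purchase. A widget can only show specific items, not "everything else".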
For most recommendation UIs, you need a hero item that makes people want to click. It might turn out that another vacuum is the best item for getting some people to click, and they then go on to buy other stuff once they are on the site.
The reason you see such an obvious false positive in this case isn't because people who bought vacuums are likely to buy another, but rather that people who look at vacuums are likely to buy a vacuum, and the model hasn't accounted for whether you've already bought one.
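A rough sketch of what "accounting for whether you've already bought one" could look like, in Python. The scores, item names, categories, and the `recommend` helper are all hypothetical, and the blanket filter assumes durable goods; consumables that people do rebuy would need different handling.

```python
def recommend(scores, item_category, purchased_categories, k=3):
    """Rank items by score, skipping categories the user has already bought."""
    candidates = [
        (item, score)
        for item, score in scores.items()
        if item_category[item] not in purchased_categories
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:k]]


# Hypothetical scores from a model conditioned only on "looked at vacuums".
scores = {"vacuum A": 0.30, "vacuum B": 0.25, "vacuum bags": 0.10, "mop": 0.05}
item_category = {
    "vacuum A": "vacuum",
    "vacuum B": "vacuum",
    "vacuum bags": "accessory",
    "mop": "cleaning",
}

before_purchase = recommend(scores, item_category, purchased_categories=set())
after_purchase = recommend(scores, item_category, purchased_categories={"vacuum"})

print(before_purchase)  # vacuums dominate while the user is still shopping
print(after_purchase)   # vacuums drop out once one has been bought
```

The view-based scores stay the same in both calls; only the purchase history changes what gets shown.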
A different recommender might use different types of conditionals (items bought instead of items looked at, for example), and also have success in different areas (like recommending iPhone cases to iPhone owners). To converge the models in a Bayesian framework you'd have to deal with the combinatorial explosion of products and event conditionals, which might be pretty gnarly. But some convergence work would be better than none; otherwise you end up with 20 different recommender widgets on a page.
Overall I don't think Amazon's approach to date has been bad... it's just time to clean up a bit.