Here is an example of Simpson's Paradox. Suppose you are A/B testing two landing pages and want to know which one turns more people into habitual users. You partition users on whether they add at least one friend in their first 5 minutes on the platform. It might be that landing page A makes people who add a friend in the first 5 minutes more likely to become habitual users, and also makes people who don't add a friend in that window more likely to become habitual users. But page A also makes people less likely to add a friend in the first 5 minutes, and people who do add one are overwhelmingly more likely to become habitual users than people who don't. The result is that page A wins within each group but loses overall. So, in this case at least, the aggregate statistics seem most relevant. Still, the fact that page A is bad mainly because it makes people less likely to add a friend in the first 5 minutes is very interesting: maybe there is some way of combining A and B to get the good qualities of each and avoid the bad qualities of both.
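
To make the reversal concrete, here is a minimal Python sketch. All of the counts are invented for illustration and do not come from any real experiment, and the helper names (`stratum_rate`, `friend_rate`, `aggregate_rate`) are hypothetical. The last calculation estimates what a combined page might achieve, under the strong assumption that page A's within-group conversion rates and page B's friend-adding rate could be had together without affecting each other.

```python
from fractions import Fraction

# Hypothetical counts, invented purely for illustration:
# (page, added_friend_in_first_5_min) -> (users, habitual_users)
counts = {
    ("A", True): (100, 80),
    ("A", False): (900, 90),
    ("B", True): (500, 350),
    ("B", False): (500, 25),
}


def stratum_rate(page, added_friend):
    """Share of users in one (page, friend) group who become habitual."""
    users, habitual = counts[(page, added_friend)]
    return Fraction(habitual, users)


def friend_rate(page):
    """Share of a page's users who add a friend in the first 5 minutes."""
    with_friend = counts[(page, True)][0]
    without_friend = counts[(page, False)][0]
    return Fraction(with_friend, with_friend + without_friend)


def aggregate_rate(page):
    """Share of all of a page's users who become habitual."""
    users = sum(u for (p, _), (u, _) in counts.items() if p == page)
    habitual = sum(h for (p, _), (_, h) in counts.items() if p == page)
    return Fraction(habitual, users)


for page in ("A", "B"):
    print(
        f"page {page}: "
        f"habitual | friend = {float(stratum_rate(page, True)):.3f}, "
        f"habitual | no friend = {float(stratum_rate(page, False)):.3f}, "
        f"adds friend = {float(friend_rate(page)):.3f}, "
        f"habitual overall = {float(aggregate_rate(page)):.3f}"
    )

# Assumption: a combined page keeps A's within-group rates and B's
# friend-adding rate. Real pages may not decompose this cleanly.
combined = (
    friend_rate("B") * stratum_rate("A", True)
    + (1 - friend_rate("B")) * stratum_rate("A", False)
)
print(f"hypothetical combined page: habitual overall = {float(combined):.3f}")
```

With these made-up numbers, page A converts better in both groups (80% vs. 70% among friend-adders, 10% vs. 5% among the rest) yet converts only 17% of users overall against 37.5% for page B, because only 10% of A's users add a friend compared with 50% of B's. The hypothetical combined page would reach 45%, which is the upside the last sentence of the paragraph points at.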