How Races and Religions Match in Online Dating (okcupid.com)
160 points by noodle on Sept 29, 2009 | hide | past | favorite | 94 comments



I liked this on a few different levels:

-Hacking data to reveal the answers to important difficult questions.

-Exploring contentious issues in a neutral & intelligent way.

-Great example of how to attract potential customers by providing valuable information that also gently demonstrates competence related to your service & differentiates your offering from competitors.

I enjoyed reading this, I learnt something, and I thought favourably of the website based on this. No idea how good they are, but that post was a good marketing lesson.


okcupid is pretty much considered the hip version of plenty of fish.


I don't know what that means, but I did think this was a brilliant marketing post from a company, and an interesting read.


it means that plentyoffish has nothing but fat chicks(everyone calls it Plenty of Fat), while okcupid has the hipster girls


I don't think it is really as neutral as it seems. Match percentage is a very preliminary survey of how compatible two people are, and the article itself is fairly judgmental.


One possible explanation for the low Hindu-Hindu match is that there are many websites dedicated to specifically Hindu dating which means that the Hindus using OkCupid are those who are specifically not looking for a Hindu date.

For other religions, such sites aren't nearly as prevalent.


Not really.

The matches are based on answers users give to questions (both site designed and user designed).

When the blogger says they don't match, it means:

  - Person A: I like X

  - Person B: I don't like people who like X

Where X could mean ice cream, god or tooth-brushing. It has nothing to do with people selecting the race or religion of the people they think they want to date.


Unless there are very few Hindus using the site, in which case there could be (in the extreme) a single bad Hindu-Hindu match that skews everything.

We need error bars.
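
A minimal sketch of why error bars matter here, using invented match percentages rather than anything from the article. The helper name `mean_with_error_bar` and the 1.96 multiplier (a normal approximation, which is rough for very small groups) are assumptions for illustration.

```python
import math
import statistics

def mean_with_error_bar(scores, z=1.96):
    """Return (mean, half_width) of an approximate 95% interval for the mean."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(scores) / math.sqrt(n)
    return mean, z * sem

# Hypothetical match percentages, invented for illustration only.
small_group = [62.0, 58.0, 12.0, 60.0]        # one bad match among four pairs
large_group = [62.0, 58.0, 12.0, 60.0] * 250  # same mix spread over 1000 pairs

for label, scores in (("small", small_group), ("large", large_group)):
    mean, half_width = mean_with_error_bar(scores)
    print(f"{label}: {mean:.1f} +/- {half_width:.1f} (n={len(scores)})")
```

Both groups have the same average, but with four pairs the interval is more than twenty points either side, while with a thousand pairs it shrinks to a point or two. Without the group sizes, the chart can't tell you which situation you're looking at.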


Yeah, but if Hindu men like ice cream, and I don't want Hindu men, I might not like men who like ice cream either because a) it's what Hindu men do, or b) I don't like people who like ice cream to begin with, and I don't like Hindu men for that reason.

So it's not totally unrelated.


But is there anything you can replace "ice cream" with to make it a workable example? What _real_ questions are you likely to see this kind of Hindu-Hindu antimatching behaviour on?


Arranged-marriage.


Sorry, are you serious? Why would a Hindu who is pro-arranged marriage be on a dating website? Arranged marriages were a severe form of keeping the caste system intact, but that's not applicable now. But then again your type of logic can probably explain why Islamic people aren't liked since everyone is anti-terrorism.


s/ice cream/samsara/g


Some Hindu men will say they like ice cream.

Some Hindu men will say they don't.


Well, I'm not sure why I was down-modded. However, I hope I didn't offend anyone. Race/religion can be touchy subjects.


If I had to guess, you got down-modded because you were ignoring the premise of the above statement that being a hindu man might be correlated to liking ice cream. You are free to disagree with that but you need to give a reason.

Of course, ice cream wasn't the point and other people gave better examples.


I was going to post a comment directly disagreeing with this, and citing the large number of Jewish dating sites, but these are really only used by people on the more religious end of the spectrum, so it is possible that the effect is more prevalent in the Hindu-Hindu match calculations than in Jew-Jew match calculations. I would posit, however, that this effect, along with the self-selection bias of using a multi-cultural dating website, is almost certainly the cause of the tendency for higher matches towards non-religious people.


Given that strongly religious people (of whatever religion) had a harder time matching, removing the most religious Jews would improve the compatibility of the remaining Jews.


JDate and similar sites aren't exactly religious. There are lots of secular Jews who would prefer to date or marry another Jew, for ethnic, cultural, or national reasons.


True, I think part of the confusion stems from the fact that you can have "secular Jews," since the Jewish way of life seems to be a bit more than just about religion (not saying the religion isn't an important part of it). If someone were to say "secular Christian" I'd look at them funny.


> ... which means that the Hindus using OkCupid are those who are specifically not looking for a Hindu date.

Or maybe they are specifically looking for a non-Hindu date. I mean that they might be looking for an uncompromising relationship, maybe easier with a person with different roots.


They left out the axis on the "message reply rate" diagram as a teaser for the next blog post.

I just copied and pasted the image onto the axis. It shows that white and native American males are more likely to get responses than the other races.

http://www.codexon.com/temp/mrr.png

And I wrote a small analysis here if you can be bothered to take a look.

http://www.codexon.com/posts/okcupid-dating-statistics-shows...


Assuming they didn't deliberately shuffle the ordering just to confound people like you.

Your results look depressingly plausible, though, so I'm guessing not.


As a single male Asian I am praying that the first row isn't me. But odds are it is anyway :( It certainly conforms with stereotypes about Asian girls...


You're lucky to be an Asian male... look at the black male line. It's the only one that doesn't have a single green square.

EDIT: Also, look at the response line from black females; they're all green except for the yellow square in response to black males.


[deleted]


Haha are you a THChubber? Yeah, there isn't... it sucks and I've been reduced to chatting on HN. What was your hub name?


Wow, I was a hubber too ('bird').


Jeez. It's a regular party on here. ashen says hi.


I addressed those concerns in a comment on my website.


Nice, thanks for this.

However, I think one critical component is missing (from what OKCupid posted, not from your analysis). Who sends the first message? For example, we are seeing a lot of green in the black female column. Does this mean that:

a) Black females reply back a lot?

b) Lots of men reply to black females?

These two results would mean entirely different things. The first means that black females aren't picky, the second that men like black females.
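
To make the ambiguity concrete, here is a small sketch that keeps the direction of each first message. The log format (sender group, recipient group, whether a reply came back) and the group labels are hypothetical; the article does not say how its chart was built.

```python
from collections import defaultdict

def reply_rates(first_messages):
    """Map (sender_group, recipient_group) -> fraction of first messages replied to."""
    sent = defaultdict(int)
    replied = defaultdict(int)
    for sender_group, recipient_group, got_reply in first_messages:
        key = (sender_group, recipient_group)
        sent[key] += 1
        if got_reply:
            replied[key] += 1
    return {key: replied[key] / sent[key] for key in sent}

# Hypothetical log of first messages: (sender group, recipient group, replied?)
log = [
    ("A", "B", True), ("A", "B", False), ("A", "B", False), ("A", "B", True),
    ("B", "A", True), ("B", "A", True), ("B", "A", True), ("B", "A", False),
]
rates = reply_rates(log)
print(rates[("A", "B")])  # how often group B replies when group A writes first
print(rates[("B", "A")])  # how often group A replies when group B writes first
```

The two numbers answer different questions, so a chart cell is only interpretable once you know which axis is the sender and which is the replier.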


As an aside, I can't help but wonder if OKCupid intentionally left the reply rate chart unlabeled and unexplained, knowing that some people would post an analysis on their own blogs and generate some viral goodness.

Brilliant strategy if so.


Am I the only one who had a good chuckle at the zodiac grid?


I was laughing before that with this:

> ". . . I point this out now so that, below, when we claim that Jewish women are easier to get along with than Christians, you don’t blame us, you blame Jesus."


Nope, hilarious.

And that's the first really solid evidence that I've seen shared that zodiac signs are utter BS, which makes me doubly grateful to them. Something to point to when someone says it can't be proved otherwise...


The best evidence against astrology I've heard came from my college astronomy professor. Basically, due to the precession of the earth on its axis, the constellations are visible at completely different times than they were when astrology was invented. So even if it wasn't BS back then, it sure is now.

http://www.livescience.com/strangenews/your-astronomical-sig...
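
A back-of-the-envelope check of the precession argument. The cycle length is the commonly quoted figure of roughly 25,772 years; the 2,000-year span is an assumption picked for illustration, since no date is given above.

```python
PRECESSION_PERIOD_YEARS = 25_772  # approximate length of one full precession cycle
SIGNS = 12
DEGREES_PER_SIGN = 360 / SIGNS    # each zodiac sign spans 30 degrees

def drift_degrees(years_elapsed):
    """Degrees the constellations have shifted against the calendar."""
    return 360.0 * years_elapsed / PRECESSION_PERIOD_YEARS

def drift_in_signs(years_elapsed):
    """The same shift expressed as a number of zodiac signs."""
    return drift_degrees(years_elapsed) / DEGREES_PER_SIGN

years = 2_000  # assumed elapsed time, chosen only for illustration
print(f"{drift_degrees(years):.1f} degrees, {drift_in_signs(years):.2f} signs")
```

Under those assumptions the sky has slipped by nearly a whole sign, which is the gist of the professor's point.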


Ooh, even more portable, no computer needed. Thanks!


Favorite line of the article: "If religion is a minefield, then race is a field that’s just one giant mine"

The article was not quite as statistically sound as it could be, but at least they aren't trying to pass it off as statistical data.


It seems like recently immigrated groups who are not established and similar disadvantaged groups of people do worse than the established.


I can only imagine what else OkCupid's data would reveal in adept hands. I could probably write a book if they sent me a spreadsheet.


I'm loving this blog. Have we seen match percent vs reply rate anywhere?


I think this is probably too biased, as messages from users with a close match percentage are highlighted in your "inbox". The data might not be good enough to discern between "replied to message because it was highlighted" and "replied to message because match % was high".

(Personally, I have a feeling that "pretty profile picture" is more influential than "high match %". But of course, it's very hard to measure what one considers physically attractive.)


[deleted]


Maybe they just tend to be more likely to tell people to f off. A reply isn't always a good reply. I would think you would need to look at conversation rates.


This is hilariously interesting. Data mining can always bring good insights if you have good data.

I wonder, though, (a) if most people search for similar people or for different people, (b) how well does the matching percentage really reflect a "liking probability" (ie, there is always some wishful thinking when designing date profiles, and some of that might be counter-productive, ruling out good matches).


Endogamous marriages in Ashkenazi Jews have resulted in serious genetic disorders and hereditary diseases.

http://en.wikipedia.org/wiki/Ashkenazi_Jews#Specific_disease...


Could you elaborate on how this relates to the article? I'm sure you have an interesting point to make, but I find it difficult to extrapolate.


The article fails to mention statistical significance of race versus religion?

Because this data makes sense, I hesitate to protest. Nevertheless, I'd be interested to see if race is statistically significant versus religion.


So many statistics.

So little statistical significance testing.

This tells me much about this service.
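
For what it's worth, here is a sketch of the kind of test that could be run on two reply rates. It is a standard pooled two-proportion z-test; the counts are made up, and nothing here is taken from OkCupid's data or methods.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two_sided_p) for H0: both groups share one underlying rate."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("rates are all-or-nothing; the test is undefined")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical reply counts: the same 42% vs 38% gap at two sample sizes.
for n in (100, 1000):
    z, p = two_proportion_z_test(int(0.42 * n), n, int(0.38 * n), n)
    print(f"n={n}: z={z:.2f}, p={p:.3f}")
```

The same four-point gap is indistinguishable from noise at 100 messages per group and only borderline at 1,000, which is why the sample sizes behind each chart cell matter.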


I'm not sure it's valid to infer that their entire service is poor from this series of rather niche blog posts.

(Although saying that, I wonder if this series of theirs is to attract press attention? If so, then it would be nice to see some significance data.)


Brilliant.

Free publicity. Great lesson in PR. Thank you.


While I love the number crunching ethos, and think these guys are giving it one hell of a go - I will go on the record and say that they will ultimately fail because they are operating under many false premises, a few being:

1. People do not change their minds.

By algorithmically matching based on answers to questions, you must assume that those answers are meaningful. And meaningful means CONSISTENT. And people are NOT consistent. Especially on complex issues; in fact there are whole professions (sales (insurance, real estate, car, etc) comes immediately to mind) where success is based on one's ability to get people to change their minds. (That example is offered as evidence, not proof - but proof probably isn't a complicated exercise)

2. Like Thinkers make Great Daters

Assuming you are even GETTING like thinkers, (which is tenuous at best) where is it written that like thinkers make great companions? You don't want to have absolutely nothing in common, (at least a common language is necessary) but it isn't linear and I would be surprised if there were any correlation let alone one that was non-asymptotic. Simply - more like thoughts does not entail more compatibility.

3. Looks aren't EVERYTHING

Just based on sheer observation - this is not the case. It is unfortunate, and it speaks badly about human nature, but most human beings make judgments based on appearances. And this "appearance crutch" is only exacerbated when it comes to looking for someone to lay with. How many single, male, bleeding heart liberals would take a swing at Sarah Palin if given an iota of an opportunity? Again - offered as evidence, not proof.


"where is it written that like thinkers make great companions?"

OKCupid does not judge match percentage based on similar answers. For every question on the system it will ask for your answer and your ideal mate's answer. The system knows that some things are important to some people - but maybe not in reverse.
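
A rough sketch of the asymmetric matching described here, not OkCupid's actual code. The `Answer` fields, the weight values, and the geometric-mean combination in `match_percent` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Answer:
    own: str               # what this user answered
    acceptable: frozenset  # answers this user would accept from a match
    weight: float          # how much this user cares (hypothetical scale)

def satisfaction(a, b):
    """Fraction of a's weighted preferences that b's own answers satisfy."""
    shared = a.keys() & b.keys()
    total = sum(a[q].weight for q in shared)
    if total == 0:
        return 0.0
    earned = sum(a[q].weight for q in shared if b[q].own in a[q].acceptable)
    return earned / total

def match_percent(a, b):
    """Combine both directions; the geometric mean is an assumed choice."""
    return 100.0 * math.sqrt(satisfaction(a, b) * satisfaction(b, a))

# Two hypothetical users: Alice likes ice cream, Bob dislikes people who do.
alice = {
    "likes_ice_cream": Answer("yes", frozenset({"yes", "no"}), 1.0),
    "is_tidy": Answer("yes", frozenset({"yes"}), 10.0),
}
bob = {
    "likes_ice_cream": Answer("no", frozenset({"no"}), 10.0),
    "is_tidy": Answer("yes", frozenset({"yes", "no"}), 1.0),
}
print(round(satisfaction(alice, bob), 2))   # Alice's view of Bob
print(round(satisfaction(bob, alice), 2))   # Bob's view of Alice
print(round(match_percent(alice, bob), 1))  # one combined score
```

Alice is fully satisfied with Bob while Bob is mostly unsatisfied with Alice, so the combined score is low even though neither of them said anything about the other's race or religion.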


4. People Know what they want

Clearly they don't - otherwise they wouldn't be on okcupid in the first place.


What do you mean by this?


"By algorithmically matching based on answers to questions, you must assume that those answers are meaningful. And meaningful means CONSISTENT. And people are NOT consistent."

In practice, the user base for OkCupid will change. And old users will reanswer questions sometimes too.

"Assuming you are even GETTING like thinkers, (which is tenuous at best) where is it written that like thinkers make great companions?"

That's not how it works. It's not as simple as "answer match questions and we hook you up with people who answered the same way". You can specify which answers your ideal match would give, even if those answers conflict with the answers you yourself give.

"most human beings make judgments based on appearances"

Which is why you upload photos to OkCupid instead of just using match percentages.


Uploading photos isn't what supposedly sets okcupid apart, right? They aren't using physical attractiveness in their matching algorithms and that is the point I was trying to make.

Your other point has been addressed above; people don't know what they want. Asking them what they want their "ideal match" to answer is as foolhardy as asking them to describe their "ideal mate" neuron by neuron.


Given a photo by itself, or a photo plus output from a matching algorithm, the photo plus output from a matching algorithm gives more information. My point is that judging physical attractiveness is (mostly!) a solved problem on dating sites, so your main competitive advantage is attacking the other problems.

People do not, in fact, know what they want. But they can make educated guesses.


Judging physical attractiveness on dating sites isn't even a partially solved problem - I invite you to name the sites that you have found that are close, as a counterexample.

How does one go about making an EDUCATED guess with NO knowledge?

To put it plainly - statistical information assimilated from meaningless data is also meaningless. I don't know what my perfect mate looks like - and no one does. If you accept that premise, you are then forced to accept that any data gathered from some person A about what their perfect mate P is like is MEANINGLESS. So statistical algorithms from any such information yield meaningless data as well; ergo okcupid cannot do what it hopes to do. QED


> I don't know what my perfect mate looks like - and no one does.

While I agree that one's guess at a perfect partner might not be 100% accurate, are you sure you have no information on this at all? Here are some OkCupid questions. Would you be equally willing to date someone regardless of how they answered these questions?

-Is homosexuality a sin?

-How important is religion/God in your life?

-Are you happy with your life?

-Is interracial marriage a bad idea?


Interestingly, the universal answer set of these questions, from all of the women I've seriously dated, would be full! That is, every possible answer would be in the set.

And I am certain I am not unique. Indeed, half of the enjoyment in getting to know someone is in the mutual effort expended in trying to understand how the other person was formed: thoughts, feelings, desires, etc - and this enjoyment is independent of the actual thoughts, feelings, etc themselves.

As relationships grow, both parties discover the thoughts, feelings, desires, etc. that were perhaps, formed irrationally - and then adjust accordingly. This is called personal discovery - and is evidence enough that most people are too inconsistent and too nearsighted for an approach such as the one okcupid employs.

That said - I wish them the best of luck.


That's why OkCupid gives you aggregate percentages instead of ruling people out or anything. An aggregate percentage says you mostly agree with this person on the things you care about most. It doesn't mean you have perfect harmony, nothing means that, but it gives you an easier starting point.


Really-really interesting.

I'm curious why Asian females scored worst of all when Asian males scored decently.


The differences on the race graph are so small as to be insignificant.


Keep in mind that the 'Match %' is based on OKCupid.com's algorithms that number crunch against quiz answers that the users answer. This has nothing to do with 'success rate' of users on the site hooking up with other users.


Right, so compare that with the message reply rate chart at the bottom of the article. Assuming the axes haven't changed from other charts on the page (which IMHO is a pretty good assumption)...

It suggests that certain races have good matches with other races, but horrible reply rates. Reply rate may be the best metric okcupid has as to "success rate". It also suggests strong racial biases on the part of its users.


To elaborate more on your point:

Many people claim they aren't racist when they refuse to date other races. They always say they haven't found the right <insert minority race here> guy/girl.

What I've found from personal experience, is that the bar is set much higher for those minority races.


You can't attribute lack of attraction to racism.

While it's possible for people to not date anyone of a certain race because they are bigoted against that race, I think that it's more common for people to just not feel a strong physical attraction towards certain races. My theory on this is that people that haven't grown up (at least during their childhood and/or teenage years) with many people of a certain race won't end up finding most of the 'feature' characteristics of that race to be attractive (i.e. if you grow up in a mostly white neighborhood you might not find black women attractive).

I could be completely off-base here. I have no idea how the development of attraction in the brain works (and I'm sure there's probably much research into this area of psychology). But I tend to find that a lot of people (whether they admit to it or not) have a preference for certain races and not others. Sometimes people don't have a preference for their own race. I know Asian women that are not attracted at all to Asian men, but that doesn't stop them from having Asian men as friends and interacting with them. I would hardly think that they are bigoted against Asians (seeing as they are Asian themselves).

> What I've found from personal experience, is that the bar is set much higher for those minority races.

It's not necessarily minority races. There are plenty of people that are not attracted to others of their own race. 'Yellow-fever,' 'jungle-fever,' etc weren't created for 'one-off' instances (i.e. the person accused of having 'the fever' only dated a person of that race once, and doesn't have an affinity to dating persons of that race) of inter-racial dating/mating/marriage.


> What I've found from personal experience, is that the bar is set much higher for those minority races.

Depends what you mean by that. Certain physical traits are correlated to race. To the extent that these traits matter to someone, one would need to set the bar higher on other traits to offset this. I hesitate to give examples since that could anger some people...


Come to think of it, the differences on the race graph are amazingly small, given how strongly race is correlated with certain religions -- (e.g. middle-eastern people are much more likely to be Muslim, and almost all Hindus are Indian.)


I think you're confusing P(member of particular race | member of particular religion) with P(member of particular religion | member of particular race).
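
A tiny worked example of how far apart the two conditionals can be. The head counts and the labels `R1`/`R2` (races) and `G1`/`G2` (religions) are invented placeholders, not real demographics.

```python
# Hypothetical head counts for (race, religion) pairs; invented for illustration.
counts = {
    ("R1", "G1"): 80,
    ("R1", "G2"): 120,
    ("R2", "G1"): 5,
    ("R2", "G2"): 795,
}

def p_race_given_religion(counts, race, religion):
    """P(race | religion): share of that religion's members who are of that race."""
    in_religion = sum(n for (_, g), n in counts.items() if g == religion)
    return counts[(race, religion)] / in_religion

def p_religion_given_race(counts, religion, race):
    """P(religion | race): share of that race's members who follow that religion."""
    in_race = sum(n for (r, _), n in counts.items() if r == race)
    return counts[(race, religion)] / in_race

print(round(p_race_given_religion(counts, "R1", "G1"), 3))
print(round(p_religion_given_race(counts, "G1", "R1"), 3))
```

In this made-up table almost every member of G1 is R1, yet well under half of R1 is G1, so a strong link in one direction says little about the other.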


Hmm, nope, don't think so. I picked my words pretty carefully.


The reason is because at that point they already decided to interact with people of a different race.

If you want a more interesting result, take a look at my other comment for a diagram that shows response rates between different races. It illustrates a greater schism than one would come to expect.


Nope; the differences on the zodiac graph are.

Differences on the race graph want to tell us something. Not necessarily something relevant, but something, anyway.


Bullshit, us Pastafarians are not represented in the religious statistics! This is heresy!

On the other hand I am glad I'm not Aquarius because statistically speaking they are insignificantly less compatible with an Aquarius. His noodelieness has blessed me well!

RAmen!


go jews!


The atheists and agnostics beat the jews, however.


actually because of the response of muslim women to jews being more positive than to ath/agn, I think Jewish men overall come out on top.

Combine that with Jewish men who ARE atheists, which is a good number of em, and you've got a super-match. Which all goes to prove what my mother always told me; "I love you."


Non-hacking, race & religion in one article, that's got to be a record for flamebait.


I'd argue that it's quite a hack -- Trying to discern an algorithmic approach to attraction via a large data set, and create a recommendation engine to a very fuzzy problem. =)


I'd argue that it's not at all a "hack" (I really dislike HN's tendency to equate everything to a hack), but nonetheless of significant interest to those of us interested in statistical trends in large web systems and social networks.

This kind of statistical kung-fu is well worth a look even if your own interests do not lie in dating sites.


Yes, but these are the results of the hack, not the hack itself. OkCupid has a nice thing going but this is not about their software, this is about trends they've picked out of their database of dating profiles.

And it's not like we haven't been here before:

http://news.ycombinator.com/item?id=822782

http://news.ycombinator.com/item?id=803603

http://news.ycombinator.com/item?id=66129


It's not flamebait at all. The data is all about people's responses to generic questions about what they're looking for in a date. They don't say "I don't want to date Muslims" and have that reflected in match scores. The match score is based on things like "I hate people who squeeze the toothpaste from the middle."

It's quite interesting.


To put data based on 'race & religion' out in the open to me is already a thing that is not excusable (besides the lousy definition of such things as race and religion, both of which are floating, not discrete).

There is a massive pre-selection problem here (people that frequent dating sites, rather than people in general) and the link to 'hacking' is a very tenuous one.

I've seen Michael Jackson referred to as a hacker here, so I probably shouldn't be too surprised.

This is not a hack, this is some statistical analysis with input data of dubious value, pretty pictures and all, drawing conclusions that are completely off the wall.

"Jews and Agnostics get along better with people"

"Muslims of both sexes and Hindu men get along worse"

"Catholics are more universally liked than Protestants"

Please, if that isn't flamebait I really don't know what is.

Where are the control groups? Where are the standard deviations, and so on?

Effectively this is a dating advert for programmers.

This is infotainment at its best, statistical noise at its worst.


I'll agree that it's not a hack - that doesn't, in my mind, preclude it from being hacking. They're two different things. For me, anyway, playing with giant data sets does not fall into the first group, but does fall into the second.

That said, "is it hacking or not" isn't really a terribly interesting meta-discussion to have, imho, so I'll try and go in a different direction.

> Please, if that isn't flamebait I really don't know what is.

The data appears to support the assertions you are calling flamebait. Provided their algorithm for calculating likeability actually works, then the data does support the assertions[1]. (This is all, of course, within the context of the pre-selection you noted in your post).

[1] Note that the assertions are of the form "Members of Set X are more likeable to members of Set Y than members of Set Z". If the assertions were "Being a member of Set X makes you more likeable to members of Set Y", they wouldn't necessarily be supported by the data.


If you're going to make this kind of generalizing statement about large groups of people I simply think that you have the responsibility to do it properly, so with control groups and so on, or not at all, otherwise the results are totally meaningless.

The way it is presented right now is as if simply the size of OkCupid and their data gathering methods give them the license to make this kind of claim.

It's 'interesting' but not 'rigorous'.

The three I listed above are particularly galling, I really don't think any computing algorithm can give someone license to make the statements listed above (and as an atheist I have no dog in that fight), but without proper methods it's even worse.


I think that you're assigning more meaning to the words "like" and "get along with" than the authors intend. Pretty much everywhere they use one of those phrases it should be qualified like "X _say they_ like Y". But that gets tedious, and I think the authors are also making the assumption that you were paying attention when they explained where these numbers come from.

Note that the end of the article is leading directly into the objections that people keep making along the lines of "this doesn't mean that these people really get along in real life", and the conflict between what people say they are looking for and the choices they actually make about who to contact and respond to.


> It's 'interesting' but not 'rigorous'.

Of course it's not rigorous. To some degree people interacting with OKCupid's site are a self-selecting bunch. You really need to take this at face value. This is just a blog post showing some number crunching on their site. There are no sensationalist headings like "Muslim Males the Most Hated Group of People." Keep in mind that this is also not a scientific journal, not peer-reviewed research... nor is it claiming to be.


It's way past my bedtime here, and judging by the moderation I'm not able to make my point, which is simply this:

If you are going to be making sweeping statements about people, even including race and religion, then you really should do your homework, or if you're doing it out of curiosity, keep the results to yourself. By presenting the data in a format that looks as though very hard work went into its creation and by hammering home the reliability of that data you are creating the illusion of something that is scientifically solid when in fact it isn't.


I disagree, the data was very interesting and it would be a pity if they didn't release it.


I agree, I'm glad to get the chance to read about this stuff, but I think the PP's point was that if you're going to present a bunch of data in that manner, it's best to say, up front, something like, "our sample, while extensive within our service, is not necessarily representative of particular ethnic groups," and point out something akin to what slashdot says about their polls: you're insane if you intend to do anything serious with this data.


You're right. It's infotainment. So? Are you incapable of taking anything at face value? They aren't making claims about how the world works; they're making claims about how people interact on their site. If that's not interesting to you, flag it and move on.


The flaw here is that how people interact on their site has very little bearing on how those same people will interact in real life.

The plural of data isn't evidence, even for large amounts of data.

Sure, you can do interesting statistical analysis, but the real action is after two people have found each other, and that's where the huge flaw is in all this analysis: the statements are about interactions in real life, while the data is gathered online.

Any kind of statement about people being more or less likable would have to be taken out of the context of the website and into real life; without that, statements like those listed above are unsupportable.


> The flaw here is that how people interact on their site has very little bearing on how those same people will interact in real life.

I still find it interesting to know how they interact online.


Gotta agree with you on this one. It's amusing to look at the plots and note the bits that agree (or disagree) with your prejudices, but without some information about the variance this is nothing more than entertainment.


They need to make it a little more clear that they're not trying to say anything bad about a particular group.



