OP here. We address this argument in detail in our paper, and we're deeply skeptical of it. See the sections titled "Challenges in addressing bias" and "Awareness is better than blindness".
Here's the short version:
We view the approach of "debiasing" word embeddings (Bolukbasi et al., 2016) with skepticism. If we view AI as perception followed by action, debiasing alters the AI’s perception (and model) of the world, rather than how it acts on that perception. This gives the AI an incomplete understanding of the world. We see debiasing as "fairness through blindness". It has its place, but also important limits: prejudice can creep back in through proxies (although we should note that Bolukbasi et al. (2016) do consider "indirect bias" in their paper). Efforts to fight prejudice at the level of the initial representation will necessarily hurt meaning and accuracy, and will themselves be hard to adapt as societal understanding of fairness evolves.
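For readers who haven't seen the Bolukbasi et al. approach, here is a rough, minimal sketch of what "debiasing the representation" means in practice: removing from each word vector its component along a learned gender direction. The vectors below are toy illustrations; the actual method derives the gender subspace via PCA over many definitional pairs (e.g. "he"/"she") and includes neutralize/equalize steps that are omitted here.

```python
import numpy as np

def hard_debias(vec, bias_direction):
    # Project out the component of `vec` along the (normalized) bias direction,
    # leaving the rest of the vector unchanged.
    b = bias_direction / np.linalg.norm(bias_direction)
    return vec - np.dot(vec, b) * b

# Toy 3-d vectors for illustration only; real embeddings are ~300-dimensional
# and the bias direction comes from many definitional pairs, not one.
he    = np.array([ 1.0, 0.2, 0.0])
she   = np.array([-1.0, 0.3, 0.1])
nurse = np.array([-0.6, 0.5, 0.9])

gender_direction = he - she
print(hard_debias(nurse, gender_direction))  # nurse vector with the gender component removed
```

In the paper's terms, this edits what the model "perceives", rather than constraining how a downstream application acts on that perception.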
I agree with the suggestion to de-bias the application and not the representation itself.
Recently I was using a version of ConceptNet Numberbatch (word embeddings built from ConceptNet, word2vec, and GloVe data that perform very well on evaluations) as an input to sentiment analysis. Its input thus happens to include a crawl of the Web (via GloVe) and things that came to mind as people played word games (via ConceptNet). All of this went into a straightforward support vector regression with AFINN as training data.
You can probably see where this is going. The resulting sentiment classification of words such as "Mexican", "Chinese", and "black" would make Donald Trump blush.
I think the current version is less extreme about it, but there is still an effect to be corrected: it ends up with slightly negative opinions about most words that describe groups of people, especially the more dissimilar they are from the American majority.
So my correction is to add words that describe groups of people to the training data for the sentiment analyzer, with a lot of weight, constraining their output to 0.
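For concreteness, here is a minimal sketch of that pipeline and correction in scikit-learn terms. The `embeddings` and `afinn` dictionaries below are tiny toy stand-ins for ConceptNet Numberbatch and the AFINN lexicon, the identity-term list and the sample weight of 10 are arbitrary illustrations of "a lot of weight", and none of this is the commenter's actual code.

```python
import numpy as np
from sklearn.svm import SVR

# Toy stand-ins for the real inputs (Numberbatch vectors and the AFINN lexicon).
rng = np.random.default_rng(0)
vocab = ["good", "excellent", "bad", "terrible", "mexican", "chinese", "black"]
embeddings = {w: rng.normal(size=50) for w in vocab}              # word -> vector
afinn = {"good": 3, "excellent": 5, "bad": -3, "terrible": -5}    # word -> sentiment score

# Regression training data: embedding vectors in, AFINN scores out.
words = [w for w in afinn if w in embeddings]
X = [embeddings[w] for w in words]
y = [float(afinn[w]) for w in words]
weights = [1.0] * len(words)

# The correction: add words that describe groups of people with a target
# sentiment of 0 and a large sample weight, so the model treats them as neutral.
identity_terms = ["mexican", "chinese", "black"]  # illustrative, not exhaustive
for term in identity_terms:
    X.append(embeddings[term])
    y.append(0.0)
    weights.append(10.0)

model = SVR(kernel="linear")
model.fit(np.array(X), np.array(y), sample_weight=np.array(weights))

def word_sentiment(word):
    """Predicted sentiment for a single word, 0.0 if we have no vector for it."""
    if word not in embeddings:
        return 0.0
    return float(model.predict(embeddings[word].reshape(1, -1))[0])

print(word_sentiment("mexican"))
```

The design point is that the correction happens in the application (the sentiment regressor), leaving the underlying embeddings untouched.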
I'm not convinced by your skepticism about correcting prejudiced bias. Debiasing certainly gives the AI a different understanding of the world than the original (biased) language dataset does, but it's not necessarily less complete - or less accurate. After all, any one corpus is incomplete, and has biases based on the items that were chosen for it - which are likely to reflect the biases of the past, and of the person choosing the corpus. It may not be a "complete" or "accurate" reflection of today's world - let alone the future. So it's not at all clear to me that efforts to undo the bias will necessarily make it less "accurate".
> Debiasing certainly gives the AI a different understanding of the world than the original (biased) language dataset does, but it's not necessarily less complete - or less accurate.
If you're translating from an ungendered language and have to choose, the only way you're going to get anything sensible is from context and common usage. And that is going to choose "she is a nurse", because an algorithm that can deduce that fathers are most likely male can also deduce that nurses are most likely female. Without that you get bad translations like "she is a father", "he is a fine ship", and "John is her own person."
> "She is a nurse" is also not a bias. It's a prior ...
Assuming that lower-status professions are female and higher-status professions are male ("he is a doctor") when translating ungendered words is indeed a bias.
> the system will be right 93% of the time.
And "this person is a doctor, that person is a nurse" will be right 100% of the time.
It's a bias in the sense that it accurately reflects a fact you dislike. It's not a bias in the statistical sense, namely something that causes the answer to be wrong systematically in a particular direction. See my other post here discussing the distinction.
The phrase "this person is a doctor" has a different meaning than "she is a doctor" - "she" and "he" refers to (I'm probably messing up the terminology here) contextually implicit person. "This person" does not.
Notice how you sway the argument in your favour by using words with negative connotations like "fairness through blindness" and "hurt meaning and accuracy". Nobody would want to deliberately blind or hurt something, would they? How about "rebalance", "recalibrate", or "re-correct"?
A concrete analogy:
1) I have a meter measuring stick, but I discover that it was made wrong: it is actually 2mm shorter than advertised. Every time I make a measurement with it, I have to add 2mm to the result. Would it not be better to use an accurate stick and not have to continually compensate?
With your analogy, that would assume we know exactly how long a meter is. Here it's more like: "We know that we are wrong, but we don't know the exact right answer." Also, language shifts and biases are not constants. And then you have the issue of a corpus crafted to manipulate the learning algorithm itself.
Yes! We address this in the section "Implications for understanding human prejudice".
The simplicity and strength of our results suggests a new null hypothesis for explaining origins of prejudicial behavior in humans, namely, the implicit transmission of ingroup/outgroup identity information through language. That is, before providing an explicit or institutional explanation for why individuals make decisions that disadvantage one group with regards to another, one must show that the unjust decision was not a simple outcome of unthinking reproduction of statistical regularities absorbed with language. Similarly, before positing complex models for how prejudicial attitudes perpetuate from one generation to the next or from one group to another, we must check whether simply learning language is sufficient to explain the observed transmission of prejudice. These new null hypotheses are important not because we necessarily expect them to be true in most cases, but because Occam’s razor now requires that we eliminate them, or at least quantify findings about prejudice in comparison to what is explainable from language transmission alone.
Direct link to our paper: http://randomwalker.info/publications/language-bias.pdf