
I think the idea of ML as “unaccountable black boxes” is a bit of a bait-and-switch from the problem actually being described. The real problem is that ML has no political biases: it just minimizes some loss function on the data you give it, so it can’t correct for implicit biases in how that data was collected, or in how the model’s outputs are used.

If you fitted a decision tree to predict recidivism risk, it would be extremely easy to interpret. But if black men are rearrested more often in your dataset, then black men will have a higher predicted risk on average, regardless of what caused that pattern in your data.
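
A minimal sketch of that point in Python (all rates invented, scikit-learn assumed):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 10_000

    # One feature: group membership. Both groups behave identically here;
    # the only difference is the rearrest rate we bake into the labels.
    group = rng.integers(0, 2, n)
    rearrested = rng.binomial(1, np.where(group == 1, 0.4, 0.2))

    # A depth-1 tree: about as interpretable as a model gets.
    tree = DecisionTreeClassifier(max_depth=1)
    tree.fit(group.reshape(-1, 1), rearrested)

    print(tree.predict_proba([[0]])[0, 1])  # ~0.2
    print(tree.predict_proba([[1]])[0, 1])  # ~0.4

You can read the split straight off the tree, so it's perfectly interpretable, and it still reproduces the bias in the labels exactly.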




Your example demonstrates a political bias in a dataset that can lead to a biased ML model. This is what some people mean when they say ML can be biased.


If I slap my friend Tom once a day, train an ML model to detect which of my friends is going to be slapped next, and then find out it always predicts Tom, the model isn't biased against Tom: the model is correctly showing me my own anti-Tom bias.

I don't get to stand there and blame the model when I'm the one doing the slapping.


For us technologists, yes, the distinction between "AI bias" and bias in the data is clear. The point, however, is that for the general public, "AI" is the whole thing, and the public has absolutely no say in (perhaps even no knowledge of) the data; nevertheless, technocrats will argue that "data doesn't lie".

Edit: autocorrect had written "data doesn't like"


It's not just biased data, though; it's an objective function optimising a biased metric.

We've picked a metric, recidivism rate, that is believed to be inherently biased because cops arrest a lot of protected minorities. The model has correctly predicted that cops will arrest a lot of protected minorities. The general public has then turned around and shot the messenger rather than hold cops accountable for all that arresting they're doing.
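
A toy version of that in Python (equal true reoffense rates, unequal arrest rates, all numbers invented):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 100_000
    group = rng.integers(0, 2, n)

    # Both groups reoffend at the same rate...
    reoffended = rng.binomial(1, 0.3, n)

    # ...but reoffenders in group 1 are arrested twice as often,
    # so the label we actually train on is a biased proxy.
    arrested_if_reoffended = rng.binomial(1, np.where(group == 1, 0.8, 0.4))
    rearrested = reoffended * arrested_if_reoffended

    model = LogisticRegression().fit(group.reshape(-1, 1), rearrested)
    print(model.predict_proba([[0]])[0, 1])  # ~0.12 (0.3 * 0.4)
    print(model.predict_proba([[1]])[0, 1])  # ~0.24 (0.3 * 0.8)

The model is "right" about arrests and wrong about behaviour; the metric, not the optimizer, carries the bias.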


Technologists shouldn’t try to dumb things down for the general public; we should state as clearly as possible where the problem lies and how it might be mitigated. In this case, we need to make it clear that what’s called “AI” is just a new kind of statistical tool, and like all statistical tools it’s only as good as the data it’s given and the humans who interpret its outputs.

Ironically, I think “conservative” is both an excellent descriptor of function approximators—they tend to conserve whatever bias they’re provided with—and a terrible word to use for popular writing, since it’s so easily confused with political conservatism (even though e.g. no politically conservative “AI” would autosuggest “on my face” as a completion of “can you sit”).



