Suppose you think the average article in The Economist is terrible. You might then conclude that it's not worth reading in general. But HN acts as a recommendation, and that might be a strong enough signal to interest you in reading this particular article.
But The Economist has an unfriendly layout or puts up a paywall or blares ads at you, so you flip into reader mode as soon as you see the article. You retain an impression that The Economist is terrible, while still reading this article.
There's commentary about incidentally discriminating based on race (e.g. zip code as an input can act as a proxy for race).
Would giving out more loans than is rational by excluding stuff like zip codes be a good thing? Wouldn't that lead to more defaults among the groups of people that zip codes can discriminate against?
This brings up a bit of a "trolley problem" of an ethical dilemma. Say you're asked to create a machine learning system, but you know that the data quality is so poor that you're very likely to overfit in a way that will deny economic opportunities to underserved communities that currently have them. But if you don't create that system, you're denying economic opportunities to other underserved communities that currently do NOT have them. Do you take the job? Moreover, if you'll do less harm than someone else who might be hired, does that make a difference?
There's no Hippocratic oath for our profession, and in many ways that's important, because we create systems whose impact may very well outlive us and out-scale anything that a single medical professional could do. But that also doesn't mean we should operate in a utilitarian environment without constraints.
Sadly there is no professional body for our profession. I often think the model for any software profession (if we can create that - something I doubt) is the railway engineer - where the professional signs off on the safety / completeness of work done on the railway - and no train can travel without it. It leads to plenty of uncompetitive practices - but also to ... y'know ... people not dying in crashes.
How we get started on that is hard to say (probably something to do with safety-critical software systems), because we aren't too sure what the right way to build software even is.
And then we have the fun problem of the members of the profession trying to decide the answer to your trolley problem. Sorry, scratch that: the various legislatures prescribing the answer and the profession trying to implement the conflicting results!
There are two ways in which these systems may give out more loans than is strictly profitable, and both are investments:
- Fairness. If you have a race variable and a zip code, you can account for discrimination via redundant encodings while still using the feature, making an explicit trade-off between a fairness criterion and model performance.
- Exploration. Concept drift (the correlational and causal meaning of variables shifts over time) can introduce wrong predictions. If all you have is a few samples from a zip code, the model will always be uncertain there. You can counter this with exploration and active learning: gather samples not because it maximizes profit now, but to learn how to predict applicants like that better in the future (see the sketch after this list).
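To make the exploration point concrete, here is a minimal sketch. It assumes a hypothetical sklearn-style model with predict_proba and an applicant object with features and zip_code attributes; all names and thresholds are invented for illustration, not a real lending system.

    import random

    def approve(model, applicant, samples_per_zip, profit_threshold=0.8, explore_rate=0.05):
        # hypothetical: model follows the sklearn predict_proba convention
        p_repay = model.predict_proba([applicant.features])[0][1]
        if p_repay >= profit_threshold:
            return True  # exploit: profitable by the model's own estimate
        # explore: for thinly sampled zip codes, occasionally approve anyway,
        # accepting some expected loss in exchange for better future predictions
        if samples_per_zip.get(applicant.zip_code, 0) < 100 and random.random() < explore_rate:
            return True
        return False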
But yes, giving out too many loans to minorities may very well lead to further crises and defaults, tainting the credit scores of people with low access to finance even further. A bit like how well-meaning people donate money and food to Africa around Christmas, and then a few months later, when donations subside, there are increases in famine. There is such a thing as "being too good".
Maybe the idea is that banks should have an incentive to find some better data that isn't a proxy for race? That is, getting closer to the ideal of judging people as individuals instead of as members of prohibited groups.
I think that's the intent, but it seems only to have incentivized discovering other proxies for race. For example, I recently watched an infosec talk, 'hacking your credit score',[0] where the presenter mentioned that Fair Isaac (a reporting agency mentioned in the article) has a parameter in their algorithm called 'HMA' (High Minority Area), which he found in an internal presentation. I think a solution would be that any parameter that even remotely correlates with race should be eliminated from scoring systems entirely.
Another startling insight from that talk was how normal services (i.e. home utilities, insurance, etc.) will do credit inquiries and set a customer's rate based on their score, which results in people with low credit scores paying more for services that are traditionally unrelated to borrowing. I'm worried that it could lead to a positive feedback loop that heavily affects those with poor credit scores in the long term. For this, it seems like a limitation on the kinds of business relationships allowed to perform credit inquiries should be implemented.
> I think a solution would be that any parameter that even remotely correlates with race should be eliminated from scoring systems entirely.
The problem is, that's every parameter. The outcome you're trying to predict (will borrower repay debt) correlates with race, so other things that correlate with that outcome also correlate with race.
What you really need to do is to include as many non-race factors as you can, to give people with a couple of negative factors (who are often black or hispanic) more chances to redeem themselves with the other ones, to get the false positive rate down as much as you can.
Because the false positives (and false negatives) are really the problem. The true positives are... well, true.
I don't think 'could lead to' is correct, because that implies this can only happen in a structured, intentional way.
It's an epiphenomenon of the interaction between this information and the normal workings of society on so many levels, and that feedback loop has been in place for a LONG time and is only getting stronger, now with real intentionality.
I often think the most effective means to increase loan repayment rates is to provide the debtor with an effective, accurate money management tool - sort of like Mint but better. I am supposedly a well-educated, intelligent software engineer, and yet trying to get a single unified view of what I spent is outrageously challenging, or requires discipline at the level of dieting.
Apart from fraudsters, people who take out a loan want to pay it back, but, like dieting, human failings cause the problems.
Just an instant check on what you have spent globally will make a huge difference in budget management.
That’s part of the problem. A budget is a plan, not a running total. You budget what you can spend then change what you do spend to match. (If only it were that easy, though. I’m human too)
I think it is important to realize that this is really a US thing. In my country people have a mortgage and that is usually it. The rest, we save money for and buy it when we have the money.
Our mortgages are pretty bad, though, and rent combined with housing prices pretty much adapts to the median income, imo. When rent goes down, housing prices shoot up to compensate and bring monthly costs back to roughly the level where a median family needs two working parents for a reasonable home in the city. Now, rent has nowhere to go but up, and housing prices will fall again, I predict.
It's not really a US-only thing - based on your username I'm assuming Dutch? - a quick Google says that there was 6 billion euro in new consumer credit in the Netherlands last year, and it's been much more than that at points in the past decade. People in the Netherlands seem just as likely to buy cars on finance and rack up credit card debt as everyone else.
Also, due to massive migrations the distinctions between countries are disappearing. So traditional Dutch may behave exactly as you say, but if immigrants from the same source go to different countries (like the Netherlands and the US) and act similarly, they will push these economic metrics towards each other across their host countries.
The key here being that non-financial data isn't actually useful in predicting ability to repay loans. I'm sure it'd be used somehow by modern financial institutions if it were predictive.
> The key here being that non-financial data isn't actually useful in predicting ability to repay loans.
Sure it is. Even religion correlates with loan risk. (Could be a proxy for social status and people from your tribe helping you out when you can't pay back). I could probably get a predictive model better than random guessing by mining your HackerNews comments or Facebook likes.
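As a toy illustration of that last claim (all data below is invented purely for the sketch), even crude text features can give you something better than a coin flip:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # invented comment histories and repayment labels, just to show the mechanics
    texts = [
        "paid off my car early, love budgeting spreadsheets",
        "another payday loan, rent is late again",
        "maxed out two cards, need a consolidation loan",
        "saving for a house deposit, index funds all the way",
    ]
    repaid = [1, 0, 0, 1]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, repaid)
    # crude "repayment score" for a new person's text
    print(clf.predict_proba(["late on rent again, need a quick loan"])[:, 1])

With real data you'd want far more samples and proper validation, but the mechanics are that simple.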
> I'm sure it'd be used somehow by modern financial institutions if it were predictive.
It is used. All data that is even remotely informative is used, to the fullest extent made possible by jurisdiction and anti-discrimination laws.
Cool, I look forward to your evidence and data that you're using to contradict the story in the submission, because the very article states that if you tried to pitch a loan approval algorithm with any of the things you suggested in a modern bank, you'd be laughed out of the firm.
The evidence is in the article. It tells of a US company using 10,000 data points in a jurisdiction where this is allowed.
The article cites Turner, who a few years back wrote this:
> The non-financial data was found predictive in all three outcomes examined when no other 'traditional' credit information was used, strongly suggesting that alternative data would be useful to lenders in underwriting the so-called 'no-file' or 'no-score' consumer who have little or no payment/credit information available.
I think his quote about "laughing out" is taken out of context. No predictive modeler will throw away informative features just because she cannot distinguish noise from signal beyond 26 variables; 26 variables is maybe 1 to 10% of a modern credit scoring model. It may be the perspective of a regulator, though (they start drowning in noise after reviewing 100+ variables).
Yes, all data that is legal to use and predictive, will get used, if not by you, then by your competitor.
And informative variables that cannot be used in the decision to give a loan are used internally to predict whether the loan will be paid back. There is more to credit scoring than the initial yes-no.
The US company doesn't operate in the US, and the quote wasn't taken out of context. The context is that 10,000 was way over the maximum of 10 or so actually valuable data points.
The whole article is about how utterly useless the vast majority of "data" ends up being, and how it is not used internally to predict whether the loan will be paid back.
So you are 100% in disagreement with this article, and forgive me if I trust a nationally published periodical such as Newsweek over a throwaway commenter on the Internet.
No, it says it's used incorrectly/wrongly. The whole article is about how hard it is to get an accurate view of a person's likelihood to pay a loan back, and the crazy (and generally inaccurate) ways lenders try to make up for that.
You drew the wrong conclusion about something you don't know a lot about and doubled down. Good luck with that, though I don't honestly forgive you.
Yes. For example, mortgage lenders in the US are required to obtain and report this.[1] If an applicant applying in person chooses not to furnish the information, the lender is required to guess based on observation.
I get confused by those who claim some of these ML or NN algos are black boxes and so could be breaking the law. If you’re not inputting illegal info (like race, sex, national origin, religion, name, etc., or their correlates), then it’s not making its risk assessment on that basis. All you have to do is look at the inputs. It isn’t unknowable whether or not it’s breaking the law.
Let me try to clear your confusion: an input may seem innocent (e.g. zip code), but a zip code is likely to correlate with ethnicity and race in some regions. So even if the inputs seem legal, an ML model that’s sophisticated enough can derive illegal results that discriminate against certain populations.
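A small self-contained simulation (all numbers invented) shows the effect: the model is never given race, yet because zip code correlates with race in the synthetic data, its approvals still end up skewed by race.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000
    race = rng.integers(0, 2, n)                              # never shown to the model
    zip_code = np.where(rng.random(n) < 0.8, race, 1 - race)  # zip code correlates with race
    income = rng.normal(50 + 10 * (race == 0), 15, n)
    repaid = (rng.random(n) < 1 / (1 + np.exp(-(income - 50) / 10))).astype(int)

    X = np.column_stack([zip_code, income])                   # "legal" inputs only
    model = LogisticRegression().fit(X, repaid)
    approved = model.predict_proba(X)[:, 1] > 0.6

    for r in (0, 1):
        print(f"race={r}: approval rate {approved[race == r].mean():.2f}")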
This gets at the heart of one of the issues with these discrimination laws.
What if, all else being equal, people from zipcode A are statistically much more likely to default than those from zipcode B? Do financial firms have to pretend like they don't know that fact?
How removed from race does information have to be in order to be considered by a financial firm?
> How removed from race does information have to be in order to be considered by a financial firm?
In general, the standard is "disparate impact"--if you accepted 80% of all white applicants but only 20% of black applicants, then you're probably liable for racial discrimination even if you were completely race-blind.
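For what it's worth, the usual first-pass screen is the EEOC's "four-fifths" rule of thumb (from the employment context, and a heuristic rather than the full legal test): flag any group whose selection rate is below 80% of the highest group's rate. A sketch using the numbers above:

    def disparate_impact_ratios(selection_rates):
        best = max(selection_rates.values())
        return {group: rate / best for group, rate in selection_rates.items()}

    rates = {"white applicants": 0.80, "black applicants": 0.20}
    for group, ratio in disparate_impact_ratios(rates).items():
        flag = "  <-- below the 0.8 rule of thumb" if ratio < 0.8 else ""
        print(f"{group}: ratio {ratio:.2f}{flag}")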
That standard is unworkable. What happens when an idealistic charity goes into a homeless shelter in a black neighborhood with a high rate of drug use and helps them all fill out mortgage applications, so that 80% of the lender's white applicants are married with stable middle class jobs and 80% of the black applicants are single, homeless, unemployed and suffering from drug addiction? The institution evaluating the applications may not even be aware of why their applicant population skews that way.
There is a legal defense, where you can argue that the disparate impact is caused by practical business need instead of by implicit bias. An example (using gender instead of race) is that women have less upper body strength than men, on average, so they are far less likely to meet job requirements that require being able to carry 100 pounds of equipment on their backs.
Even then, however, you still have to demonstrate in your defense that the requirements are actually justified and not a backdoor proxy. In the example I gave earlier, you have to demonstrate that employees actually need to carry 100 pounds of equipment on their backs, and furthermore that there is no workable alternative to that requirement. If I were a legal compliance officer, machine learning for applicant screening would scare me, because I would have a hard time arguing in court that the results constituted a legitimate business need and not racism-by-proxy, especially if the plaintiff found that some of the metrics highly correlated to race.
If you're giving loans, and justifying a factor is the same as showing that it correlates with repayment rate, then it's that easy to show a justification. But it will also basically always produce a "disparate impact", because repayment rate itself correlates with race, so any reasonably accurate measure of repayment rate will do the same.
A "disparate impact" standard is useless because it's routinely met even when discrimination is not actually occurring. In fact, finding no disparate impact would be highly indicative that someone was impermissibly taking race into consideration in order to fudge the numbers.
So either you have a disparate impact because that's what naturally happens with accurate predictions, or you do the expressly prohibited thing and take race directly into consideration in order to make it go away.
Imagine a human intervening to remove a factor from consideration because considering it benefits black applicants relative to other applicants, even though considering it improves accuracy overall. Would they not rightfully get their butts sued off? But what happens if it's the same thing, only this time it benefits hispanic or korean or greek applicants relative to black applicants? The only remaining option is to consider all the information you have available, which is what people are inclined to do to begin with.
I'm not sure what you're trying to argue here. Are you saying that disparate impact liability is fraught? Everyone agrees that it is. Cases based on it hinge on whether the impact results from "legitimate business need" with no reasonable alternatives, a standard that expressly allows e.g. credit scores with strong racial or gender correlations.
So if that's what you're concerned about, the response is simple: it's not enough to simply show a correlation; in enforcing disparate impact claims (under ECOA or FHA or Title VII), regulators have to show not just the correlation, but also the illegitimacy of the (facially neutral) action, or at least that some other (facially neutral) business practice would accomplish the same goals without producing the impact.
Since this is HN, I can't discard the idea that maybe you're instead arguing that disparate impact isn't in fact the standard in US law, in which case: no, a simple Google search for "disparate impact" and any of the laws I cited in that last paragraph will quickly disabuse you of that.
> it's not enough to simply show a correlation; in enforcing disparate impact claims (under ECOA or FHA or Title VII), regulators have to show not just the correlation, but also the illegitimacy of the (facially neutral) action, or at least that some other (facially neutral) business practice would accomplish the same goals without producing the impact.
But that's the problem.
Suppose you're evaluating whether to consider zip code in loan approvals, and that doing so improves the overall prediction rate, helps hispanic applicants, but hurts black applicants.
If you choose to stop considering zip code, you've got a disparate impact against hispanics but no business justification for doing it that way, meanwhile they can show an alternative (i.e. taking zip code into account) that serves your business goals better and reduces the disparate impact against hispanics, so not considering it may get you into trouble.
But we also have people suggesting that you shouldn't consider it because it increases the disparate impact against black applicants, and you might "have a hard time" demonstrating that business justification in court. Which is perhaps not ridiculous, because the business justification is there, but it's also in complicated algorithms that are not easy to explain to a layman.
So you have two basically reasonable alternative courses of action, either of which could arguably result in liability. This is the hallmark of unworkable legislation.
> Suppose you're evaluating whether to consider zip code in loan approvals, and that doing so improves the overall prediction rate, helps hispanic applicants, but hurts black applicants.
That's not disparate impact. The legal code does not follow exact predicate logic, where if you meet conditions A, B, and C, you violate the law. It tends to follow rules of fuzzy logic instead--that's why you'll see legal opinions that use words like "tends to", "probably", and "factors" a lot. Particularly where a strict interpretation would lead to an apparent contradiction, the court system instead tries to find a reasonable course of action. Indeed, often merely showing that you are making a good-faith effort to comply with all applicable laws and regulations is sufficient to absolve you of penalties for failure to comply.
Ultimately, the arbiter of reasonableness isn't a blackbox oracle. It's a panel of 12 members of the general public, or perhaps a panel of 3-9 judges.
Isn't that a bit like asking how many pedestrians you can hit and still keep your license? The right answer is to try for zero.
The dilemma you're describing isn't due to the law itself, but rather because the difficulty of writing law results in only the absolute worst abuses being criminalized. The abstract safe answer is to not engage in group-based discrimination at all, regardless of it seeming quite lucrative to hire a compliance department to analyze just how far you can get away with stretching it.
> Isn't that a bit like asking how many pedestrians you can hit and still keep your license?
This is nothing like that. Hitting or not hitting a pedestrian is binary, and there is a clear way to determine fault. We often call car collisions "accidents", but they aren't. Someone did something wrong to cause a collision.
The issue here is that you can be discriminatory "by accident". You can be 100% race blind and have the best intentions in the world and work very hard to be equitable but still have the slightest bias due to the nature of the data.
In fact, when poverty is correlated with race so strongly, I would argue it's impossible for a bank not to accidentally discriminate. A bank isn't going to lend to someone who is unlikely to pay it back - there's nothing racist about that. But they have had a disparate impact.
Yes, but criminal intent is less clear. Someone can be declared at fault for hitting a pedestrian, yet not have been criminally negligent. And even if this happens a few times from really bad luck, it's reasonable that they'd get to keep their license [0].
> You can be 100% race blind and have the best intentions in the world ... bias due to the nature of the data
The question is: what data? Are you feeding things in that have a clear causal relationship with your desired result? Or are you inputting everything you can, hoping to discover correlations? The latter is essentially trying to suss out informal groups, and engaging in any group-based discrimination means you cannot claim to have "the best intentions" - regardless of whether the group can be named as a legally protected one or not. If, say, you're trying to base mortgage underwriting on credit card purchase data, it's wholly disingenuous to claim that the resulting fallout is "accidental".
[0] In a society where cars are de facto mandatory to get around. I will disclaim that this casual attitude is a large part of what keeps roads so hostile for everyone else, but it currently is what it is. Also historically we haven't had such a likely hidden factor as phone use.
> So even if the inputs seem legal, an ML model that’s sophisticated enough can derive illegal results that discriminate against certain populations.
That doesn't make any sense.
The old school discrimination (e.g. redlining) worked like this. They would find some factor that correlates strongly with black people (e.g. black neighborhood zip codes), then assign a weight for that factor based on how well it correlated with what they wanted to discriminate against (black people) rather than how well it correlated with what they were supposed to be measuring (creditworthiness).
You can certainly do that on purpose with ML, but the way you do it is to give the algorithm the data on which factors are associated with black people and then ask it to assign blackness scores rather than credit scores and use the blackness scores for making credit approvals. I am not aware of anybody stupid/racist enough to actually be doing that in 2019.
What you're supposed to do is to weight factors based on how well they correlate with the outcome you're trying to predict (e.g. loan repayment) and use those weights rather than the ones chosen purposely to discriminate on the basis of race.
That doesn't mean none of those factors will ever correlate with race. Everything correlates with everything to one degree or another. True independent variables are the exception rather than the rule. But weighting each factor based on how well it correlates with the outcome you're trying to predict rather than how well it correlates with race is maximally non-discriminatory -- doing something else would be purposely giving advantage to one race over another disproportionate to the best available information. And nobody who is not an overt racist has reason to fudge the numbers that way, because it would also make worse predictions and cause you to lose money.
A recruiting tool used by Amazon developed a bias against women despite not being told candidates' genders. It penalized candidates who were graduates of all-women's colleges and also those who had the word "women" in their resume (e.g. “women’s chess club captain.”) It had been trained on resumes submitted to Amazon during the previous ten years, so the tool's bias was likely reflective of real human bias in Amazon's recruiting process.
From my reading of that article, I think the recruiting tool was fed resumés and a data point saying whether or not the corresponding candidate was hired. As a result, the tool not only developed a bias against women, but was effectively evidence that there was bias against women in the original hire / not-hire decisions.
I just missed the deadline to edit my post, so I am replying to myself.
Looking at the parent comment again, I seem to have just restated it without adding anything new of my own. I meant to add that my reasoning for why Amazon pulled development of this tool was not just the tool's bias, but also that the existence of the tool and its associated training data could open Amazon up to litigation claiming that their hiring decisions were biased against women in the ten-year span referred to by the article.
It's interesting that nobody even bothered to check whether the bias was illicit. They found something that sounds bad and the immediate response is "OMG bad PR, pull emergency shutdown."
They just assume that "women's" is coding for female candidates and not something more specific, like gender-segregated activities that may legitimately produce lower quality candidates than the equivalent integrated activities that exposed the student/candidate to a more diverse cohort population. Probably also doesn't help that some of the biggest gender-segregated institutions are penal in nature, i.e. "reform school for troubled girls" or "women's correctional facility."
Did anybody even check whether it also penalizes words like "boys" and "gentlemen's"?
To be fair, it would probably take more time and money irrespective of litigation to be sure that there was illicit bias than to just delete everything and call the exercise a failure.
I think that’s a great example of needing to sanitize data, but being a member of a women’s club or attending a women’s college is itself telling you the candidate’s gender, so I don’t know that this is deriving protected information from non-protected information.
This is a game of whack-a-mole. Given enough data, the ML system will just find other characteristics or groups of characteristics that act as proxies. It might unfairly penalize candidates in ways that are impossible to detect by human evaluators. The promise of these systems is that if you give them a pile of raw data, they will detect subtle patterns that aid in assessing individuals (be they job applicants, ex-cons, whatever.) However, some of the patterns that they detect are our biases against classes of people. If the solution is to sanitize the data to such a degree that the ML system can no longer infer that someone is a woman or that someone is over 40, etc., then the training data is probably also useless for detecting the non-obvious patterns that we want the system to discover.
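One partial check (a sketch of a leakage audit, not a fix for the whack-a-mole problem): try to predict the protected attribute from the supposedly sanitized features. If that works much better than chance, proxies are still in there. The data below is synthetic and the leak is deliberately planted.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n = 5_000
    gender = rng.integers(0, 2, n)                     # protected attribute, held out of the features
    college_code = gender * 3 + rng.integers(0, 3, n)  # invented feature that leaks gender
    years_experience = rng.integers(0, 20, n)          # invented neutral feature

    X_sanitized = np.column_stack([college_code, years_experience])
    auc = cross_val_score(RandomForestClassifier(n_estimators=50),
                          X_sanitized, gender, scoring="roc_auc", cv=5).mean()
    print(f"Protected attribute recoverable with AUC {auc:.2f} (0.5 = no leakage)")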
That it is a “game of whack-a-mole” isn’t clear to me, and you are asserting so without any backing, which is why I asked if anyone has real-world examples.
Citing an example where they don’t provide gender but then provide “went to an all-women college” is not an example of providing non-protected information which results in protected-class discrimination; it is an example of telling an ML system protected information in a roundabout way.
Age is a protected class, and it's extremely difficult to sanitise age from a resume.
Let's say the candidate got a BS in EECS in 1987, they've had 6 jobs since graduating, and their first job was cost-optimising floppy disk drives.
Do you feed the ML system this data, which is clearly correlated with age? Do you keep the number of jobs but delete the durations, treating 6-jobs-in-3-years the same as 6-jobs-in-30-years? How do you sanitise the age-correlated data out of job descriptions naming old tech or defunct employers? Do you delete all but the last 5 years of their employment history, assigning no value to experience beyond that?
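To see how little naive scrubbing buys you, here's a crude sketch (the resume text and regexes are invented for illustration): you can strip explicit years and durations, but "floppy disk drives" still leaks age.

    import re

    resume = ("BS in EECS, 1987. First role: cost-optimising floppy disk drives. "
              "Six positions held over 30 years.")

    scrubbed = re.sub(r"\b(19|20)\d{2}\b", "[year]", resume)        # drop explicit years
    scrubbed = re.sub(r"\b\d+\s+years?\b", "[duration]", scrubbed)  # drop stated durations
    print(scrubbed)  # old-tech references still date the candidate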
Yes, for one example, a system used to recommend sentencing based on how likely a criminal was to re-offend was found to have been heavily biased based on race, despite not being given explicit information about race[1].
Edit: It’s unclear if this system is based on ML, but I think the point stands - if a system can be manually tuned to do this, the risk exists of the same thing happening to a trained model, especially given other cases of ML models learning an incorrect behavior (such as the system intended to detect skin cancer that ended up being a fancy ruler detector [2]).
Using zip codes to implement discriminatory policies isn’t ML, but has almost certainly occurred. It’s referred to as redlining, and has been around for a while. The New Deal had some pretty indefensible redlining conditions. A lot of people will say the practice continues to this day, but the modern examples are a lot more open to interpretation than some of the historical ones are.
Was the New Deal redlining out-and-out discrimination, or a matter of accidentally creating conditions (i.e. based on bad intelligence) that resulted in discrimination? I always understood it to be the former.
The level of plausible deniability regarding The New Deal is mostly a matter of opinion, but personally I think it was pretty transparent in how it intentionally discriminated.
It’s the modern examples that I think are much more questionable. For example, there are more liquor stores in black neighbourhoods. Is this because:
a) A conspiracy to use alcohol to suppress the black population?
b) Those neighbourhoods have a greater demand for liquor stores?
Redlining was basically part of the law, as FHA historically required higher down payments for “less desirable”, aka non-white neighborhoods.
In the countryside or exurb, a similar dynamic arises... “the wrong side of the tracks”. Even without the racial dimension, the trailer parks and the horse farms tend to be in easily identified geographically distinct areas.
In hot markets in my area it’s easy to see. The “desirable” areas surge ahead of the baseline tax assessment comps, and the “undesirable” do not. You can map it in a stable area and it lines up with the old FHA maps.
Decades of a market like that lead to those conditions staying around for a while. As multi-family real estate is getting appealing to people with money, I think we’ll finally see that legacy fade away.
Other comments have pointed out the risk of redundant encodings / proxy variables.
I like to point out that sources of unfairness are possible even with 100% decorrelated, non-protected variables. For instance, the data collection and labeling may be biased (you label re-admission to jail using the people currently in jail, and end up overfitting to the war on drugs).
Or you make a biased decision based on the output of a fair model. Things like not taking into account sample size / uncertainty.
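On the sample-size point, here's a quick sketch of one common way to make the uncertainty explicit before acting on a group-level estimate: a Wilson interval on the observed default rate (numbers invented).

    from math import sqrt

    def wilson_interval(defaults, n, z=1.96):
        p = defaults / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return centre - margin, centre + margin

    print(wilson_interval(4, 10))        # tiny group: roughly (0.17, 0.69) -- mostly noise
    print(wilson_interval(4000, 10000))  # large group: roughly (0.39, 0.41) -- real signal

A 40% observed default rate means very different things at n=10 versus n=10,000; a fair model plus a decision rule that ignores that difference can still be unfair.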
Also, breaking the law and unethical behavior are not the same thing. Unethical behavior is not necessarily breaking the law, but it is still nasty. For instance, from your Facebook likes (not a protected variable) I could deduce your race, sex, origin, and religion. Not against the law, but still discrimination.
Humans had plenty of algorithms to substantiate racial discrimination using nonobvious indicators. We can, inadvertently or otherwise, easily program machines to do the same.