Hacker News
The Naive Approach to Hiring People (raganwald.com)
38 points by muriithi on Feb 11, 2008 | 4 comments



Interesting idea. But I wonder how much it helps to use a classifier for processing resumes. If you have a high volume of resumes to process, it makes sense to try to automate it. But if the volume you are processing is low enough, you can probably get by with letting people make the decisions.

For example I can tell way better than Google's spam filtering software whether an email is spam or ham. The only reason to use filtering in that case is to save me the hassle of having to dismiss 50+ spam messages per day.

The human mind is pretty good at picking up patterns (even without trying), so after a while experienced interviewers start to see which kinds of people worked out and which kinds they hired and later regretted. I guess using a naive classifier would eliminate some of the bias towards things like top colleges and years of experience.
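
To make that concrete, here is a toy sketch of the kind of naive classifier I have in mind -- the resumes, the "worked out" labels, and the keywords are all fabricated, and it's just word-level naive Bayes with add-one smoothing, not anything you'd actually screen candidates with:

    # A toy sketch, not a real screening system: fabricated resumes and
    # made-up "worked out" labels, scored with a word-level naive Bayes.
    from collections import Counter, defaultdict
    import math

    resumes = [
        ("bs top-college java enterprise consulting", 0),
        ("self-taught open-source io self javascript", 1),
        ("mba project-management certified scrum-master", 0),
        ("lisp hobbyist side-projects javascript", 1),
    ]  # 1 = hire we were happy with, 0 = regretted

    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()
    vocab = set()

    for text, label in resumes:
        label_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)

    def log_score(text, label):
        """Log P(label) + sum of log P(word | label), with add-one smoothing."""
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / total)
        return score

    new_resume = "javascript self open-source"
    for label in (0, 1):
        print(label, log_score(new_resume, label))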

The real problem with resumes, however, is that anyone can put absolutely anything on them, so they're a bad source for training a classifier. It would be interesting however to measure things like the probability of your interviewers making correct decisions, or (if you use standard questions) the correlation between answers to questions and performance (in the event of a hire).

Ultimately I think I agree with Joel that the only proper way to decide if someone is worth hiring is to have a technical discussion with them and see how it goes. Getting a feel for how someone writes code is even more important (but I'm not necessarily sure that putting people on the spot in front of a whiteboard is the best way). After someone has been hired I can usually tell within several minutes of working with them on real production code whether they were a good addition to the company.


Thank you. No, that is not strong enough: THANK YOU!!!

> It would be interesting however to measure things like the probability of your interviewers making correct decisions, or (if you use standard questions) the correlation between answers to questions and performance (in the event of a hire).

That was my objective when writing the post, although judging by responses on reddit, most people thought I was saying that naive Bayesian filters could outperform human hiring managers.

All I was suggesting is that people "Apply what we know about classification to hiring." And one of the things we know is that any classifier--human or machine or human assisted by machine--can be improved by regular investigation of the correlation between the features you observe and the results you obtain.
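
Here's the kind of back-of-the-envelope check I mean -- the hiring records and feature names are entirely fabricated, and it's just a plain Pearson correlation between "feature present" and "hire worked out":

    # A back-of-the-envelope sketch with fabricated hiring records: how strongly
    # did each observed feature track the eventual outcome?
    import math

    def pearson(xs, ys):
        """Plain Pearson correlation between two equal-length lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # One entry per past hire: was the feature present, and did the hire work out?
    worked_out = [0, 1, 0, 1, 0, 1, 1, 0]
    features = {
        "top_college":        [1, 1, 0, 0, 1, 0, 1, 0],
        "knows_io_or_self":   [0, 1, 0, 1, 0, 1, 0, 0],
        "ten_plus_years_exp": [1, 0, 1, 0, 1, 1, 0, 0],
    }

    for name, values in features.items():
        print(f"{name:<20} r = {pearson(values, worked_out):+.2f}")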

> The real problem with resumes however is that anyone can put absolutely anything on it.

True, although security by obscurity works for hiring. It's a lot more like CAPTCHAs than it is like spam: if you are the only one who thinks that Io and Self language experience correlates positively with performance as a JavaScript programmer, I think you will find that Io and Self are keywords that will maintain a high correlation.

Then one day Joel Spolsky writes a post about that, someone reads it and puts it in a job listing, a headhunter sees it and adds it to some candidate resumes, and within a year the correlation drops off the map and you have to start looking for something else in resumes, or you have to aggressively fizzbuzz it in phone screens.

C'est la vie.


Alright, you've cleared up some of the misconceptions I had about your post. Great point about security by obscurity in resumes.

If I get a chance I might write a script to search the web for publicly accessible resumes and track changes in keywords over time. Might be able to pick up some interesting trends.
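
The tracking half would be simple enough -- something like this sketch, with made-up sample text standing in for whatever the crawler would actually find:

    # A sketch of the tracking half only: assumes the resume texts have already
    # been collected and tagged with the year they were found.
    from collections import Counter, defaultdict

    # Made-up sample; a real run would feed in scraped resume text.
    resumes_by_year = {
        2007: ["java j2ee oracle", "javascript ajax rails"],
        2008: ["javascript io self rails", "java spring javascript"],
    }

    keywords = ["javascript", "io", "self", "rails"]
    trend = defaultdict(Counter)  # keyword -> year -> resumes mentioning it

    for year, texts in resumes_by_year.items():
        for text in texts:
            tokens = set(text.split())
            for kw in keywords:
                if kw in tokens:
                    trend[kw][year] += 1

    for kw in keywords:
        counts = ", ".join(f"{year}: {n}" for year, n in sorted(trend[kw].items()))
        print(f"{kw:<12} {counts}")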


That turned out longer than expected. Perhaps I should refactor it and make it into a blog post.



