Am I the only person getting progressively more creeped out by the series of bizarre, unsourced, pseudo-scientific ads TripleByte has been running?
On Reddit right now they're running this weird faux-linear-algebra thing where they imply that they can build a vector of your skills vs. a vector of job requirements and get a meaningful answer via the dot product.
Which is a bit like saying you can predict the weather by taking the dot product of a vector of ocean and air. What even are the units? What does any of this even mean?
Hiring is a challenging, multi-dimensional thing. It involves a high-risk and ideally informed decision by multiple parties. Doing it effectively is hard. Doing it effectively and respectfully is harder still. And yet TripleByte comes in and says, "We sound vaguely like machine learning. We got this."
Honestly, they make HackerRank, which was another extremely sketchy organization making a lot of very questionable decisions, look reasonable by comparison.
No. Sorry, you're not forming a coherent vector space spanning both skills and demands without published research on the subject.
You do not simply say, "Gosh it seems like I have said words related to ML and therefore its application is plausible."
What's more, the notion that recruiting is in fact a skills-demand model is itself fundamentally misleading. Many of the skills you want are domain-specific and in fact cannot be expected to be acquired anywhere but on the job. Given how many shops subtly permute "react" or "Golang" to mean a lot of skills in a utility cluster, any space you form is going to be incredibly specific to the employer and difficult to map anywhere else.
>> subtly permute "react" or "Golang" to mean a lot of skills in a utility cluster
This, and the tendency for shops to evaluate candidates' skillsets against the shop's most mission-critical parts and processes, which for whatever reason have not been operationally hardened and locked down.
It works very nicely for predicting what movie you'd like to watch, so it could potentially work with jobs, too. Collecting enough data could be challenging, though.
That's an incredible over-simplification of the utility and usage of embeddings. For an embedding to work there must be some legitimate (as opposed to arbitrary or even non-existent) relationship to be teased out.
> For an embedding to work there must be some legitimate relationship to be teased out.
There is most definitely a relationship between candidate skills, job requirements, and interview result/job performance.
The point is to get rid of all these bullshit subjective excuses when people fail an interview/get fired. The answer is simple: they likely weren't good enough. But no one likes hearing that.
> There is most definitely a relationship between candidate skills, job requirements, and interview result/job performance.
This is actually not at all an obvious fact. It's continuously offered as a ground truth, but many people dispute it and a lot of successful organizations do not recruit weighing these factors as heavily as you're suggesting.
> The answer is simple: they likely weren't good enough. But no one likes hearing that.
Possibly, or possibly they were plenty good but so obnoxious or outrageous that they wouldn't be welcome. I've certainly done that more than once in my time building tech organizations. I still remember the guy who effusively praised the beauty of all the women he saw and congratulated me on "the haul". Even touched a woman's hair to compliment it. Too bad he was such a sleazebag, he seemed smart. But even from a cold economic standpoint the cost to the company for an inevitable sexual harassment lawsuit would always eclipse any value he could provide.
The vector of skills vs. vector of job properties thing could be a coherent idea. I'd argue it's not a good idea (and I have created joint-embedding models for multi-modal text and image nearest neighbor search systems). But it could be a coherent idea.
In deep learning anyway, it's a very similar idea in spirit to something like word2vec. In word2vec you learn how to create some arbitrary vector of real numbers, let's say a 256-component vector, that represents words. The components of the vector for each word are learned by forcing the model to solve a prediction task, like predicting whether two words were truly seen in the same context in a given document or not. The model updates the 256-vector of each word such that this prediction task (which can be based on the inner product of those 256-vectors for sets of words) gets better and better. By the end of this, you can in some sense claim a word's mapping into its 256-vector "represents" some "context" intrinsic to that word. And that 256-dimensional "space of context" where the vectors reside is known as an embedding space. So the 256-vector of a word is often called that word's embedding.
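If it helps to see that prediction task in code, here's a minimal sketch in Python with a toy vocabulary, an 8-dimensional embedding, and made-up co-occurrence labels; it captures the flavor of the idea, not word2vec's actual training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["python", "golang", "react", "sql", "docker"]
dim = 8  # toy size; the comment above uses 256

# One learnable vector per word, initialized with small random values.
emb = {w: rng.normal(scale=0.1, size=dim) for w in vocab}

def p_same_context(w1, w2):
    """Sigmoid of the inner product: predicted probability the words co-occur."""
    return 1.0 / (1.0 + np.exp(-emb[w1] @ emb[w2]))

def sgd_step(w1, w2, label, lr=0.1):
    """Nudge both vectors so the prediction moves toward the observed label (1 or 0)."""
    grad = p_same_context(w1, w2) - label   # derivative of log loss w.r.t. the inner product
    g1, g2 = grad * emb[w2], grad * emb[w1]
    emb[w1] -= lr * g1
    emb[w2] -= lr * g2

# Made-up "training data": pretend "python" and "sql" co-occur, "python" and "react" don't.
for _ in range(500):
    sgd_step("python", "sql", 1.0)
    sgd_step("python", "react", 0.0)

print(p_same_context("python", "sql"))    # climbs toward 1
print(p_same_context("python", "react"))  # falls toward 0
```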
For a more generic problem, you aren't learning just one type of embedding, but possibly many. In TripleByte's case, I'm guessing they input text characteristics of candidate resumes, self-descriptions, GitHub repos, Stack Overflow profiles, LinkedIn pages, etc., and treat a candidate like a bag-of-words from all those combined sources of data. And maybe they also have other data too, like performance on skills assessments, favorite craft beer, and whether you think dog-friendly offices are unprofessional or super cool.
On the other side they input a bunch of data about jobs: bag-of-words text from job ads, tech blogs from that company, the company's GitHub repos if any are public facing, profiles of their existing team members, what gourmet coffee they keep in the office, and whether or not they believe in "unlimited" vacation.
If they have labeled training data (say, a bunch of people at their client companies who agreed to share resumes and personal data, with the job each person currently occupies serving as a ground-truth label for a "positive" match between a candidate and a job), then they could train a model that learns how to map all that candidate data and all that job-specific data into 256-component embedding vectors constrained to reside in the same space.
And if they really want for the dot product to be the key way to describe applicant-to-job similarity, they could make the loss function work based on cosine distance between the learned embedding vectors of candidates and the learned embedding vectors of their ground truth actual jobs.
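Here's a minimal sketch of what that kind of two-sided setup could look like, with made-up feature sizes, plain linear maps standing in for real encoders, and a simple margin-based contrastive loss; I have no idea whether this resembles anything TripleByte actually builds:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cand_feats, n_job_feats, dim_emb = 300, 300, 256  # made-up feature sizes

# Two "towers": one map for candidate features, one for job features.
# Linear maps stand in here for whatever encoders a real system would learn.
W_cand = rng.normal(scale=0.01, size=(dim_emb, n_cand_feats))
W_job  = rng.normal(scale=0.01, size=(dim_emb, n_job_feats))

def embed(W, x):
    v = W @ x
    return v / (np.linalg.norm(v) + 1e-8)   # unit length, so dot product == cosine similarity

def similarity(cand_feats, job_feats):
    return embed(W_cand, cand_feats) @ embed(W_job, job_feats)

def contrastive_loss(cand, matched_job, mismatched_job, margin=0.5):
    """Training would minimize this: pull ground-truth pairs together in the shared
    space and push mismatched pairs at least `margin` further apart."""
    return max(0.0, margin - similarity(cand, matched_job) + similarity(cand, mismatched_job))

# Example with random bag-of-words-style feature vectors.
cand = rng.random(n_cand_feats)
job_a, job_b = rng.random(n_job_feats), rng.random(n_job_feats)
print(similarity(cand, job_a), contrastive_loss(cand, job_a, job_b))
```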
I have no idea if this approach, or something like it, is what TripleByte actually does. But it sounds like they want to market themselves as really fancy machine learning engineers by at least talking about this.
Frankly, in deep learning this is a fairly cookie cutter approach and is often the first thing someone tries. In fact, I just Googled for "machine learning visual searches" and one of the first links was this post from the Squarespace engineering team (I do not work at or have any affiliation with Squarespace):
From the blog post it looks like this was just a sort of fun side project, with a goal of developing it into an actual search service for Squarespace's search engine at some point (maybe they already did?).
The reason why TripleByte's attempts to use this in advertisements fall flat is because it's a commonplace idea -- so common that here we even have a website hosting and content platform demoing how the basic idea works for image searching as some random blog post.
My guess is that other experienced ML engineers would see TripleByte's choice with this ad as at best a little self-aggrandizing and silly, or at worst maybe even intellectually dishonest about the real importance or effectiveness of this type of embedding-based similarity approach.
I don’t work in the “high-tech HR” space, but do have experience with embedding vector methods, joint-embeddings, and trying to solve generic information retrieval tasks, where the underlying data is not text or images, by using embedding methods.
Even in a domain like reverse image search, where this approach has been studied to death, there are big concerns about how much of the problem is actually solved: does vector similarity approximate the real manifold distance of some underlying true structure, or are huge models just overfitting to a particular class of natural-image statistics?
This problem is discussed in [0], which creates a lot of problems for people who want to believe that some deep, internal layer of a neural net can capture semantically relevant features.
But a much bigger problem looms for trying to extend this idea to matching people to jobs. In that problem, you don’t even have the option of overfitting to population statistics because the population is constantly changing and the individuals in the population have an insanely high-dimensional set of internal unobserved variables, like their emotions, goals, current life or family priorities, interests, relative free time, sensitivity to stress, etc.
By comparison, the space of latent variables giving rise to observed natural photos is tiny.
Essentially, if you really wanted to take a scary, Orwellian, big data approach to quantifying a candidate’s degree of match to a job, you would need much more data on the conditional distribution of the observables (resume items, college degree, skills assessment, etc) when given information on the internal state (work ethic, introvert vs extravert, motivation for looking for a job, intelligence metrics, disposition, response to stress, etc) at a given moment of time.
This would let you model the posterior distribution of those hidden, internal characteristics of the applicant, and those characteristics could maybe be used to understand a holistic match to a certain employer-team-role situation.
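As a toy illustration of that posterior update, here's a sketch with a hypothetical three-state latent and an assumed conditional distribution for a single observable; every number is made up, and the point is only that the update needs exactly the conditional data nobody has:

```python
import numpy as np

# Hypothetical three-state latent "work style" -- a stand-in for the unobserved
# internal variables discussed above (none of these numbers come from real data).
latent_states = ["steady", "sprinter", "burned_out"]
prior = np.array([0.5, 0.3, 0.2])

# Assumed conditional probability of a single observable (say, "lots of recent
# GitHub activity") given each latent state. This is exactly the conditional
# data you would need at scale, and almost never have.
p_obs_given_state = np.array([0.6, 0.9, 0.1])

# Bayes' rule: posterior over the hidden state after observing that one signal.
unnormalized = prior * p_obs_given_state
posterior = unnormalized / unnormalized.sum()
print(dict(zip(latent_states, posterior.round(3))))
# A single observable gives only a weak posterior over even a 3-state latent;
# scaling this to the high-dimensional internal state of a real person would
# require vastly more conditional data than resumes or assessments provide.
```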
But that’s a ludicrously high-dimensional problem that observable data like resumes or skills assessments does little to solve.
So overall, I probably share your opinion that this is an extremely shallow model.
It’s like learning a vector space model of spaghetti and a vector space model of walls and then claiming your model can predict what will stick when you throw spaghetti at the wall.
I do like that last bit. For full disclosure: I'm sure that for many places their recruiting process is so bad that just enumerating the required skills with some external reference will improve outcomes, but I'm not sure that this is what people think is actually going on.
Not a fan of Triplebyte. I'm a full stack engineer with 10 years of experience building web apps, and while I'm not the best in the world, I'm still pretty damn good. I took Triplebyte's interview a few months ago, and they rejected me with a link to a tutorial on how to build your first webpage (https://learn.shayhowe.com/html-css/). This company knows absolutely nothing about hiring good engineers.
Companies seem to be adding more screening steps to try to reduce their false positive rate -- the rate at which they interview people who aren't hireable.
But most don't seem to understand that, mathematically, there's a tradeoff: a higher false negative rate.
Screening false negatives are people who would have done well in an interview, but don't make it to that stage. These are more hidden to the company, but quite expensive for the hiring process, and painful for people who are wrongly rejected. If we put this in probabilistic terms, I hope we can have a deeper conversation about what's happening and how this issue impacts engineers.
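To make that concrete, here's a toy calculation with entirely hypothetical screening rates:

```python
# Entirely hypothetical numbers, just to make the tradeoff concrete.
n_applicants = 1000
base_rate    = 0.20   # fraction of applicants who would actually do well onsite
sensitivity  = 0.60   # P(pass screen | would do well)
specificity  = 0.90   # P(fail screen | would not do well)

good = n_applicants * base_rate
bad  = n_applicants - good

true_positives  = good * sensitivity          # good candidates who reach the interview
false_negatives = good * (1 - sensitivity)    # good candidates silently screened out
false_positives = bad * (1 - specificity)     # weak candidates who reach the interview

print(f"interviews held: {true_positives + false_positives:.0f}")   # 200
print(f"good candidates lost at screening: {false_negatives:.0f}")  # 80
# Tightening the screen trims interview load, but the hidden false-negative
# count grows unless sensitivity is held constant -- which it rarely is.
```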
I couldn't agree more. I think the illusion that hiring is a deterministic and rational process, combined with the fear of firing, leads to adding more and more filters, which just makes things worse.
It's a really expensive and inefficient system, and tends to worsen as companies grow. It would be much better to hire fast and fire fast.
As I recently blogged about this issue:
You may be one of those companies that makes much of hiring only the top 1% of the top 1% of applicants. This can be good for morale (“We are the best of the best”) but a too-selective hiring process is quite hazardous. If you find a way to effectively measure your false-negative rate (the no-hires that you should have hired), you may find out that half your current team wouldn’t make it through your current hiring process. If people commonly apply to your company three or four times before getting in, your highly-selective hiring process is probably filtering on random noise more than skill or talent. In other words, your precision and recall[1] are both bad, the bad results are coming from a costly process, and adding more filters just makes things worse.
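One quick way to see the "apply three or four times" effect: each noisy filter passes a genuinely qualified candidate with some probability below one, and stacking independent filters multiplies those probabilities. A sketch with an assumed per-filter pass rate of 0.75:

```python
# If each noisy screening step passes a genuinely qualified candidate with
# probability p (assumed 0.75 here), stacking k independent steps passes them
# with probability p ** k.
p_pass_if_qualified = 0.75
for k in range(1, 6):
    print(k, round(p_pass_if_qualified ** k, 3))
# 1 0.75
# 2 0.562
# 3 0.422
# 4 0.316
# 5 0.237
# With four or five filters, most qualified candidates need multiple attempts,
# which is consistent with people applying three or four times before getting in.
```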
At least where I work, we are painfully aware of this tradeoff, but are willing to make the sacrifice because of the severe costs of making a bad hire. We are small, and cannot afford to lose 90 days.
I realize this article refutes this point of view, but finance and HR do not see it this way.
I am far happier devoting several hours of each week to interviewing than entire days or weeks to mentoring and reviewing code for a bad hire, or worse, dealing with the widespread consequences of a culture misfit in a small team.
> At least where I work, we are painfully aware of this tradeoff, but are willing to make the sacrifice because of the severe costs of making a bad hire. We are small, and cannot afford to lose 90 days.
Small or large, I think this is fair. But also fair is to take this into account when making claims about any alleged shortage of qualified workers.
Not that you or your company have necessarily made any such claims, but plenty of companies that similarly favor low false positives at the expense of false negatives have done so, including in sworn testimony before legislative bodies.
I'd distinguish between screening false positives (which my article focuses on) and hiring false positives. Screening rejections (before the final interview) are usually done with very limited information, so there's more room for bias and noise.
I've been at the tail end of the interview funnel when experimenting with our screening strictness. The emotional toll that less stringent screening had on my day-to-day work (a faster, higher-frequency stream of onsites) was extremely high. It led to a month or two of burnout as the lead engineer on the team, despite probably leading to finding a hire sooner. The cost is, in my opinion, immeasurable.
What I'm worried about is the b2b crowdsourcing of these rankings between companies. Corporations have shown an affinity for sharing data, like Facebook and Google buying ad targeting data. I feel like it is only a matter of time before your tinder data, resume data, etc., is all fed into your credit score, etc., to the point where a minority of the population is completely screwed over by false negative rates made by bad data scientists.
Any type of score is regulated by the Fair Credit Reporting Act. These scores may fall under that regulation, especially if they are operating as a clearinghouse.
That's true for your credit score, but these screenings could effectively blacklist someone from the industry if a bad encounter with TripleByte or someone like them is shared between all employers.
I wouldn't be surprised if TripleByte has already triangulated data with HN or other YC companies to get additional profile information on applicants.
The unpredictability of what will show up on the tests, and the unknown future of my 'grade', is a major turn-off. I'd prefer to see a certification process, where you can study for the test, then take the test, and pay to retake it if you want to improve your score. If the technical knowledge in the tests is that important for their clients, then sharing that openly will increase the pool of successful candidates over the long term.
I kind of resent these attempts at optimally cherry-picking "the right" candidates using data.
Hiring is _intrinsically_ a subjective act.
People are very much a moving target. They change over time. Work experiences, even bad ones, shape one's skills and ability to cope in organizations. Almost everybody has a bad-fit job at one time or another. The experience of a bad-fit is actually important to the growth of the individual and, I think, their coworkers and employers.
This is the core way we look at it. Our models are hopelessly simplistic, and have no chance of modeling the true complexity of human ability. It's important to stay humble and recognize this. However, what we're competing against is people (often non-technical people) making gut screening decisions. And it turns out that even relatively simple models can do a better job than most recruiters making gut calls.
And the fact that we are able to do this while being blind to background (and open to people from all sorts of backgrounds) is something that I think is very positive. We totally make mistakes and reject good people. But we get better over time, and we also help lots of people get jobs they might not have been able to get without us.
There is a lot of scientific support for pre-employment screening (selection). The most conclusive evidence comes from a series of absolutely enormous studies conducted by the US military in the '80s, known as Project A.
That being said, there is a great deal of pseudo science being gobbled up by organizations because this is a mostly unregulated industry.
IIRC, the outcome of these studies was that intelligence is the best predictor of job performance. The ASVAB is essentially an IQ test with some domain knowledge questions sprinkled in.
When an organization scales to the point where they need something like TripleByte, they're going to be using some kind of impersonal screening process, though. This is better than some disinterested HR drone scanning resumes for keywords.
I agree that an HR drone scanning for keywords is not good (though some are better than others).
The most savvy candidates, however, tend to skip the step where you throw your resume into a giant vat with the others and hope for the best. Isn't it still the case that most jobs are filled using references and professional networks?
No, not at truly large companies -- Amazon could never fill its halls with just references and such.
Even at smaller companies, however, there's some pushback to network hiring because it isn't particularly great for getting a diverse pool of talent.
To be honest, though there are a lot of dangers with this sort of thing, I am a fan. At the end of the day, engineers should be great engineers, not great at hacking the job-finding process.
> there's some pushback to network hiring because it isn't particularly great for getting a diverse pool of talent.
I'd be surprised if random resumes are really that much more diverse than network hiring. Consider principles like the Erdos number[1] -- the collaborative distance between groups of people in similar fields is strikingly small. I bet a company with a reasonably large development staff (50 people) is probably no more than two degrees separated from 99% of the talent pool in a given region.
Just about every position I've looked at was with a company where a former colleague works.
On a side note, I took the 20-30 min front-end exam that is referenced in the article and a few things bug me (in case anyone from TripleByte is reading):
1. I did "exceptionally well" but I don't know how many I got right or in what percentile I fell in. Why? Firstly, I'm immediately suspect if I actually did that well. For all I know, the "top percent" could just be anyone in the top 50%. And the top 50% only gets 10 questions right. But more importantly, I already stated in the beginning I was taking it for fun, so why can't I learn what I got wrong so I can improve? I imagine there were areas I got wrong that were repeated, which brings me to my next point...
2. Why so React-centric? I happen to use it but plenty of people use Angular or Vue. You can't expect them to know about the React events lifecycle.
3. Okay, so I don't live in SF or NYC. But I know some of those bigger companies have offices in Boston, where I do live. Why can't you make that work if you already know what companies are in your pipeline and which offices they have? Seems super expensive and wasteful to lose out on a great engineer when you know your list of 200 companies totally has an office not in SF or NYC (e.g. Facebook).
4. Okay, so I want to work remote. Why can't I just take your final Google Hangouts exam so that you have on file, "okay cool, this person is great, we can fast-track them"? If you greenlight companies that are remote-friendly, you don't have to worry about this issue. Plus, isn't TripleByte a YC company, working with other YC companies? I know for a fact that GitLab is also YC and is a remote-friendly company. Not to mention you've advertised for remote engineers! https://news.ycombinator.com/item?id=15066073
I dunno, given that the article is all about touting how effective the 30 question exam is at screening out candidates, you'd think you'd want to do something useful with that quiz instead of locking people out, even if they don't fit your current criteria.
Feedback is important, so after the interview, everyone gets detailed personal feedback on every section, regardless of the outcome. We get much higher resolution data after the interview (part of the point of my article).
You may have hit a few React-specific questions, but getting those right is certainly not required.
As far as Boston/remote: interviews are expensive for us, so it makes sense to interview engineers who are excited about working in places where lots and lots of our companies are hiring today. Currently that's SF or NYC. When we expand, you'll be able to pick up from that step.
To clarify, my statement is true for every engineer who goes through the Triplebyte technical interview in our regular process, i.e. to get matched to jobs at the companies we work with.
(That was not your situation -- contact support@ to follow up in private.)
After our two-hour interview everyone should get a detailed feedback email - I write them. If you did our two-hour interview and didn't get detailed feedback, let me know and I'll look you up and figure out what went wrong.
And yet this is far from the first complaint I have seen of you folks not giving candidates the respect of following up with them after they invested their time. The response is always something along the lines of "let me check what happened..."
This seems like a regular complaint I read here on HN when your company posts your latest blog entry for discussion. There is at least one other person on this same discussion with the same experience as well.
We interview a lot of software engineers every week. In my experience, the missing feedback email is rare, and the fix is almost always getting an updated email address, or finding our feedback email randomly caught in a spam folder.
I don't see why the volume matters here, basic courtesy should be baked in to the process.
I would think that if someone has spent 3 or 4 hours interviewing over the course of 3 or 4 weeks, then it's probably a good assumption that the email address being used for scheduling is up to date. I would also think that the average software engineer who is anticipating feedback regarding interviews knows to look in their spam folder.
They also don't assess creativity or problem solving which I consider rather essential to software engineering. Based on my limited knowledge of TripleByte tests I would expect that only drones make it through their screens.
Just my $0.02 about triplebyte - I went through all interviews, seemingly doing fine, but after the last informal talk (which also seemed ok to me), there was about 10 days of total silence. Finally, I emailed to enquire and immediately received an 'Unfortunately, ...' letter.
So, in the end, I've had 4 Hangouts sessions over 3 weeks, plus time spent preparing, and was rejected with no feedback at all. I'm still curious: was it the Bloom filter?
The problem isn't the computers, it's the people. You put HR-bots on the task of listing the job, and they don't know Atom from Adam, so they list all sorts of silly requirements like 15 years of SAP experience, 15 years of Ruby on Rails, 15 years of COBOL, and all for a $20/hr entry position, or a list of certifications that no human could ever accumulate. Then what happens is applicants start keyword spamming their resumes just to get noticed, and now as a technical person I get a stack of resumes that are absolute trash.
Two years ago I was hiring for a sysadmin. My HR department put my requirements up on Indeed. I got 70 resumes that passed their screening. Of those 70 I found 5 that I wanted to interview, 3 that showed up, and none were hirable. I left a company several months ago, couldn't deal with the management anymore, and the past few months of job searching have been excruciating.
We need more technical people screening resumes and comparing to actual job requirements.
Edit: I think I've run afoul of an anti-Triplebyte sentiment. I should clarify that I think this post did a good job building a very simple example of the statistical theory behind assessment, but I have no idea whether their product / approach is reasonable or not. Building a good assessment is much more than just statistics, and it sounds from other comments that there are serious concerns about the validity of their tool.
Really enjoyed the build-up from simple cases to more complex models!
If you're interested in the statistics behind estimating skill, and how well questions tell apart novices from experts, check out item response theory :)
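For anyone curious, here's a minimal sketch of the two-parameter logistic model at the heart of item response theory, with made-up ability and difficulty values:

```python
import numpy as np

def p_correct(ability, difficulty, discrimination=1.0):
    """Two-parameter logistic (2PL) IRT model: probability that a candidate of a
    given ability answers an item of a given difficulty correctly."""
    return 1.0 / (1.0 + np.exp(-discrimination * (ability - difficulty)))

# Made-up abilities: novice at -1.0, expert at +1.0 (on the usual standardized scale).
for difficulty in (-2.0, 0.0, 2.0):
    novice = p_correct(-1.0, difficulty)
    expert = p_correct(+1.0, difficulty)
    print(f"difficulty={difficulty:+.1f}  novice={novice:.2f}  expert={expert:.2f}")
# Items pitched near the candidates' ability level (difficulty 0 here) separate
# novices from experts best; very easy or very hard items tell you little.
```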
My experience hiring machine learning talent over several years has been that people over-hype the cost of a false positive. Both the article's false positive (expending the cost of interviewing on someone who ultimately turns out to be the wrong fit) and also a more fundamental false positive: actually hiring someone who would hypothetically fail a lot of these interview pipelines.
The discussions about making these pipelines more quantitative, with assessments and quizzes, always come with a tacit assumption that the worst outcome would be to actually hire someone who fails at one of these interviews. Rejecting a good person sucks, as they say, but not as much as hiring one of the multitude of sneaky, low-skilled fakers out there.
And of course, everybody's got their hot new take on how to spot the supposedly huge population of fakers.
What I have learned is two-fold:
(1) That person who aced all your interviews and finally looked like the perfect person to hire probably just spent 3-6 months utterly failing at a bunch of other interviews, just to get into "interview shape," refresh on all the nonsense hazing-style whiteboard trivia about data structures that they had never needed in years at their job, etc. So it's totally asinine to believe that someone passing through all your filters must be the sort of person who would rarely fail some filters. That person almost surely did fail filters, and the companies where they failed believe they dodged a costly false-positive bullet, while you believe you just made an offer to the greatest engineer. Hopefully you can see the myopia here.
(2) The cost of passing up a good-but-failed-at-interview-trivia engineer is often far greater than the cost of hiring them. For one thing, "suboptimal-at-interviews" engineers are pretty damn good engineers, and they can do things that differ from esoteric algorithm trivia, such as helping your business make money. Another thing is that many engineers can generalize what they learn, generalize from example code or templates, etc., very efficiently. So while they might reveal a weakness by failing part of an interview (and everybody has such weaknesses), why do you really care? They can probably become an expert on that weakness topic in a matter of months if they work on it every day, or if you have existing employees who can mentor them.
But the biggest thing is part of what Paul Graham wrote in "Great Hackers": good engineers tend to cluster and want to work with other good engineers.
So if you're sitting there without a few good engineers already on your team, then mistakenly rejecting a great candidate who happened to have a bad day, or who happens to hate writing tree algorithms on whiteboards, leaves you running a huge risk of losing out on the good engineer who could help kickstart the phenomenon of attracting the next one.
When your team is in this stage, you absolutely can manage with a few "dud" hires who need a lot of help or who have skill gaps in key areas. The cost of adding them to the team and managing their "suboptimality" is far less than the continued search costs brought on by rejecting good candidates with an overly risk-averse hiring threshold, and leaving your team in a state where it still doesn't have a good engineer to help attract more.
In other words, the loss function penalizes false negatives more severely than the combined penalty from effort spent on true negatives and suboptimality / management costs of false positives.
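As a back-of-the-envelope version of that loss function, with entirely hypothetical costs and outcome probabilities:

```python
# Entirely made-up dollar costs and outcome probabilities, just to show the shape
# of the argument; plug in your own numbers.
COST_FALSE_NEGATIVE = 50_000   # value lost by rejecting an engineer who would have helped
COST_FALSE_POSITIVE = 20_000   # mentoring/management overhead of a weaker hire
COST_TRUE_NEGATIVE  = 1_000    # interview effort spent correctly rejecting someone

def expected_cost(p_reject_good, p_hire_weak, p_correct_reject):
    return (p_reject_good * COST_FALSE_NEGATIVE
            + p_hire_weak * COST_FALSE_POSITIVE
            + p_correct_reject * COST_TRUE_NEGATIVE)

strict  = expected_cost(p_reject_good=0.40, p_hire_weak=0.05, p_correct_reject=0.55)
lenient = expected_cost(p_reject_good=0.15, p_hire_weak=0.20, p_correct_reject=0.65)
print(strict, lenient)   # 21550.0 12150.0 -- the "strict" filter costs more here
```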
But all these skeezy interview-as-a-service businesses want you to believe that the opposite is true: that if you accidentally hire a "faker" because your hiring process was too easy, then Cthulhu is going to rise out of the sea and lay waste to your company.
Of course they want you to believe that. That's how they make money. Preying on your fears over what would happen if you just unclench and treat candidates like human beings with strengths and flaws and don't hold them up to ludicrous standards that lead to self-selecting macho 22-year-olds getting hired because they just spent 10 months on leetcode.
When you start to realize this, it becomes obvious that onerous code tests, brainless data structure esoterica, hazing-style coding interviews, and especially businesses that offer to outsource that nonsense, like TripleByte, are all just snake oil.