At Kellogg they used to say it's best to either be the first to be interviewed or the last—assuming you're a decent candidate and well prepared to begin with.
First because you set the benchmark, and your outstanding qualities become "requirements" for the candidates that follow to meet that benchmark.
Last because of recency bias, so whatever qualities you have are better recalled by interviewers.
Everyone else becomes somewhat forgettable.
(I would guess they mentioned a study about it, but it's been 10 years and I don't have the reference handy.)
In my experience as an interviewer, everyone in the middle does get somewhat mixed up together, especially when I had less than 5 minutes to scribble notes and reset between candidates.
But I would modify "first" and "last" to "towards the start" and "towards the end"; e.g., I will (subconsciously) more easily benchmark candidates against a very strong "second interview of the day" than against a lackluster "very first candidate", if that makes sense.
Said differently, whoever is the first candidate that hits it out of the park becomes the benchmark. And whoever happens to be the best relatively strong candidate is more easily recalled than other relatively strong candidates.
A decent way to normalize this effect is to take relatively detailed notes of the exchanges and make the decision a few hours or a day later, rereading the notes.
One of the issues we had interviewing very young candidates (fresh out of college) was how awkward they all looked, even in their outfits, and few had any kind of confidence during the 15~20 min we spent with them, with some saying really weird things (stuff like "I'm really good at the internet"). But obviously none of that matters long term; we assumed they'd probably all fit in fine once hired. Putting some distance and picking the good and bad stuff out of the transcript helped a lot to get past the weird impressions, including the order we saw them in and how tired we were when we saw them. To a point.
>Recent research suggests that judgmental anchoring is mediated by a selective increase in the accessibility of knowledge about the judgmental target. Anchoring thus constitutes one instance of the judgmental effects of increased knowledge accessibility. Such knowledge accessibility effects have repeatedly been demonstrated to be fairly durable, which suggests that the effects of judgmental anchoring may also persist over time. Consistent with this assumption, three experiments demonstrate that judgmental anchors influence judgment even if they were present one week before the critical judgment is made. In fact, the magnitude of anchoring was found to remain undiminished over this period of time.
I didn't get access to the full text, but had a look at other papers from the same researcher [0] on what kind of methodology they use.
In the case of recruiting, I think the main factor when moving the decision further down the line is the change in information ("a selective increase in the accessibility of knowledge about the judgmental target"), in two specific ways:
- we actually remember less about the subject, for better or worse. A candidate might have had a weird look, and the notes are probably impacted by that bias, but we can look back at their coding test without that impression and come out with a slightly different conclusion.
- we get to compare subjects in a different order. In particular, that helps catch weird expectations: for instance, if every candidate has been falling into the same trap, it's easier to give them a pass and assume the question was at fault. If we had to do that in real time, only the last few would get the kinder judgement.
I've never had such a thing but many years ago, not long out of university, in my previous career as an electronics engineer I was asked to design a simple amplifier before the interview proper. The interviewer explained, slightly apologetically, at the end of the interview that he did this just to sort out those who were good at talking but didn't have a thorough grounding in the basics from those who were well grounded but perhaps not so good at blowing their own trumpet. I was pleased to find that I passed that part with flying colours :-)
But I would not want such things to be taken very seriously unless you're trying to fill a very narrowly defined post, because it is all too easy to create a test that a good candidate would fail.
I think they're very valuable if the position requires any coding at all.
In particular, very simple tests (like an API interface, or reversing a string, etc.) done in any language they feel comfortable in are usually a trove of info about the candidate. The result doesn't really matter; it doesn't need to run, it doesn't need to be complete, as long as you got to hear a lot about how the candidate thinks, how they move through the problem, whether they can write something basic, what they're confident in and what they're not used to doing, etc.
Which makes sense: if someone made an impression on you, that impression doesn't disappear in just a few days. At best it may get fuzzier, which could be good or bad.
Interestingly enough, yes, but you can also see it when reading the notes back. For instance, they become sparser and sparser, or patterns emerge.
I would compare it to reviewing one's code a few hours later. We're still in a similar mindset, but there's a bit more distance, and we also catch the bits that don't make sense when reading back afterwards. That works even better when exchanging notes afterwards.
Yes, we hired a few that stood out. They were indeed kinda weird for a few months; some got blander afterwards and some stood out a lot more, but all in all they met the bar we had in mind, and the ones that really grew weren't those we expected at first.
In particular we had people whose surface personality was completely different from what we perceived during the interview. Not in any way that made it hard to work with them, but moving from university to a corporate environment was just enough of a gap to change their behavior in significant ways. I think hiring fresh graduates is way harder in that respect, and we were happy to have some flexibility in the work culture. One of the guys moved from a super rural area and was thrown into the megalopolis, and it was a real journey; we had the funniest late-to-work excuses ("couldn't find my bike because I parked it near the neighbor's house and he moved it into their garage thinking it was his son's" -- he later found his building had a bike parking area behind it).
What I learned in my years as proposal manager was that it is always best to get the first slot amongst the other bidders for your presentation and the last for the negotiation.
I remember while hiring for a new team member commenting about the first candidate that we interviewed that if we had seen him later in the sequence we likely would have hired him but coming first, he just seemed “ok.”
>Semantic Anchoring in Sequential Evaluations of Vices and Virtues
>How do people evaluate sequentially presented items? Prior research suggests that sequential evaluations are subject to anchoring biases, such that the values of subsequently evaluated alternatives are assimilated toward the initially considered option. The present research argues, however, that sequential valuations often lead to contrast rather than assimilation effects, whereby values of the subsequently estimated alternatives are distanced from the initially evaluated option. These contrast effects are attributed to semantic anchoring, which stems from evaluating conceptually related options classified into opposing categories (e.g., vices and virtues).
The secretary problem only applies if you have to reject candidates before you’ve interviewed them all. Specifically it’s formulated as each candidate needs to be either accepted or rejected immediately after their interview.
That situation obviously doesn't apply if you're interviewing many candidates on the same day and comparing them to each other. As a result, the optimal strategy for the secretary problem also doesn't apply.
Their recommendation was to either be first or last. I guess my own twist is you don't have to be the very first or last—you have to be great and to try not to be right in the middle of the pack.
But above all, do well and don't overthink this stuff as there's no scientific rigor. It doesn't help to be first or last and suck at it.
Someone's bound to mention it in this thread (or I hope they do, because I thought it was great but can't remember the name or the details) - but there's some formula that's roughly like:
N = total applicants
1. Interview N/10, decline no matter what
2. Hire the next person who's better than everyone seen so far
As I recall it was slightly more complex than that, perhaps only to give it an impression of being rigorous theory, but that was the gist of it.
This only applies if you have to make a hire/skip decision immediately after each interview and can't go back later. (The formulation I usually see is a princess choosing a suitor — for pride/face reasons, she can't pass on a suitor and then go back if it turns out he was the best of the bunch.) That's not typically how job interviews work, though.
I think the magic number is 1/e of the applicant pool (e = Napier's constant, 2.7...)
That mathematical problem (the secretary problem) is often cited, but it's not how real hiring works. The problem defines the only success criterion as hiring the single top candidate; everything else fails. That's not at all what you're looking for in the real world - there's a spectrum of results, where anyone in the top decile will be great, the next decile only slightly lesser, and so on.
One-dimensional ranking is one aspect of how it breaks in real-world hiring, but I think the bigger one might be that the secretary problem assumes you need to decide immediately and can't just wait around until you've seen the whole batch of applicants.
The main difference is that you're trying to solve the problem of turning an interview into a way to choose between candidates, while the secretary problem assumes you already figured that out and is trying to solve something different entirely!
This is the secretary problem [1] from optimal stopping theory, but it doesn't apply to hiring because you can wait until the end to make a decision once you've seen everyone.
The main problem indicated by the OP isn't explore/exploit; it's that the ordering of candidates seemingly distorts how they're rated.
Put another way, we have a tendency to like people, but once we calibrate, we start penalizing them. Without context, we round up. We seek positive attributes first before resorting to negative ones, and only in aggregate. It's a local-first favouring tactic. Local networks (maybe small-world ones) are perhaps more stabilized by the up-rounding.
Or maybe it's also more about calibration over time.
Most of our social technology is about not being shitty to one another at scale (or maybe "in massive sequence"), so this seems aligned with my understanding of the world. The work of progress is to "not be shitty" at increasing scale (of population, idea complexity, levels of abstraction), and we build institutions that mostly try to do that. Though I think digital has kinda failed on that mission lately, which is another conversation.
Seems kinda nice and adaptive and optimistic even. Though yes, downsides in a society that lives at scale, if not mitigated by process or social/digital tech.
It seems like maybe abstraction runs contrary to the concept of not being shitty, because while abstraction is meant to neutrally essentialize people, as often as not, the abstraction process is used to identify aberrations for individual treatment, rather than to streamline creating individual treatments for all.
Put another way, we typically tend to sort first for assholes, then maybe sort for other things later.
Ah interesting. I guess when I think of abstraction, I think of a boundary over which complexity is reduced, and therefore information is lost. My more personal sense is that a ton of human misery comes from poorly chosen abstraction. Or abstraction where there should be none.
The idea of "the tyranny of the database" was where I first encountered this:
> At a higher, more semantic level, a subtle distortion in how we perceive reality took hold: things that were hard to represent in databases became alternately devalued and fetishized. [...] Once in awhile a technical counter-current would take hold and try to push back on the tyranny of the database, but the general trend held firm: if it does not fit in the database, it does not exist.
> You may not think you know this world of databases, but you live in it. [...] Every time a customer service assistant shrugs and says “computer says no” or an organization acts in crazy, inflexible ways, odds-are there’s a database underneath which has a limited, rigid view of reality and it’s simply too expensive to fix the software to make the organization more intelligent.
Or: people who are hiring their future co-workers sort for people who will not make their life hell (or their co-workers' lives hell), and then evaluate for actual expertise.
Heh I like the sentiment, but this feels to me a little bit odd, like saying "evolution is a cold ruthless inhumane process". There is no human (the root of all things "humane") without the evolution being critiqued by the very human values it made possible :)
What you're calling "primitive" (perhaps with judgement) is part of the system that perpetuates our collective socio-biological process in poorly understood ways.
For all we know, liking ppl until we're at scale is highly adaptive (in an information-theoretic sense) for sociality as a whole, and not some broken primitivity :)
Well, evolution doesn't care anyway if there's a human - much less a humane human; it's indeed a "cold ruthless inhumane process". Evolutionarily speaking, we could be replaced by cockroaches in a heartbeat if the conditions arose.
> The participants described the first few individuals quite positively, using an average of 6.2 positive words each. But as they progressed through the sequence, their descriptions became significantly more negative, dipping to an average of just 4.7 positive words by the 20th person.
I don't have access to the study, but I'd be curious why they chose to count positive words as opposed to just asking people to rate on a numerical scale. My impression is that sentiment scoring via bag-of-words is not a particularly robust method, especially in 2024. It also sounds like they didn't normalize by description length, so the outcome could just as easily be that people's responses got shorter as time went on due to fatigue.
(also, this is a nit with the article rather than the study, but given the methodology I think it is important to distinguish between becoming less positive and becoming more negative, and in this case I would not describe using fewer positive words as "became significantly more negative")
Yeah - anecdotally I feel like earlier in a hiring process people are apt to expound on a candidate's fit for portions of the role, and in later stages it's basically just thumbs up or down, with more exposition only in the case of disagreement.
Yeah, when you don't have anything to compare them with you need to describe them. But once you have some strong candidates to compare them with everyone just becomes "are they better or worse than X".
I'd imagine this is tricky. Today everyone in white-collar culture is so guarded with language so as not to offend. You have to have an extremely sensitive ear to understand that mildly positive words actually mean something extremely negative.
I'd really need to see this study reproduced several times before I take it seriously at all. Ideally there'd also be other similar but non-identical studies pointing to the same effect.
I don't think any of the common problems[1] are relevant for this specific application (understanding the importance of serial position). Like I said, I'd be curious to understand the motivation behind their methodology choice.
>Consider the following scenario: A reviewer evaluates many unqualified applicants for a university program successively, and the next applicant to be reviewed is an average (borderline admit-reject) applicant. Because the evaluator is influenced, or anchored, by recently made decisions, this borderline applicant might be admitted to the program. On the other hand, when an evaluator is anchored by having reviewed many qualified applicants, the same or a similarly borderline applicant might be rejected (Figure 1). In this scenario, individual fairness, stating that individuals with similar characteristics should be treated similarly [8], is impaired, and wrong or inconsistent decisions can have a consequential impact.
I think a flip side I've seen is when the first applicant through an interview loop has basically no chance of getting an offer because (a) people know they're not calibrated and (b) "surely the odds are low that the very first person would be the right person"
This kinda means it's a waste of the candidate's time unless they're _also_ just interviewing for practice.
It really depends. A lot of teams don't want to waste their time reviewing a bunch of candidates and will pick the first person who impresses them. Not saying they won't go through with whatever other interviews they had scheduled, but if the first person knocks it out of the park and the remaining 3 candidates who make it through the interview cycle that week are less impressive, they'll just make an offer to the first person rather than continue interviewing
I just accepted an offer last week. I was told I was the first to interview and they cancelled the others because I was a perfect fit. Interviewed Friday morning, offer came Friday afternoon, I accepted that evening. /anecdata
This is likely only because the initial candidate was already thrown out of the pool, so the candidate who comes 15 days later might as well be first in line at that point.
But the candidate usually never knows what position they are in line. Sure the person on the other side of the table could divulge, but the candidate shouldn't really trust that information.
A person who is aware enough to know they are not calibrated should also not make the mistake of attributing higher or lower odds to a candidate based purely on their position in the sequence, assuming the interviews are not systematically set up to put less-qualified candidates first.
This is probably BS pseudo-science but I read a book years ago that talked about the four primary types of shoppers:
1. Tightwads - spend as little as possible (maximize savings)
2. Spendthrifts - spend everything they have (maximize pleasure)
3. Optimizers - who want the best deal possible (money for value)
4. Satisfiers - who set a bar then pick the first option that meets it (minimize decision time)
You see these behaviors in all sorts of decision-making processes as well, so if the hiring manager is a Satisfier type and the first person who walks through the door is great, they'll likely hire them.
If the hiring manager is a tightwad or an optimizer, though, they'll almost certainly force many candidates through to collect data before making a decision, and that first candidate has little chance unless they're truly exceptional.
Alternate explanation. The more bachelor contestants you watch, the more patterns of shitty behavior you identify. One could conceive of a more representative population than influencers trying to make money on the bachelor.
I question describing this as a cognitive bias, let alone an unconscious one, and the experimental protocol used here is awful for attempting to show that it is.
The examples they give: Job interviews, dating, auditions, are all cases of search for the best out of a set of possibilities. The earliest candidates have the initial advantage of there being few in the set, or none, for the first candidate. Since the purpose is to compare the entire set, by definition, the first one is the best you've seen. Even if they objectively aren't great, the rating at that time will tend to be more positive. That isn't a bias, it's a best-effort judgement in the face of incomplete information. You'd need a crystal ball not to do this.
What they're measuring is immediate impressions, while what they should be measuring is decisions made after the search concludes. I've certainly had the experience of interviewing a candidate for a job and telling myself "yeah, might be ok", then a later candidate comes by and I adjust that to "no, this one is much better". If they want to show that being first or early in such a sequence makes it more likely someone will get picked, they've failed to do so with this protocol.
I had the same thought regarding decisions vs impressions.
For what it's worth, my preference during my time as a working actor was to audition first thing after lunch. I figured (based largely on my own experiences as an auditor) that the first people in the day would be forgotten / calibrated against, and those later in the morning would be victims of fatigue, whereas directly after lunch I'd have the best chance to make the best impression on fresh auditors. I had objectively excellent rates of success, so figured that method was good enough. <shrug>
I don't think that generalizes to more extensive interview processes. Auditions are weird: you get a "hello", then five minutes (max) to do two monologues, or one monologue and a song. It's basically a cattle-call, and very hard to stand out. Sometimes the "hello" is actually the most important thing, sometimes what you're wearing is what makes the best impression. As an actor you have essentially zero control, and yet that gives you utter and complete freedom.
Still: first thing after lunch. Works every time, except when it doesn't.
I can't tell from the linked article whether the ratings/descriptions were made immediately, but if so you are correct. It's just not applicable to those examples. This only applies in a "hired on the spot" type of evaluation.
I'm a psychologist, and though HN usually tends to be tough on psychology as a science, I'm amazed that this paper is being discussed so earnestly. Social science in particular should require a bit more skepticism.
> To test this, the researchers conducted a number of studies. In one, they had 992 participants (recruited from Prolific Academic) describe 20 people based on their Facebook profile pictures.
Alternative interpretation:
Paid participants who most likely just want to do the study and get paid get more and more cranky the more pointless words they have to write about unknown people’s Facebook profiles.
How well this reflects the phenomenon in the real world is actually unknown.
Unrelated but sort of not: if you can, don't ever do a job interview after lunch on a Friday.
It will never go well. No one wants to be there; everyone is thinking about the weekend.
Would love to see this get reproduced. Until then, I'm not sold. Feels like people have been arranging themselves in sequence for hundreds of thousands of years at least; you'd think somebody would have noticed if this effect were strong enough to be measured.
It's anecdata, but a lot of concert/animecon/othercon people around me have known of a similar effect for years.
The opening and closing slots are the most sought-after for performers because they are the most memorable and are, on average, rated more highly.
This is exactly the sort of popular-psychology thing that fails to replicate, isn't it? I'm pattern-matching it against "ego depletion" and the hungry-judge theory, both of which failed to replicate.
No, this is well studied, although you're right that this is a "popular-psychology thing", since ctrl+f "anchoring" returns nothing in the discussed study paper.
>A number of studies have shown the robustness of the anchoring effect. Anchors can influence judgment even after weeks or months (Mussweiler, 2001; Yoon & Fong, 2019). The anchoring effect is present even in experts in the judgmental domain (Englich & Mussweiler, 2001; Englich et al., 2006). An anchor can influence subsequent judgment even if it is clearly implausible (Strack & Mussweiler, 1997) or when it is compared to a different object (Frederick & Mochon, 2012; Mochon & Frederick, 2013).
I too take an absolutely defensive attitude towards results like these, which don't get replicated half the time and are trivial to fake. I'm full on "assume it's fake or bunk until proven otherwise" when it comes to behavioural psychology, because the field has such systemic integrity and reproducibility problems. There are no real systematic replication or anti-fraud efforts, so why should I trust anything coming from this field?
I'm not trusting any result like this unless it comes from a meta-analysis really. I have heard of this general idea before, that the first and last to interview are more likely to get a job, but the plausibility of the finding doesn't make it any less suspect.
I think it depends a lot on priming, IQ, memory capacity, ability to notice details, etc.
Donald Trump tends to like the last person he spoke to best, but probably because that's the person who comes to mind when asked, and anyway he doesn't really care one way or the other.
Unless the topic of conversation was himself: Donald Trump, in which case he cares a lot, will remember details, and will like whichever person said the nicest things about him, regardless of sequencing.
Agreed, I like to go in the first ~20% but not actually first. That gives you a chance to quickly make any changes from good qualities you noticed in the first ones, but get it done before fatigue sets in. And it's just nice to have it out of the way so you don't have to stress!
This was kinda known, I think. I remember also reading about blood sugar levels and risk-taking behavior. If you are interviewing / auditioning just before people's lunch time, tough luck. They are decision-fatigued, hungry, and low on blood sugar, so their risk-taking threshold is significantly higher than it was when they started; if they saw promising candidates back when they were fresh, well rested, and not hungry, it is unlikely that you will surpass them even if you are objectively better.
If you are their first after the lunch however, that would again be a very good spot.
Let's be clear that this was a sequence of unpleasant characters. A reality-TV game show is not the same as a sequence of job interviews. There may not be much to extrapolate here.
Could be worth mixing this with memory. This study had people rating as they went. If they had rated after seeing everyone, perhaps the later ones would have come off less negative?
I wonder if this is more due to the pressure to be descriptive than because of any sort of internal dialogue. If I’m being asked to do a study I feel that I need to contribute by adding something ‘new’ and ‘substantive’ for each question (which ultimately leads to choosing negative qualities to describe a person), but if I weren’t expected to have an answer I would’ve had more boring thoughts about the questions.
This is why we must stand out in some way, for example to a future employer when applying for a job against a big pool of competitors. It also shows how important luck is in many similar situations. Therefore, winning a competition, or getting or not getting a certain position, does not define you.
I changed my name when marrying and moved from the end of the alphabet to the middle. I wondered what effect it would have: am I now more likely to get a cookie before they run out? But what if it’s bullets — they should run out before they get to me.
Spoiler: I’ve not been able to notice any difference.
In rapid-fire succession, I can see this happening. I personally call this "I just want to get this over with" bias. In an interview setting, I have never completed more than 1-2 interviews in a single day, so this doesn't apply to me. Maybe hiring managers or recruiters have this bias more than I do.
A related concept I noticed is that we are "wired" to identify with whoever is presented as the protagonist, even when the information is readily available to recognize them as the antagonist.
E.g., I recently watched Peter Pan for the first time as an adult. Peter is basically a horrible person who deludes and kidnaps children, puts them in incredibly dangerous situations, and doesn't care about people other than to make himself look good (e.g., he fought the pirates who had kidnapped the Indian princess, then got so distracted with himself that she nearly drowned in the tide). On the other hand, we're told that Hook and the pirates are the bad guys, but really Hook is just responding to the fact that Peter maimed him and fed his hand to a crocodile. There's no indication of anything bad Hook has done prior or since, other than trying to "get" Peter for having done that.
The protagonist of Peter Pan is actually the dad, who was trying to protect the kids from nonsense, but even he got derailed by the wife, thereby leaving the kids vulnerable to all that happened to them.
But if you ask most people who are familiar with the story, they'll react to the superficial presentation that Pan is whimsical and aspirational for children while the Dad represents the dull adult world to be escaped from, or something like that.
The crux is that the narrative form suggests how we should feel about whom even if that makes no sense. I think this is what the article here is talking about as well in a different manifestation.
> The story is old, but at the time written, anyone would be a hero for maiming, or killing a pirate.
I’m going to contest this. Peter Pan first appeared in 1902[1], by which time the romanticization of pirates was well under way. The Pirates of Penzance, for example, debuted in 1879[2].
People write movies about scummy, murdering, thieving, amoral gangsters too, going into great detail about their "humanity". So?
This doesn't mean they're good people, or that 120+ years ago they wouldn't have been put to death. Anyone killing a murdering, scummy gangster would have been a hero.
Just because someone writes a nice fictional comedy story about a scum-of-the-Earth group of pirates doesn't mean that all pirates are OK. Ask the villagers that get raped and murdered by pirates what they think.
Pan is not a child, but a predator who "refused to grow up" - and the whole "trying to kill him" is clearly the result of Pan cutting Hook's hand off and sadistically feeding it to a crocodile (i.e., while we don't know what the battle between Hook and Pan was about, the fact that Pan celebrated his win by maiming Hook and humiliating him this way is a good data point about his character).
It seems like you're on Pan's side, which I think is in line with the point the comment you're replying to makes.