Dispatch: January 16, 2024

How should interviews inform your hiring? Very, very little

What do we know about the value of interviews?

Interviews, especially unstructured ones, can be deeply misleading: an applicant can excel at the demands of an interview without excelling equally at the demands of the actual job, yet intuition tells us otherwise.

A majority of surveyed hiring managers believe that the interview is the single most diagnostic screening method available.1 A body of several hundred studies over the past thirty years confirms that the in-person interview heavily influences how applicants are rated and, ultimately, their likelihood of being hired.2

Unfortunately, the overwhelming evidence also indicates that interviews, especially unstructured interviews, are at best weakly predictive of future job performance.3 A meta-analysis by Barrick, Shaffer, & DeGrassi (2009) found that an interviewer’s rating was strongly influenced by three applicant factors during an interview (the applicant’s appearance, impression management, and verbal/nonverbal behavior), but only the last was notably correlated with later job performance (see Table 1). These factors are sometimes referred to as “self-presentation tactics” (SPTs), reflecting the fact that an applicant can be more or less skilled at deploying them, consciously or unconsciously, to influence an interviewer.

Table 1. Self-presentation tactics (SPTs) likely to misguide you. rc is the mean correlation [and 95% CI] with interview rating and with later job performance from the Barrick et al. meta-analyses.

INSERT TABLE

Note that the problem is not that skill in the Table 1 SPTs is entirely unrelated to job performance. Skill in each of the SPTs does seem, at least in theory, to reflect underlying constructs that are in fact related to job performance. A candidate self-aware enough to tailor their professional appeal in advance of a business interview, for example, might be more conscientious generally speaking, and thus more likely to be vigilant and reflective on the job. More emotionally stable individuals are often described as calm, relaxed, and secure5; a cool hand might translate into more positive nonverbal interview behavior, such as less fidgeting and more eye contact. And there is evidence that verbal fluency tracks general intelligence, with more agile minds being more capable of quickly processing information and generating coherent, clear, articulate responses.6

The problem is that interviewers tend to overestimate the predictive value of each SPT by a very wide margin. And this is a double problem, because the overestimation comes at the expense of underweighting factors that actually have more predictive value, such as performance on work simulation tasks, prior experience and accomplishments, recommendations, and so forth. Psychologists of decision-making refer to this as dilution: the tendency for available but non-diagnostic information to weaken the predictive value of available quality information, rather than simply being ignored.7 The danger of dilution is not intuitive. People typically welcome more information, believing that more information at least cannot hurt; they can just ignore the useless parts, after all (or so the intuition goes). Unfortunately, a number of cognitive hiccups can occur. Most notable for present purposes is the strong tendency to detect patterns where none exist. Watch a string of random coin tosses, for example, and when a suspiciously long stretch of heads surfaces, it is almost irresistible to think you see a non-random cause at play. In the social domain of interviewing, this can manifest as overactive sensemaking: interviewers can make sense of virtually anything the interviewee says, just as random coin flips can seem meaningful.8

An experiment by Dana, Dawes, & Peterson (2013)9 poignantly illustrates the point. Subjects were given biographical information about a student, including their GPA, and asked to predict the student’s GPA in the next semester. Some subjects were also given the opportunity to interview the student, in one of two conditions: confederate interviewees answered pre-set questions either accurately or, literally, at random, following a randomization process that produced nonsense “yes”/“no” replies (unbeknownst to the interviewer). Interviewers could not tell the difference and weighted accurate and random answers equally in their interview judgments. In either case, subjects who both had access to the student file and conducted interviews were less accurate in predicting future GPA (r = .31) than subjects who had access to only the student file (r = .61). The interviews were not merely worthless but actually counterproductive, as subjects proved too adept at making “sense” of even nonsense answers.10

Adding structure to the interview notably improves predictive accuracy; the causal pathway seems to be less about getting better at soliciting good information than about curbing opportunities to get distracted by irrelevant SPTs.

A structured interview is one with a script designed to measure specific, job-related constructs. The elements of a highly structured interview include:

Evaluation Scale: a standardized, numeric rating scale, with natural language labels (and ideally descriptions, to assist inter-rater reliability).
Behavioral and Situational Questions: scripted behavioral questions to probe past performance (via specific examples) and situational questions to predict future performance (via simulated hypotheticals).
Script Consistency: ask all applicants the same questions in the same way (same wording, same order, etc.).

Adding structure to an interview can curb the distractions caused by self-presentation tactics (SPTs),11 or even prevent the intentional or unintentional use of SPTs in the first place.12 The Barrick et al. meta-analyses examined structure as a moderator and found that interviewers relied less on all three key SPTs when following a highly structured protocol. The correlation of appearance with positive ratings, for example, dropped precipitously from an rc of .88 to .52 to .18 for low, medium, and high structure, respectively; impression management fell from .46 to .34 to .21; and verbal/nonverbal behavior from .69 to .47 to .37.

Why does structure help? The working theory, as Barrick et al. put it, is that “structured interviews simply give job applicants fewer opportunities to distract the interviewer with information that may be peripheral to the construct being assessed.” First, interviewers maintain control over the content to be discussed. This can limit the airtime that an applicant might otherwise “hijack,” so to speak, to spin a favorite self-promoting tale. It also helps keep the conversation from drifting casually into a dialogue where the applicant learns, and then parrots, what the interviewer wants to hear confirmed. Second, the discipline of adhering to the script and scoring its elements might saturate the interviewer’s attention, so that he or she pays less attention to whatever SPTs are present. Third, a script can help avoid primacy or anchoring effects, in which early parts of a conversation have a disproportionate impact on the path the rest of the conversation takes (since the script’s prompts return the conversation to pre-planned topics).

But doesn’t the interview at least predict interpersonal skill, for jobs where first impressions are key to success?

The Barrick et al. meta-analysis was not able to assess the possible moderating role of the interpersonal demands of the job. There were too few studies reporting on that dynamic. Thus the answer is not clear.

However, even though the sort of work we do at The Lab @ DC is richly interpersonal (politics is personal, yadda yadda), it is actually not high on this dimension in the way researchers in this field mean “high.” Interviews are theorized to be more diagnostic for extremely interpersonal jobs, and specifically for jobs that are interpersonal with strangers. Think of call center employees (an endless series of 15-minute calls with strangers), for example, or a receptionist or a salesperson. We do have moments where the first 15-minute impression is dispositive, but they are rare. And when it really does matter, we can deploy someone on the team for whom this is his or her super skill. The upshot is that a lack of excellence at first impressions is unlikely to be very relevant for most people on the team.

A lack of excellence is different from incompetence, of course. We don’t want outright jerks or bumbling buffoons. But my point is that, team-wide, we really only need one or two all-stars at the first-impression “pitch,” and no more than that, since we only sporadically have to pitch in that way.

What about just basic likeability, which matters for creating a positive work environment?

Life is more enriching when you work with people who are good-humored, pleasant, sincere, fun, and so forth. Again, the Barrick et al. meta-analyses are silent on whether interviews can accurately predict likeability, due to insufficient research data. So we don’t really know.

From a theoretical perspective, however, the SPTs are likely to cause error-prone predictions of likeability as well.13 On the one hand, the interviewer might easily overestimate likeability. An applicant might be able to deliver a favorable first impression even while lacking the underlying temperament to sustain that impression over time and across contexts. That is, an unlikeable person might be able to fake likeability. After all, the point of an interview is to persuade someone to hire you, so every motivation presses toward putting on as favorable a face as possible. And one need only maintain that face for the 30 to 90 minutes of a typical interview, with strangers who know nothing else about you. An extreme example, but one that illustrates the point: many psychopathic individuals are notorious for their superficial charm and verbal fluency in a first impression, yet over repeated interactions their instabilities surface in spades.14

On the other hand, the interviewer might easily underestimate likeability. Key moderators here are likely to be introversion and anxiety. Both shyness and nervousness might be inversely correlated with skill in SPTs. A shy person might speak more softly or avert their eyes, for example, which are features of verbal and nonverbal behavior. A nervous person might suffer a mental lapse or stutter, temporarily displaying lower than normal verbal fluency. Such a person will likely become more comfortable once established in the office and at ease with his or her coworkers, which limits how well the first impression generalizes. When it really matters (over time, across contexts, amid the team), these initially shy, anxious folks might turn out to be the best team players and most enjoyable colleagues.

In summary: yes, absolutely, likeability matters (and unlikeability can be a big problem for team cohesion), but it's unknown how well an interview can predict this trait, and there are reasons to worry that its predictive power is quite limited. A more promising diagnostic window might be recommendations, in which people can speak to an applicant’s temperament across time and context. That psychopath, for instance, is unlikely to have three solid recommenders.

Can you summarize the takeaway insights?

If you're going to use interviews, you need a highly structured interview process. SPTs need to be put in their rightful place, which is not to ignore them entirely but to weight them much lower than we intuitively would. My basic insights from the literature go something like this:

The structured interview lets us focus on the content of answers to the questions that we care about. It helps us prevent a conversation from accidentally wandering into non-diagnostic topics, and it helps saturate our attention so we don’t feel as compelled to spin stories about the irrelevant features of the interaction.
We also cared about content in our performance task, so why did we need interviews at all? My basic answer is that, at this finalist stage: a. we want to confirm that the applicant knows the answers deeply, as demonstrated by an ability to explain without notes and in response to probing follow-up questions; b. the simulation task of interacting with a resistant agency partner can only be done verbally; and c. again, some SPTs are slightly diagnostic, and the finalists will likely be distinguished by shades of gray, so slight score differences can tip the balance.
So which SPTs seem most useful for our purposes? a. Probably just verbal and nonverbal behavior, particularly verbal fluency, and again only a little bit. A candidate who is composed is more likely to possess higher emotional stability. A candidate who speaks coherently, concisely, and articulately is more likely to possess higher general intelligence. b. Team-wide, having a couple of folks with impression-management uber skills is probably useful, but again we don’t need everyone to have this. It’s not a priority of mine for anyone in this applicant pool to have it. c. Appearance, even professional appearance, seems meaningless. Ignore it. We can teach someone to wear a suit if need be.
An interview might uncover a serious red flag. a. Decision psychologists, in reviewing the limits of statistical models, sometimes speak of the “broken leg problem.” It refers to an example in which a model might be exceptional at predicting, say, a baseball player’s batting average, unless something unanticipated and catastrophic, like a broken leg, happens. A human, however, would notice and account for this immediately. Hence it is the one narrow case in which human intuitive judgment might outperform a statistical model. b. An applicant utterly incapable of explaining basic concepts, an applicant who is outright offensive and mean, an applicant caught lying on their resume: these and similar facts might disqualify someone from serious consideration. c. But be warned: referring back to overactive sensemaking, people are usually too quick to identify ostensible “broken legs.” A real broken leg should be very serious and very clear.
