Reference checks are the most-skipped step that almost every SMB hiring manager still claims to do. They’re skipped because the unstructured version of the call produces almost nothing useful, and the structured version takes effort nobody has scheduled. What’s missing is the question set in the middle.
At Join we run reference calls ourselves. The hiring manager makes them, not an HR layer. The framework below is what we use, condensed for SMB recruiters who don’t have a talent partner running the loop in the background.
The validity gap
The most-cited number against reference checks comes from Hunter and Hunter’s 1984 meta-analysis, later folded into the broader Schmidt and Hunter (1998) review of 85 years of selection-method research: unstructured reference checks predict job performance at a validity coefficient of around .26. For comparison, that’s better than years of education (.10), but meaningfully worse than structured interviews (.51) or work samples (.54). On those numbers alone, you can see why teams quietly stop running them.
But that .26 is the validity of the unstructured call: the version where the hiring manager phones the candidate’s last manager, makes small talk, asks if they’d hire the candidate again, and gets a vague yes. That call is the one that justifiably gets skipped.
Hedricks and colleagues (2019) in the International Journal of Selection and Assessment examined how reference-check practices have evolved with structured frameworks, electronic data collection, and consistent scoring. Their finding: structured reference checks reach criterion-related validity in the same range as personality tests, assessment centres, and biodata. The difference between .26 and “rivals personality tests” is not whether the candidate has a manager willing to take a call. It’s whether the questions on the call were written before the call started.
What the industry actually does
SHRM’s research on reference-check practice puts the share of employers conducting some form of reference check at around 92%. Most run them unstructured. Most extract little signal. Among the patterns SHRM reports, the most common failure mode is the same one Hunter and Hunter were measuring forty years ago: phone the reference, ask a few open questions, hear generic praise, hang up.
What the 92% figure tells you is that reference checks are still standard practice: your candidates expect them, your peers run them, and skipping them entirely puts you outside the norm. What it doesn’t tell you is how much of that 92% produces usable hiring information.
Questions that work
Four categories carry almost all of the signal in a structured reference call.
- Specific past behaviour. “Walk me through a time when X had to ship something on an unrealistic deadline. What did they actually do, who did they talk to, and what was the outcome?” This is the reference equivalent of a behavioural interview question, and it surfaces the same kind of signal. The reference can either recall a specific instance or can’t. If they can’t, the past relationship was probably not what either the candidate or the reference claims.
- Peer / manager triangulation. Two references from the same period, ideally one peer and one manager, asked the same structured question, will tell you whether the candidate’s account of their own work holds up to two different vantages. Disagreements are interesting; total agreement on a glossy version is the standard reference-check noise.
- Growth trajectory. “What did X get noticeably better at while working with you? And what would they still need to improve to take the next step?” The reference’s answer here distinguishes between people who grew in role and people who simply persisted. The second half (what would they still need to improve) is the part the reference is least primed for and most likely to answer truthfully.
- The back-channel question. At the end of the call, ask: “Who else worked closely with X that I should talk to?” If the reference offers two more names readily, that’s signal. If they hedge or repeat the existing names, that’s also signal. SHRM has written about backchannel references as a distinct practice and reported, in one of their studies, that companies that combine formal references with informal backchannel ones reduce early turnover by roughly 22% versus formal-only. Doing this ethically is non-trivial; doing it well is high-yield.
The shape of the call: five to seven minutes of small talk and context-setting, fifteen minutes on the structured questions, three minutes on the backchannel ask. That comes to roughly twenty-five minutes in total. The candidate has consented in advance. The questions are written down. The notes go into the scorecard, not into freeform memory.
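One way to keep the call honest is to write the plan and the notes down as data before dialling. The sketch below is illustrative only: the `SEGMENTS` table, the `ReferenceNote` class, the 1-to-5 rating scale, and the two-point disagreement threshold are all assumptions made for the example, not part of any Join product or external library. It adds up the time budget described above and flags the categories where a manager and a peer reference disagree.

```python
from dataclasses import dataclass, field

# Time budget in minutes as (shortest, longest), taken from the call shape above.
SEGMENTS = {
    "small_talk_and_context": (5, 7),
    "structured_questions": (15, 15),
    "backchannel_ask": (3, 3),
}

def call_length(segments=SEGMENTS):
    """Return (shortest, longest) total call length in minutes."""
    shortest = sum(lo for lo, _ in segments.values())
    longest = sum(hi for _, hi in segments.values())
    return shortest, longest

@dataclass
class ReferenceNote:
    """One reference's answers, rated per question category (assumed 1-5 scale)."""
    reference_role: str                      # e.g. "manager" or "peer"
    ratings: dict = field(default_factory=dict)

def disagreements(a, b, threshold=2):
    """List shared categories where two references differ by >= threshold points."""
    shared = a.ratings.keys() & b.ratings.keys()
    return sorted(c for c in shared if abs(a.ratings[c] - b.ratings[c]) >= threshold)

# Hypothetical ratings, invented for illustration.
manager = ReferenceNote("manager", {"past_behaviour": 4, "growth": 2})
peer = ReferenceNote("peer", {"past_behaviour": 4, "growth": 5})

print(call_length())                 # (23, 25)
print(disagreements(manager, peer))  # ['growth']
```

A flagged category is not a verdict on the candidate. It is a prompt to go back to the notes and work out why two vantages saw different things.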
Questions that don’t work
Four categories produce no useful signal and should be cut.
- Closed-ended yes/no. “Was X reliable?” / “Was X a good team player?” The reference will say yes. The call has produced nothing.
- Generic praise prompts. “Tell me about X’s strengths.” Will trigger the standard reference monologue (works hard, gets along with people, asks good questions). Strong candidates and weak candidates produce identical reference monologues. The signal is zero.
- Hypothetical futures. “Do you think X would do well in our role?” The reference does not know your role. Their answer is performative.
- “Would you hire them again?” Legally fraught in many jurisdictions, easy to answer evasively, and even an enthusiastic yes correlates poorly with the candidate’s likely performance in your specific role. The question feels meaningful and produces almost no information.
The test for each question: does the answer differentiate strong candidates from weak ones, or do you expect a similar answer regardless of who the candidate is? If the latter, cut the question.
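If scorecard ratings from past reference calls are available, the same test can be roughly approximated with numbers. This is a minimal sketch under stated assumptions: answers are rated on a 1-to-5 scale, the `low_signal_questions` helper and its `min_spread` cut-off are hypothetical, and the ratings are invented. A question whose ratings barely vary from candidate to candidate is a candidate for cutting.

```python
from statistics import pstdev

def low_signal_questions(ratings_by_question, min_spread=0.5):
    """Return questions whose ratings barely vary across candidates.

    ratings_by_question maps question text to a list of ratings, one per
    candidate. min_spread is an arbitrary, assumed cut-off on the
    population standard deviation.
    """
    return sorted(
        question
        for question, ratings in ratings_by_question.items()
        if len(ratings) >= 2 and pstdev(ratings) < min_spread
    )

# Invented ratings for four past candidates, for illustration only.
history = {
    "Was X reliable?": [5, 5, 5, 5],
    "Walk me through a time X faced an unrealistic deadline": [2, 5, 3, 4],
}

print(low_signal_questions(history))  # ['Was X reliable?']
```

With only a handful of hires the spread is noisy, so treat the output as a nudge to re-read the question, not as a statistical finding.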
The 22% lever: backchannel references
Backchannel references are people who worked with the candidate but who the candidate did not nominate as a reference. Usually a former peer, a former manager from a role two stops back, a customer-facing counterpart. Their value is precisely that they were not selected by the candidate to deliver a polished narrative.
The ethical version of the practice:
- Ask the candidate first. “Are there other people we could speak with about your work?” Many candidates will offer names without hesitation. Those names are technically still backchannel relative to their primary references.
- Ask references during the call. The “who else worked closely with X” question above. This routes backchannel discovery through the formal reference, which keeps things consensual.
- Don’t go around the candidate. Cold-calling a former colleague the candidate hasn’t mentioned, without their knowledge, is the version that creates legal and trust problems. The 22% turnover-reduction signal in SHRM’s data is for the ethical version; the unethical version creates a different category of risk.
For SMB roles where the team will live with this hire for years, the backchannel layer is often the highest-yield part of the entire reference process. It’s also the part that requires the most discipline to do without crossing the line.
When to skip the reference check entirely
Reference checks are not free. The candidate spent social capital to line up the references. The reference spent 25 minutes. You spent 25 minutes. There are situations where the signal is already there and the call adds nothing.
- Internal candidates. You already have the reference data. Run the structured-interview loop instead.
- Contract-to-hire conversions. Three to six months of working with the candidate dominates anything a reference can tell you. Skip the call; run a structured 30-day review instead.
- Post-trial-day decisions. If the candidate has done a paid trial day (or a real work sample equivalent), the signal from that day is more valid than the reference call. Spend the time on the trial-day debrief.
- Roles where the work itself surfaces in the interview. Engineering hires with public code, designers with portfolios, sales hires with quota numbers and named accounts. The reference still helps, but as a tie-breaker rather than a primary signal.
Conversely, the situations where the structured reference check is highest-yield: first management hires, finance / legal / compliance hires, customer-facing senior hires, and any hire where the previous role’s behaviour (not skill) is what you’re underwriting.
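The skip-or-run guidance above can be written as a small decision rule. The sketch below is one possible encoding, not a definitive policy: the labels, the `reference_check_priority` function, and the precedence order (a skip situation wins over a high-yield role) are assumptions made for illustration.

```python
# Labels are hypothetical shorthand for the situations listed above.
SKIP_SITUATIONS = {"internal_candidate", "contract_to_hire", "post_trial_day"}
TIE_BREAKER_SITUATIONS = {"public_work_evidence"}  # public code, portfolio, quota numbers
HIGH_YIELD_ROLES = {
    "first_management",
    "finance_legal_compliance",
    "customer_facing_senior",
}

def reference_check_priority(situations, role_type=None, underwriting_behaviour=False):
    """Classify how much weight a structured reference check should carry.

    Returns one of "skip", "primary", "tie_breaker", or "standard".
    Assumed precedence: skip situations first, then high-yield roles.
    """
    situations = set(situations)
    if situations & SKIP_SITUATIONS:
        return "skip"
    if role_type in HIGH_YIELD_ROLES or underwriting_behaviour:
        return "primary"
    if situations & TIE_BREAKER_SITUATIONS:
        return "tie_breaker"
    return "standard"

print(reference_check_priority({"post_trial_day"}))                 # skip
print(reference_check_priority(set(), role_type="first_management"))  # primary
print(reference_check_priority({"public_work_evidence"}))           # tie_breaker
```

Writing the rule down mostly forces the team to agree in advance on what happens when two situations overlap, for example an internal candidate stepping into a first management role.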
Compliance corner: what you can and can’t ask
The legal landscape varies materially across the EU. A few floor rules apply broadly:
- Get the candidate’s consent before contacting references in EU/EEA jurisdictions, where GDPR applies. Most candidates expect this. The consent is often treated as implicit in the application or made explicit in the offer process, but either way it should be documented in writing somewhere in the file.
- The reference’s own consent matters too. They’re a third party whose personal data you’re collecting. Be clear about purpose and retention.
- Avoid questions touching protected characteristics. Age, religion, family situation, health, union membership. Even if asked indirectly.
- Country specifics: in DE, the AGG (Allgemeines Gleichbehandlungsgesetz) shifts the burden of proof to the employer (§ 22) once a candidate presents indicia of discrimination, and § 15 sets the compensation regime. In FR, the RGPD and the droit du travail apply; in ES, the LOPD-GDD adds local nuance.
- In writing vs by phone: the structured call leaves notes (your scorecard) but not a transcript. The phone format is generally lower-risk than email for the reference, because verbal answers can be calibrated for tone and verified for context.
Where the legal stakes are high enough to matter, run the call yourself and document the structured questions you asked. Don’t outsource the call to an unstructured chat with a former colleague-of-a-colleague.
What this looks like in practice
For most SMB hires Join’s customers run, the setup is the same: two references per finalist (one manager, one peer where available), 25-minute structured calls, the four-category question set above, and notes written into the same scorecard the interviews used. Skip the reference step entirely for internal candidates and post-trial-day decisions. Treat the backchannel question at the end of the call as a deliberate practice, not an aside.
The lesson from forty years of meta-analyses, condensed: write the questions first, the same way you write the structured-interview questions. Calibrate before debrief. The framework is the difference between .26 validity and something much closer to useful.