Interviewing Engineers in the Age of AI

There's an irony in how the tech industry hires today. The more fluent an engineer is with AI tools, the less likely they are to have memorized the implementation details of standard algorithms and patterns. They've spent their time on what matters more — architecture, system design, debugging complex interactions, deciding what to build rather than grinding out how to build it. They delegate the mechanical parts to AI the same way a senior engineer delegates boilerplate to a junior.

And yet, under the traditional interview format — write this from scratch, no tools, no documentation — these are exactly the engineers most likely to be rejected. The interview selects against the skill it should be selecting for.

Figure: The better the engineer, the worse the fit. The AI-fluent engineer (uses AI daily at work, focuses on architecture and judgment, delegates mechanical coding to AI, memorizes fewer patterns) meets a traditional interview (no tools, no docs, no AI; tests pattern recall under pressure) and is rejected.

To be fair, interviewing is genuinely hard. An hour-long session is a narrow window into years of accumulated judgment, intuition, and working style. No interview format will fully capture what someone is like to work with over months. That's precisely why the choice of what to test in that limited time matters so much — every minute spent on a memorization exercise is a minute not spent on the signals that actually predict long-term performance.

1. What we test vs. what we need

A typical coding interview might ask: "Implement an LRU cache" or "Write a function that merges k sorted lists." These are well-known patterns. Anyone who has drilled LeetCode or spent five minutes with an AI assistant can produce them. The implementation is mechanical — the right data structure, the right pointer manipulation, the right edge case handling.
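To make "mechanical" concrete, here is a minimal Python sketch of both exercises. It leans on the standard library (`collections.OrderedDict` to track recency, `heapq.merge` for the k-way merge). The names `LRUCache` and `merge_k_sorted` are illustrative. An interviewer may instead expect a hand-rolled hash map plus doubly linked list, which does the same bookkeeping with more pointer manipulation.

```python
from collections import OrderedDict
import heapq


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def merge_k_sorted(lists):
    """Merge k already-sorted lists into one sorted list (heap-based merge)."""
    return list(heapq.merge(*lists))


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # over capacity: evicts "b"
print(cache.get("b"))  # None
print(merge_k_sorted([[1, 4, 7], [2, 5], [3, 6, 8]]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Both fit in a couple dozen lines and follow a well-known recipe. That is the point: producing them says little about whether the candidate knows when a cache or a k-way merge is the right tool.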

The gap

Here's what that interview does not test:

Architectural judgment. Should you decompose an incoming query, reformulate it, use a refinement loop, or just prompt well? The answer depends on the error distribution, the latency budget, and the schema's complexity. This is taste, built over years of shipping systems and watching them fail in production, not by memorizing patterns.
Failure diagnosis. If each step of a three-step pipeline is 90% accurate and the steps fail independently, the pipeline produces correct output only about 73% of the time (0.9 × 0.9 × 0.9 ≈ 0.73). In practice the failures are correlated in non-obvious ways: a bad decomposition in step one causes a confidently wrong answer in step three that still passes validation. Diagnosing this requires intuition that no whiteboard exercise reveals.
Boundary judgment. When should the system use an LLM instead of a regex? Ask a human, or proceed with its best guess? Skip verification to save latency? These decisions determine whether a system succeeds 95% or 99.5% of the time, which is the gap between a demo and a product.
Evaluation design. End-to-end accuracy is necessary but insufficient. You need to measure each stage independently, understand how errors propagate, and catch a regression in one component even when another compensates for it. This is almost never tested in interviews.
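The compounding arithmetic, and why per-stage measurement matters, fits in a few lines of Python. This is a minimal sketch under stated assumptions: the `Trace` record, its per-stage correctness flags, and the sample data are hypothetical stand-ins for whatever labeled evaluation traces a real pipeline would produce.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Trace:
    # Hypothetical evaluation record: was each stage's output judged correct,
    # and was the pipeline's final answer correct?
    stage_ok: tuple[bool, ...]
    final_ok: bool


def independent_estimate(stage_accuracies):
    """End-to-end accuracy if every stage fails independently of the others."""
    return prod(stage_accuracies)


def per_stage_accuracy(traces):
    """Fraction of traces in which each stage, judged on its own, was correct."""
    n_stages = len(traces[0].stage_ok)
    return [sum(t.stage_ok[i] for t in traces) / len(traces) for i in range(n_stages)]


def observed_end_to_end(traces):
    """Fraction of traces whose final answer was actually correct."""
    return sum(t.final_ok for t in traces) / len(traces)


# Three stages at 90% each, assuming independence.
print(round(independent_estimate([0.9, 0.9, 0.9]), 3))  # 0.729

# Hypothetical traces where failures cluster in the same runs.
traces = [
    Trace((True, True, True), True),
    Trace((True, True, True), True),
    Trace((False, False, False), False),  # a bad first step drags the rest down
    Trace((True, True, False), False),
]
stage_acc = per_stage_accuracy(traces)
print(stage_acc)                        # [0.75, 0.75, 0.5]
print(independent_estimate(stage_acc))  # 0.28125
print(observed_end_to_end(traces))      # 0.5
```

Because the failures are correlated, multiplying per-stage accuracies (about 0.28) badly misstates the observed end-to-end rate (0.5). Only measuring each stage separately, alongside the end-to-end number, exposes that the independence assumption has broken down.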

The mismatch

Think about skills along two dimensions: how long they take to develop, and how readily they can be observed in a one-hour interview. Traditional interviews favor skills that are both quick to learn and easy to demonstrate in an hour, but those are precisely the skills that matter least on the job.

The quadrant that matters holds the skills that take years to develop but can still be observed in an interview. That is where we should focus.

How you weight these signals should depend on the candidate. For someone with directly matching experience, let the track record do most of the talking: dig into their past systems, the decisions they made, and the failures they learned from. For someone without exact prior experience, the interview carries more weight, and the right signal is their intuition: how they decompose an unfamiliar problem, what clarifying questions they ask, and how they communicate tradeoffs they've never encountered before. These are the transferable skills that predict success in a new domain. A timed coding drill tests neither.

Figure: Weighting interview signals. Matching experience: evaluate via track record, past decisions and failures, and system deep-dives. New domain: evaluate via intuition and taste, the clarifying questions asked, and how tradeoffs are communicated.

2. What the interview should look like

If the job is building AI systems, the interview should mirror the actual job.

Principles

Let them use AI. Few engineers write production code without AI assistance anymore. How an engineer collaborates with AI (what they delegate, what they verify, what they override) reveals their actual judgment.
Test diagnosis, not implementation. Show a broken system and ask: where is it failing, why, and how would you fix it? This tests reasoning, not recall.
Design under constraints. Present a messy problem with competing requirements. Watch what questions they ask, what tradeoffs they name, whether they reach for complexity or simplicity first.

A concrete format: the two-phase interview

Phase 1: Take-home. Give the candidate a real, complex case with full AI access, time-boxed and unmonitored, and ask for a written report. This tests AI collaboration, problem identification, and communication.

Phase 2: Live discussion. Walk through their report. Ask why, not what; change the constraints live; push on uncertainty. This tests architectural taste, adaptability, and failure intuition.

The combination tests both depth and adaptability. And because the candidate worked with AI in the take-home, the live discussion naturally reveals how they think versus how their tools think — which is exactly the distinction you're hiring for.

3. Raising the bar, not lowering it

This isn't an argument for easier interviews. It's an argument for harder interviews that test the right things. Implementing an LRU cache from memory is easy — it's a pattern you either drilled or you didn't. Deciding whether your system needs a cache at all, where to put it, what invalidation strategy fits your access pattern, and how to debug it when hit rates drop in production — that's hard. That's what separates a senior engineer from someone following a tutorial.
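The gap between implementing a cache and reasoning about one shows up as soon as you measure it. Here is a minimal sketch using Python's `functools.lru_cache` and its `cache_info()` hit and miss counters. The `hit_rate` helper and the two access patterns are hypothetical. The same LRU policy serves a small hot set almost perfectly yet gets zero hits on a cyclic scan only one key larger than the cache.

```python
from functools import lru_cache


def hit_rate(access_pattern, maxsize):
    """Replay an access pattern against a fresh LRU cache and report its hit rate."""

    @lru_cache(maxsize=maxsize)
    def load(key):
        return key * 2  # stand-in for an expensive lookup

    for key in access_pattern:
        load(key)
    info = load.cache_info()
    return info.hits / (info.hits + info.misses)


cyclic_scan = [i % 5 for i in range(100)]  # working set of 5 keys, cache holds 4
hot_set = [i % 3 for i in range(100)]      # working set of 3 keys, fits in the cache

print(hit_rate(cyclic_scan, maxsize=4))  # 0.0: each key is evicted just before reuse
print(hit_rate(hot_set, maxsize=4))      # 0.97: only the first three accesses miss
```

Nothing in the LRU code changed between the two runs; only the access pattern did. Noticing that kind of shift and deciding what to do about it is the judgment the interview should probe.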

The engineers who will build the best systems in the age of AI are the ones with the best judgment — about architecture, about tradeoffs, about when to trust AI and when to override it. Our interviews should find those people.