Hiring

Hiring Python Developers: What to Actually Screen For

Resume keywords won't tell you if a Python developer is good. Here's what to actually screen for — the code-review test, role-specific skills, and red flags.

By Tarun Bhukya·May 22, 2026· 15 min read

How to screen Python developers — what to actually look for when hiring

A resume tells you someone has written Python. It doesn't tell you whether they'll make good decisions inside your codebase, for your problem. And that gap — between "knows the language" and "good in your context" — is where most hiring mistakes live.

This guide is for the two people who usually have to make this call: the CTO or tech lead who can run a technical interview but wants a sharper process, and the non-technical founder who's quietly anxious because they can't read code and don't know how to tell good from convincing. You can screen well from either seat. You just need the right questions, not a CS degree.

If you're earlier than the screening stage and still figuring out who and how to engage, our guide to hiring developers for your startup is the better starting point. This one assumes you're ready to evaluate an actual person.

Why Resume Keywords Tell You Almost Nothing

"Knows Django, Pandas, follows PEP 8." Fine. So does almost every candidate who'll reach your inbox. Those are table stakes, not signal.

The keyword list tells you what someone has been near. It says nothing about the thing you actually care about: when this person hits a fork in your codebase — a performance problem, an ambiguous requirement, a tempting shortcut — do they make the call you'd want made? Do they leave the code better than they found it? Can they explain why?

That's what screening is for. Not confirming they've heard of the tools, but watching how they think with them. And the good news for the non-technical founder reading this: judgment and communication are things you can assess without writing a line of code. We'll get there.

Screen for the Job You Actually Have

"Senior Python developer" is almost meaningless on its own. Python runs web backends, data pipelines, and infrastructure scripts — three jobs that reward genuinely different instincts. A brilliant data engineer can be a mediocre API developer, and vice versa. Screen for the work in front of you, not for "strong Python" in the abstract.

Web and API roles

If they're building your backend or API, look for fluency in Django or FastAPI, sound API design (clear contracts, sensible status codes, versioning), and how they handle authentication and authorization. The single most revealing technical instinct here is database awareness — especially whether they recognize an N+1 query pattern (where code quietly fires one database call per row instead of one for the whole set). A web developer who doesn't think about how the database gets hit will write code that works in the demo and falls over in production.

A few questions that surface the right instincts: "Walk me through how you'd design the API for X." (Listen for versioning, error responses, and pagination — not just the happy path.) "How do you handle a slow endpoint?" (You want "I'd profile it and check the queries first," not "I'd add more servers.") "How do you keep auth logic from leaking across the codebase?"

Data and ML pipeline roles

If they're building data pipelines, the bar shifts to Pandas and NumPy at realistic scale, and — more important than raw speed — data correctness. Can they reason about what happens when the input is dirty, duplicated, or out of order? Do they care about reproducibility, so the same pipeline run twice gives the same answer? A pipeline that's fast and subtly wrong is worse than a slow one that's right.

Good probes: "A daily job silently produced wrong numbers for a week before anyone noticed — how would you have caught it sooner?" (You're listening for validation checks, monitoring, and tests on the data itself.) "How do you handle a column that's mostly clean but occasionally garbage?"

DevOps and automation roles

If they're writing automation and operational scripts, screen for reliability and error handling. Does their script fail loudly and safely, or silently leave things half-done? Do they understand idempotency — that running the same script twice shouldn't double-charge a customer or duplicate a record? Operational code lives in the dark and runs unattended; carelessness there is expensive.

Ask: "Your script dies halfway through. What state is the system in, and what happens when it reruns?" The answer tells you whether they've actually operated software in production or only written it.

Match the screen to the role, and you'll stop rejecting great candidates for lacking skills the job never needed.

The Single Best Screen — A Written Code Review

If you take one thing from this article, take this: the most revealing screen we know is a written code review, and almost nobody uses it.

Hand the candidate a short, flawed function — fifty lines or so — with a few genuine problems baked in. Ask them to review it in writing: what they'd change, and why. Here's the kind of snippet that works:

def get_active_users(user_ids, db, tag="active"):
    results = []
    for uid in user_ids:
        user = db.query(f"SELECT * FROM users WHERE id = {uid}")
        if user["status"] == tag:
            results.append(user)
    return results

def add_to_report(user, report=[]):
    report.append(user["name"])
    return report

There are at least four real issues in those few lines. A strong candidate will spot most of them and, more importantly, explain why each matters:

The mutable default argument (report=[]) — that list is created once and shared across every call, so the second report quietly contains the first report's data. A classic Python gotcha that separates people who've been bitten from people who haven't.
The N+1 query pattern — one database round-trip per user instead of a single query for the whole set. Fine for ten users, a disaster for ten thousand.
The SQL injection / unparameterized query — building a query by interpolating uid into an f-string lets a malicious value run arbitrary SQL. It should use a parameterized query.
No error handling and no tests — what happens when a user ID doesn't exist, the query returns nothing, or status is missing from the row?

What this exercise reveals is everything a timed algorithm puzzle hides: real-world judgment, what they prioritize, and — crucially — whether they can explain their reasoning to a teammate or a non-technical founder. A candidate who finds the bugs but can't tell you why they matter is only half the hire. The explanation is the signal.

It also respects everyone's time. It takes you minutes to prepare and them twenty to answer, and you learn more than a two-hour live session usually delivers.

A note on grading: don't score on catches alone. The best answer ranks the issues by severity (the SQL injection and the silent data bug matter more than style), and a great candidate will ask a clarifying question — "what's the expected scale here?" — rather than assuming. Treat that as a plus, not a dodge.

Live Coding, Take-Home, or Code Review — When to Use Which

Each format tests something real, and each has an honest cost.

Live coding tests how someone thinks under pressure and lets you ask follow-ups in real time. The downside: pressure isn't a real working condition, and plenty of strong, careful engineers freeze in a shared editor while a junior who's memorized the patterns shines. Use it to probe reasoning, not to gate on speed.

Take-home projects respect senior time and show real-world habits — structure, tests, documentation. The downside is completion drop-off: the best candidates often have offers in hand and won't spend a weekend on spec work. Keep it short, or pay for it.

Written code review has the best signal-to-effort ratio of the three. It surfaces judgment and communication, it's quick for both sides, and it's the same skill the job actually requires — most engineering time is spent reading and improving existing code, not writing it from a blank page. If you only run one screen, run this one.

If You Can't Read Code — Non-Technical Proxies That Work

If you're a founder who can't read Python, you're not out of options. You're actually well-positioned to test something the code can't: how this person thinks and communicates. Here's what to listen for.

How they reason about a trade-off. Ask something like, "Why would you choose PostgreSQL over MongoDB for this — or the other way around?" You're not grading the answer. You're listening for the shape of it. A strong engineer says some version of "it depends, because..." and walks you through the trade. Be wary of dogma — "always use X" — because real engineering is almost never that certain.

How they explain a past decision in plain language. Ask them to walk you through a hard technical call they made and why. Can they translate it so you actually understand it? An engineer who can only explain their work to other engineers will be a constant source of friction for a non-technical founder. The ones who can translate are gold — they'll keep you genuinely informed for years.

How they handle "I don't know." Ask something slightly outside their depth and watch what happens. Do they admit the edge of their knowledge cleanly, then reason toward an answer? Or do they bluff? Honest "I don't know, but here's how I'd find out" is the single best long-term signal there is. The bluffers cost you most when the stakes are highest, because they won't tell you when they're guessing.

How they estimate. Ask how long a feature you both understand would take. You're not holding them to the number — you're listening for whether they break it down, name what they're unsure about, and account for testing and the unexpected. "About two days" with no caveats is a worse answer than "three to five days, depending on how the payment integration behaves."

You don't need to evaluate the code. You need to evaluate the judgment behind it — and that comes through in plain conversation.

Red Flags

A few patterns reliably predict trouble, technical seat or not:

Dogmatism. "Always use X, never use Y." Reality has more exceptions than that.
Can't explain trade-offs. If every answer is a confident absolute, they haven't been burned enough to know the costs.
No tests in their own work. Ask to see something they've built. If there's not a test in sight, that tells you how they'll treat your codebase.
Can't name a past failure. Everyone who's shipped real software has broken something. A candidate with no scars either hasn't done much or isn't being straight with you.
Blames everyone else. When you ask about a project that went wrong, listen for whether they own any part of it. "The PM gave us bad requirements" with no self-reflection travels with them to your team.
Vague about what they personally built. On team projects, ask what they specifically wrote. Someone who can't separate their work from the group's may be borrowing credit.
Evasive about gaps. Defensiveness about a resume gap or a skill they lack is worse than the gap itself.

And if you're evaluating an agency or a marketplace rather than an individual, watch for a different set: rotating contractors instead of named people, no trial period offered, and a "bench" of unassigned engineers they're clearly trying to place. Those are signs you'll get whoever's free, not whoever's right.

When Screening One Hire Is the Wrong Move

Sometimes the honest answer is that you shouldn't be screening a single developer at all.

If you need capacity fast, don't have the time to run a real process, or need more than one stack covered, hiring solo is slow and risky. A single bad hire can cost you months — and you'll have run the whole gauntlet to get there.

The honest alternative is to borrow a team that's already been screened. That's a large part of what we do: we hire Python developers for our own product the same way we'd help you hire for yours, so when you need senior Python engineers, you get people who've already cleared this bar. Named engineers, a two-week embedded trial so you can watch the actual work before you commit, and no bench — the developer you meet is the one who builds.

That last point is the whole philosophy. We're a team of twenty-plus engineers with no bench and no rotating juniors. When you work with us, you're not getting whoever happened to be unassigned this week. You're getting a named senior engineer you can hold a real relationship with — the same standard we hold for the people who build Formester, our own SaaS product.

Your Python Screening Checklist

Whatever seat you're in, you can run this:

Screen for the role, not the language. Web, data, or DevOps — match the questions to the job.
Lead with a written code review. Hand them a flawed function; grade the reasoning, not just the catches.
Pick the format that fits the seniority. Code review for signal-per-minute; take-home for habits (keep it short); live coding for reasoning, never for raw speed.
If you can't read code, test judgment. Trade-off reasoning, plain-language explanation, estimation, and how they handle "I don't know."
Watch the red flags. Dogmatism, no tests, no past failures, blame, evasiveness — and, for vendors, rotating contractors and a bench.
Know when not to screen solo. If it's slow or risky to hire one, borrow a vetted team instead.

Before you screen anyone, it helps to know exactly what you're hiring for — our guide to scoping a software project keeps you from screening for the wrong role entirely. And if you're weighing one hire against a team, freelancer versus agency and which engagement model fits you are worth a read.

Need Senior Python Engineers Without Running the Gauntlet?

We'll bring you named, vetted Python developers — with a two-week embedded trial so you can see the work before you commit. No bench, no rotating juniors, no guesswork. The people you meet are the people who build.

Hire Python developers, or read how we embed with your team.