The Data Analyst Interview: SQL, Stats, And Story
The Mythic Intel Team · Apr 5, 2025 · 7 min read
A data analyst interview tests three things in sequence: can you pull and shape data with SQL, can you reason about it with statistics, and can you turn the result into a decision someone will act on. Data analyst interview questions almost always include a live SQL exercise, so be fluent with joins, aggregation, and window functions, comfortable explaining descriptive statistics, and ready to walk an analytics case from messy question to clear recommendation.
The pattern across most analyst loops is a SQL screen, a stats and metrics conversation, a case or take-home, and a behavioral round on how you communicate findings. The technical bar is real but bounded: you do not need exotic queries, you need clean, correct ones and the judgment to know what to compute.
SQL: Joins, Aggregation, And Window Functions
SQL interview questions are the heart of the screen. Three areas come up almost every time.
- Joins. Know the difference between an inner join (only matching rows) and a left join (all rows from the left table, NULLs where the right has no match). Most "find customers with no orders" questions are a left join with a
WHERE ... IS NULLfilter. - Aggregation.
GROUP BYwithSUM,COUNT,AVG, plus theWHEREversusHAVINGdistinction:WHEREfilters rows before grouping,HAVINGfilters groups after aggregation. - Window functions. These compute across a set of rows related to the current row without collapsing them into one.
ROW_NUMBER(),RANK(), andDENSE_RANK()differ in how they treat ties:RANK()leaves gaps after a tie (1, 2, 2, 4), whileDENSE_RANK()does not (1, 2, 2, 3).
A classic question is "find the second-highest salary in each department." A window function answers it cleanly:
SELECT department, employee, salary
FROM (
SELECT
department,
employee,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
) ranked
WHERE salary_rank = 2;
Be ready to explain why you chose DENSE_RANK() over RANK() here: if two people tie for the top salary, DENSE_RANK() still returns the genuine second salary, whereas RANK() would skip it. That kind of reasoning out loud is what interviewers grade, not just whether the query runs.
Descriptive Statistics: Use The Right Measure
You will be asked to summarize data and to defend your choice of measure.
- Mean is the average, the sum divided by the count. It is pulled toward outliers.
- Median is the middle value when the data is ordered, the 50th percentile, and it is far more resistant to outliers.
- Mode is the most frequent value, useful for categorical data.
- Standard deviation measures the typical distance of values from the mean; variance is its square.
The exam-favorite question is "income is heavily skewed, do you report the mean or the median?" The answer is the median, because a few very high incomes drag the mean upward and misrepresent the typical case. Knowing when the mean lies is a core analyst skill.
A precise point that catches people out: correlation is not a descriptive statistic of a single variable, it describes the relationship between two variables and sits in inferential territory. And correlation is not causation, so if a case hands you a strong correlation, name confounders before drawing a conclusion.
Metrics, A/B Tests, And Dashboards
Beyond raw stats, analysts get judgment questions.
- Metric definition. Define a metric precisely and spot a misleading one. "Average revenue per user" hides a lot when a few whales dominate; a median or a distribution tells the truer story.
- A/B testing basics. Understand a control versus a treatment group, why you need a large enough sample, and what statistical significance does and does not tell you. Significance is not the same as a meaningful effect size.
- Dashboards. Good dashboards answer a specific question for a specific audience and lead with the metric that drives a decision. Be ready to critique a bad chart.
Example question: "We changed the checkout button and conversions went up 2 percent. Did it work?" A strong answer asks how long the test ran, whether the sample was large enough, whether the difference is significant, and whether anything else changed at the same time, before celebrating.
The Analytics Case
The case ties it together: a vague business question, some data, and a request for a recommendation. The method matters more than a clever trick.
- Clarify the question. "Why is churn up" first needs a definition of churn and a time frame.
- Plan the pull. Which tables, which joins, which grain.
- Compute and sanity-check. Watch for NULLs, duplicates from a bad join, and segments that move the average.
- Recommend. End with what the business should do and how confident you are, not just a chart.
Example question: "Monthly active users dropped 8 percent. Find out why." Segment by platform, geography, cohort, and channel, and isolate whether it is a real behavior change or a tracking or seasonality artifact before naming a cause.
How To Rehearse
Write SQL by hand and then explain each query out loud, because in the interview you have to narrate your logic while you type, and say your stats reasoning (mean versus median, correlation versus causation) until it is automatic. A voice-driven trainer like Mythic Intel can research the specific analyst role, verify the SQL and statistics you cite, run the case prompts, and grade your spoken answers on accuracy and structure so the explanations land as cleanly as the queries.