What is Subqueries in SQL Deep Dive?

Query Fundamentalshigh

Subqueries

A subquery is a SELECT statement nested inside another query. It can appear in SELECT, FROM, WHERE, or HAVING. Scalar subqueries return one value, row subqueries return one row, and table subqueries return multiple rows.

Memory anchor

Subqueries are questions within questions. A correlated subquery is like asking 'for each employee, how many colleagues earn more?' — you have to re-check everyone for each employee. EXISTS is like peeking through a door crack — stop as soon as you see anything inside.

Expected depth

Types: Correlated subquery — references columns from the outer query, re-evaluated for each outer row (can be slow). Non-correlated subquery — evaluated once, result used as a constant. EXISTS/NOT EXISTS — short-circuits on first match (faster than IN for large subqueries). IN — compares against a list; not null-safe. Subquery in FROM — derived table, materialized or inlined by the planner. The planner often 'unnests' subqueries and rewrites them as joins for better optimization.

Deep — senior internals

Correlated subqueries have O(n×m) complexity by default (one execution per outer row). The planner can sometimes convert them to joins via 'subquery unnesting' optimization — check the EXPLAIN plan to verify. LATERAL subquery (covered separately) is the explicit form of a correlated subquery in FROM. Scalar subqueries in SELECT that return more than one row cause a runtime error. The planner can push predicates into subqueries (predicate pushdown) if the subquery lacks DISTINCT, LIMIT, or aggregate functions that would change the result. EXISTS with an uncorrelated subquery (e.g., WHERE EXISTS (SELECT 1)) is a tautology — always true if the subquery returns any rows.

🎤Interview-ready answer

I prefer non-correlated subqueries or CTEs over correlated ones for readability and performance. EXISTS is my default for 'does a match exist' checks — it short-circuits and handles NULLs correctly. Correlated subqueries are sometimes unavoidable (e.g., row-by-row comparison logic) but I check EXPLAIN to ensure the planner can convert them to joins.

⚠Common trap

Assuming correlated subqueries are always re-evaluated per row. Modern planners often unnest them into hash joins. But you can't rely on this — always check EXPLAIN and test with large data sets.