JOIN Types
SQL joins combine rows from two tables based on a related column. INNER JOIN returns only matching rows. LEFT JOIN returns all left table rows plus matching right rows (NULLs for non-matches). RIGHT JOIN is the mirror. FULL OUTER JOIN returns all rows from both tables.
Joins are Venn diagrams: INNER is the overlap, LEFT is the left circle + overlap, FULL OUTER is both circles combined. NOT IN with NULLs is like a guest list that includes a blank name — nobody can get in because you can't confirm nobody is the blank name.
INNER JOIN: rows present in both tables. LEFT JOIN: all left rows, NULL for right when no match — use to find rows with no related records. CROSS JOIN: Cartesian product (all combinations). SELF JOIN: table joined to itself (e.g., employee-manager hierarchy). JOIN execution algorithms: Nested Loop (good for small tables or indexed lookups), Hash Join (good for large, unsorted tables), Merge Join (good for pre-sorted data). The query planner picks the algorithm based on estimated row counts and available indexes.
JOIN order matters for performance even though SQL is declarative — the planner uses table statistics (pg_statistic) and cost estimates to determine join order. For n tables, there are n! possible join orders; the planner uses dynamic programming for small n and genetic algorithms (GEQO) for large n. The join condition's selectivity determines how many rows survive — high-selectivity joins filter aggressively early. Anti-joins (NOT EXISTS, NOT IN, EXCEPT) have subtle differences: NOT IN fails unexpectedly with NULLs in the subquery (returns empty result set). NOT EXISTS correctly handles NULLs and is generally faster with correlated subqueries.
I use INNER JOIN when I only want matching rows, LEFT JOIN to include rows with no match (find orphans), and FULL OUTER JOIN for reconciliation queries. The planner chooses Nested Loop, Hash Join, or Merge Join based on statistics. A critical gotcha: NOT IN returns empty if the subquery contains any NULLs — I always use NOT EXISTS instead for anti-joins.
Using NOT IN when the subquery might return NULLs. SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM customers) — if any customer id IS NULL, the entire NOT IN returns false for every row, giving an empty result. Use NOT EXISTS instead.