Recursive CTEs
Recursive CTEs traverse hierarchical or graph data (org charts, bill of materials, folder structures) using a self-referencing WITH RECURSIVE clause. A base case provides initial rows; the recursive term references the CTE itself to add next-level rows.
Recursive CTE is a chain of dominoes — the base case tips the first one (root node), and each falling domino (iteration) tips the next row in line until none are left standing.
Structure: WITH RECURSIVE tree AS (SELECT id, parent_id, name FROM org WHERE parent_id IS NULL -- base case UNION ALL SELECT o.id, o.parent_id, o.name FROM org o INNER JOIN tree t ON o.parent_id = t.id -- recursive term). Execution: evaluate base case → evaluate recursive term joining against previous iteration → repeat until no new rows → union all results. UNION removes duplicates; UNION ALL is faster and correct unless the graph has cycles. Depth/path tracking: add a depth counter or accumulated path for breadcrumb trails.
Cycle detection: for graph (not tree) data, recursive CTEs can loop infinitely. Strategies: track visited IDs in an array (array_agg), add a depth limit (WHERE depth < 100), or use PostgreSQL's graph cycle detection syntax (CYCLE id SET is_cycle USING path). Performance: recursive CTEs use a working table iteratively — each iteration scans the previous iteration's results and the base table. Index on the parent_id column is critical. For very deep hierarchies (>50 levels), recursive CTEs can be slow — consider storing the materialized path (ltree extension in PostgreSQL) instead.
Recursive CTEs traverse hierarchical data with a base case plus recursive term joined back to itself. The classic use case is org chart traversal — find all reports under a manager. I always add UNION ALL (not UNION) for speed and a depth counter for cycle protection in graph data. For very deep or frequently queried hierarchies, I consider the ltree extension or storing materialized paths for O(1) ancestor lookups.
Running a recursive CTE on graph data with cycles (not a tree) without cycle detection. The query will loop indefinitely. Add a depth limit (WHERE depth < 1000) or visited-IDs array as a safety valve.