Workloads & Controllers (medium)

DaemonSet, Job & CronJob

DaemonSet runs one pod per node. Job runs a task to completion. CronJob runs a Job on a schedule.

Memory anchor

DaemonSet = placing one security guard on every floor of a building (one per node). Job = hiring a temp worker to move boxes, then they leave. CronJob = the cleaning crew that shows up every Tuesday at 9 PM on a schedule.

Expected depth

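DaemonSet: exactly one pod on every node, or on every node selected by a nodeSelector or node affinity. Use it for log collectors (fluentd, filebeat), monitoring agents (node-exporter), CNI plugins, and storage drivers. When a node joins the cluster, the DaemonSet adds a pod to it automatically; when a node is removed, its pod is garbage-collected.

Job: runs pods until the desired number of successful completions is reached. The completions field sets that target and parallelism caps how many pods run at once. Useful for batch processing and database migrations.

CronJob: creates Jobs on a cron schedule. concurrencyPolicy decides what happens when a run is due but the previous one hasn't finished: Allow (the default) starts it anyway, Forbid skips the new run, and Replace cancels the running Job and starts the new one.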
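As a compact reference for the fields named above, the following Python sketch builds minimal DaemonSet, Job, and CronJob manifests as plain dicts and prints them as a JSON List, which kubectl apply -f - accepts as well as YAML. The helper names (daemonset, job_spec, job, cronjob), the resource names, and the example.com images are hypothetical placeholders; the field names come from the apps/v1 and batch/v1 APIs. The sketch deliberately defaults concurrencyPolicy to Forbid, whereas the API default is Allow.

```python
import json

# Minimal manifests as plain dicts. kubectl accepts JSON as well as YAML, so the
# printed List can be piped to `kubectl apply -f -`. Names and images are placeholders.


def daemonset(name: str, image: str) -> dict:
    """DaemonSet: one pod per eligible node; the selector must match the template labels."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def job_spec(image: str, completions: int = 1, parallelism: int = 1) -> dict:
    """Job spec: succeed `completions` times, running at most `parallelism` pods at once."""
    return {
        "completions": completions,
        "parallelism": parallelism,
        "template": {
            "spec": {
                "restartPolicy": "Never",  # Job pods must use Never or OnFailure
                "containers": [{"name": "task", "image": image}],
            }
        },
    }


def job(name: str, image: str, completions: int = 1, parallelism: int = 1) -> dict:
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": job_spec(image, completions, parallelism),
    }


def cronjob(name: str, image: str, schedule: str, policy: str = "Forbid") -> dict:
    if policy not in ("Allow", "Forbid", "Replace"):
        raise ValueError(f"unknown concurrencyPolicy: {policy}")
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,         # standard 5-field cron syntax
            "concurrencyPolicy": policy,  # the API default is Allow
            "jobTemplate": {"spec": job_spec(image)},
        },
    }


if __name__ == "__main__":
    items = [
        daemonset("log-agent", "example.com/log-agent:1.0"),
        job("db-migrate", "example.com/migrate:1.0"),
        cronjob("weekly-cleanup", "example.com/cleanup:1.0", "0 21 * * 2"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The schedule 0 21 * * 2 is the Tuesday 9 PM cleaning crew from the memory anchor.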

Deep — senior internals

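DaemonSet scheduling: early versions had the DaemonSet controller place pods on nodes itself, bypassing the scheduler. Since Kubernetes 1.12 (on by default; GA in 1.17) the controller instead adds a node affinity term that pins each pod to its target node, and the default scheduler binds it, so resource requests, priority, and preemption apply as for any other pod. DaemonSet pods also get default tolerations (not-ready, unreachable, disk pressure, memory pressure, and others), so they are not evicted when a node goes not-ready and can still land on nodes under pressure. DaemonSet update strategies: RollingUpdate (default) and OnDelete (a pod is updated only when you delete it manually). Job failure policy: backoffLimit (retries before the Job is marked failed; default 6) and activeDeadlineSeconds (a wall-clock timeout for the whole Job). Job completionMode: NonIndexed (default) vs Indexed, where each pod gets a completion index from 0 to completions - 1, exposed as the JOB_COMPLETION_INDEX environment variable, for sharded work. CronJob history: successfulJobsHistoryLimit and failedJobsHistoryLimit (defaults 3 and 1). CronJob startingDeadlineSeconds: if a run cannot start within this many seconds of its scheduled time, that run is skipped and counted as missed.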
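To show what Indexed mode buys you, here is a minimal worker sketch for a sharded Job. It reads JOB_COMPLETION_INDEX, which the Job controller sets on each pod of an Indexed Job, and processes a round-robin slice of the work. SHARD_COUNT is a hypothetical variable the manifest author would set equal to spec.completions, and the partition list is placeholder data; the round-robin split is one possible sharding scheme, not something Kubernetes prescribes.

```python
import os


def shard(items: list, index: int, total: int) -> list:
    """Round-robin slice of `items` owned by completion `index` out of `total`."""
    if not 0 <= index < total:
        raise ValueError(f"index {index} outside 0..{total - 1}")
    return items[index::total]


def main() -> None:
    # Set by the Job controller on each pod of a completionMode: Indexed Job.
    # Falls back to 0 so the script can also be run locally.
    index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
    # Hypothetical variable: the manifest author sets it to spec.completions.
    total = int(os.environ.get("SHARD_COUNT", "1"))
    work = [f"partition-{n:03d}" for n in range(10)]  # placeholder work list
    for item in shard(work, index, total):
        print(f"completion {index}: processing {item}")


if __name__ == "__main__":
    main()
```

Because every index in 0..completions-1 runs exactly once to success, the slices together cover the whole work list with no overlap.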

🎤 Interview-ready answer

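DaemonSet is for infrastructure agents: you need exactly one per node for things like log collection, metrics scraping, or CNI. Job is for finite tasks such as database migrations or report generation. CronJob wraps a Job template in a cron schedule. Key CronJob gotchas: if the controller misses scheduled times (for example, because the control plane was down), it does not replay every missed run; it starts at most one Job, for the most recent missed time, and only if that time is still within startingDeadlineSeconds when that field is set. Separately, under the default concurrencyPolicy: Allow, a run that takes longer than the schedule interval overlaps the next one, so you can end up with several Jobs running at once; set concurrencyPolicy: Forbid (skip the new run) or Replace (cancel the old run) if runs must not overlap.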
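The catch-up and overlap rules can be modeled in a few lines. This is a simplified, hypothetical sketch (decide and most_recent_tick are not Kubernetes APIs, and it is not the controller's code): it replaces cron parsing with a fixed period in seconds and ignores details such as Forbid retrying a skipped run later within the deadline. It checks the deadline first and concurrencyPolicy second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                   # "create", "replace", or "skip"
    scheduled_for: Optional[int]  # the schedule tick this decision is about


def most_recent_tick(last_scheduled: int, now: int, period: int) -> Optional[int]:
    """Latest tick in (last_scheduled, now] for a fixed-period schedule, if any."""
    if now < last_scheduled + period:
        return None
    return last_scheduled + ((now - last_scheduled) // period) * period


def decide(last_scheduled: int, now: int, period: int, active_jobs: int,
           policy: str = "Allow", starting_deadline: Optional[int] = None) -> Decision:
    tick = most_recent_tick(last_scheduled, now, period)
    if tick is None:
        return Decision("skip", None)        # nothing due yet
    # Only the most recent missed tick is considered; older ones are not replayed.
    if starting_deadline is not None and now - tick > starting_deadline:
        return Decision("skip", tick)        # too late to start this run
    if active_jobs > 0 and policy == "Forbid":
        return Decision("skip", tick)        # previous run still going
    if active_jobs > 0 and policy == "Replace":
        return Decision("replace", tick)     # delete the running Job(s), start a new one
    return Decision("create", tick)          # Allow: may overlap a running Job
```

For example, with a 60-second period, a last run at 0, and the controller back at 350, five ticks (60 through 300) were missed, yet decide(0, 350, 60, 0, starting_deadline=100) creates a single Job for tick 300; with starting_deadline=30 that run is skipped as too late.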

Common trap

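Finished Jobs and their pods are NOT deleted when the Job completes. They stay (pods in the Succeeded or Failed phase) until the Job's ttlSecondsAfterFinished expires, which deletes the Job and its pods, or until someone deletes the Job. CronJobs prune their own Jobs through the history limits, but standalone Jobs have no such limit. Without cleanup, finished Jobs and pods pile up in etcd, clutter kubectl get output, and add load on the API server; the kube-controller-manager's terminated-pod garbage collection (--terminated-pod-gc-threshold, default 12500 pods) is only a coarse cluster-wide backstop.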
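To make the TTL rule concrete, here is a small hypothetical helper (ttl_expired is not a Kubernetes API) that models when a finished Job becomes eligible for deletion by the TTL-after-finished controller: its finish time, taken as the moment the Job's Complete or Failed condition was set, plus ttlSecondsAfterFinished. It is a sketch of the rule, not the controller's code; the field itself goes on the Job spec, or on jobTemplate.spec for a CronJob.

```python
from datetime import datetime, timedelta
from typing import Optional


def ttl_expired(finished_at: Optional[datetime], ttl_seconds: Optional[int],
                now: datetime) -> bool:
    """Would a Job with this finish time and ttlSecondsAfterFinished be deletable now?"""
    if finished_at is None:
        return False  # still running: the TTL only starts counting once the Job finishes
    if ttl_seconds is None:
        return False  # no TTL set: the Job and its pods stay until deleted by hand
    return now >= finished_at + timedelta(seconds=ttl_seconds)
```

A TTL of 0 makes a Job eligible for cleanup as soon as it finishes, which suits one-shot migrations whose logs are shipped elsewhere.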