
Healthchecks and Restart Policies

A container can be "running" but completely broken -- the process is alive but the application has deadlocked, lost its database connection, or entered an error loop. Healthchecks let Docker detect this difference. Combined with restart policies, they give your containers basic self-healing behavior.

How Healthchecks Work

A healthcheck runs a command inside the container at regular intervals. Every container with a healthcheck starts in the starting state, and Docker then uses each check's exit code to mark it healthy or unhealthy:

```mermaid
flowchart TD
    A["Container Starts"] --> B["Status: starting"]
    B -->|"interval elapses"| C["Run health command"]
    C -->|"Exit 0"| D["Status: healthy (fail count reset)"]
    C -->|"Exit 1"| G{"Still starting and inside start_period?"}
    G -->|"Yes: not counted"| C
    G -->|"No"| E["Fail count + 1"]
    E -->|"Below retry limit"| C
    E -->|"Retries reached"| F["Status: unhealthy"]
    D -->|"Next interval"| C
    F -->|"Next interval"| C

    style D fill:#e8f5e9,stroke:#2e7d32
    style F fill:#ffebee,stroke:#c62828
    style B fill:#fff3e0,stroke:#ef6c00
```
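The transitions above fit in a few lines of code. The following is a simplified Python model, not Docker's implementation: it assumes each probe arrives as a (seconds since start, exit code) pair and ignores timeouts. HealthState and apply_probe are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    status: str = "starting"   # every container with a healthcheck begins here
    failing_streak: int = 0    # consecutive counted failures


def apply_probe(state: HealthState, exit_code: int, seconds_since_start: float,
                start_period: float, retries: int) -> HealthState:
    """Update the (hypothetical) health state after one probe."""
    if exit_code == 0:
        # Any success resets the streak and marks the container healthy.
        state.status = "healthy"
        state.failing_streak = 0
        return state
    if state.status == "starting" and seconds_since_start < start_period:
        # Failure while still starting and inside the grace period: not counted.
        return state
    state.failing_streak += 1
    if state.failing_streak >= retries:
        state.status = "unhealthy"
    return state


# Hypothetical probe results: (seconds since start, exit code)
probes = [(10, 1), (40, 1), (70, 0), (100, 1), (130, 1), (160, 1)]
state = HealthState()
for t, code in probes:
    apply_probe(state, code, t, start_period=20, retries=3)
    print(t, state.status, state.failing_streak)
```

With start_period=20 and retries=3, the failure at 10 s is ignored, the failure at 40 s counts, the success at 70 s resets the streak, and three failures in a row then mark the container unhealthy.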

Healthcheck Parameters

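| Parameter | Default | What It Controls |
|---|---|---|
| test | (none) | The command to run: exit 0 = healthy, exit 1 = unhealthy (exit code 2 is reserved) |
| interval | 30s | Time between health checks; the first check runs one interval after the container starts |
| timeout | 30s | How long a single check may run before it counts as failed |
| retries | 3 | Consecutive failures needed to mark the container unhealthy |
| start_period | 0s | Grace period after startup: failures during it don't count toward retries, and the first successful check ends it early |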
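Together, interval, timeout, and retries bound how long a broken application can go unnoticed. Docker runs each check one interval after the previous check completes, so a rough worst case, measured from a failure that begins just after a passing check, can be estimated as below. This is a back-of-the-envelope sketch, not an exact guarantee; detection_window is a hypothetical helper.

```python
from __future__ import annotations


def detection_window(interval: float, timeout: float, retries: int) -> tuple[float, float]:
    """Approximate seconds from a failure starting (just after a passing check)
    until Docker reports the container unhealthy."""
    # Each failing check starts one interval after the previous check finished.
    fast_fail = retries * interval          # checks fail immediately (e.g. connection refused)
    hung = retries * (interval + timeout)   # checks hang until the timeout kills them
    return fast_fail, hung


print(detection_window(30, 30, 3))  # Docker defaults -> (90, 180)
print(detection_window(30, 5, 3))   # values used in the examples below -> (90, 105)
```

A short timeout matters most for hung applications: with the defaults, a deadlocked process can stay "healthy" for about three minutes.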

Configuring Healthchecks

In a Dockerfile

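```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev

# BusyBox wget ships with Alpine images, so the check needs no extra packages.
# "|| exit 1" maps any wget failure to exit code 1 (unhealthy).
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=20s \
  CMD wget -qO- http://localhost:3000/health || exit 1

USER node
CMD ["node", "server.js"]
```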

In Docker Compose

```yaml
services:
  api:
    image: my-api:1.0.0
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s
```

With docker run

```shell
docker run -d \
  --name api \
  --health-cmd="wget -qO- http://localhost:8080/health || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=20s \
  my-api:1.0.0
```

Writing Good Health Checks

| Do | Don't |
|---|---|
| Check a lightweight /health endpoint | Run expensive database queries on every check |
| Return quickly (well under the timeout) | Depend on external services that might be slow |
| Use tools already in the image (wget, curl) | Require tools not installed in the runtime image |
| Set start_period for slow-starting apps | Leave start_period at 0 for apps that take time to boot |
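The first row in practice: a /health endpoint that answers from in-process state only. The example app above is Node, but the pattern is the same in any stack. Below is a Python standard-library sketch; port 8080 and the app_ready flag are hypothetical. Returning 503 until the app is ready makes wget exit non-zero, so the check fails cleanly.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process flag the application sets once initialization is done.
app_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # Cheap, local check only: no database queries, no calls to other services.
        ok = app_ready.is_set()
        body = json.dumps({"status": "ok" if ok else "starting"}).encode()
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so probes don't flood the container logs.
        pass


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Create (but don't start) the health server on all interfaces."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

In the application you would run make_server().serve_forever() in a background thread and call app_ready.set() once startup finishes.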

Common Health Commands by Stack

| Stack | Health Command |
|---|---|
| HTTP services | wget -qO- http://localhost:PORT/health \|\| exit 1 |
| PostgreSQL | pg_isready -U postgres |
| Redis | redis-cli ping |
| MySQL | mysqladmin ping -h localhost |
| Generic TCP | nc -z localhost PORT |
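If the runtime image has no nc but already ships Python 3, the Generic TCP row can be replaced by a small script. This is a sketch; the file name tcp_check.py is hypothetical, and like nc -z it only proves that the port accepts connections, not that the application behind it is working.

```python
import socket
import sys


def tcp_check(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 0 if a TCP connection to host:port succeeds, else 1 (unhealthy)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:  # refused, unreachable, or timed out
        return 1


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage: python3 tcp_check.py PORT [HOST]
    target_host = sys.argv[2] if len(sys.argv) >= 3 else "localhost"
    sys.exit(tcp_check(target_host, int(sys.argv[1])))
```

In a Dockerfile this would look like HEALTHCHECK CMD python3 /app/tcp_check.py 5432 (path and port hypothetical).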

Checking Health Status

```shell
# Quick view -- shows (healthy) or (unhealthy) in status column
docker ps

# Detailed health log with timestamps and output
docker inspect -f '{{json .State.Health}}' my-container | python3 -m json.tool
```

The health log lists the most recent check results (Docker keeps the last five), each with start and end timestamps, exit code, and command output. It is the first place to look when a check keeps failing.
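For scripts or alerts, the same JSON can be reduced to one line. A sketch assuming the docker CLI is on the PATH; inspect_health and summarize_health are hypothetical helpers, and the field names (Status, FailingStreak, Log, ExitCode, Output) are the ones docker inspect reports under .State.Health.

```python
import json
import subprocess


def inspect_health(container: str):
    """Return .State.Health as a dict, or None if no healthcheck is configured."""
    out = subprocess.run(
        ["docker", "inspect", "-f", "{{json .State.Health}}", container],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)  # the template prints "null" when there is no healthcheck


def summarize_health(health) -> str:
    """Status, failing streak, and the output of the most recent failed probe."""
    if not health:
        return "no healthcheck configured"
    failures = [e for e in (health.get("Log") or []) if e.get("ExitCode") != 0]
    summary = f"{health['Status']} (failing streak: {health['FailingStreak']})"
    if failures:
        last = failures[-1]
        summary += f"; last failure exit {last['ExitCode']}: {last['Output'].strip()}"
    return summary
```

Usage: print(summarize_health(inspect_health("my-container"))).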

Restart Policies

Restart policies control what Docker does when a container's main process exits. They operate at the process level -- they react to the process exiting, not to health status.

```mermaid
flowchart TD
    A["Container Process Exits"] --> B{"Restart Policy?"}
    B -->|"no"| C["Stay stopped"]
    B -->|"on-failure"| D{"Exit code != 0?"}
    D -->|"Yes"| E["Restart"]
    D -->|"No"| C
    B -->|"always"| E
    B -->|"unless-stopped"| F{"Manually stopped?"}
    F -->|"Yes"| C
    F -->|"No"| E

    style C fill:#f5f5f5,stroke:#9e9e9e
    style E fill:#e8f5e9,stroke:#2e7d32
```
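
| Policy | Behavior | Best For |
|---|---|---|
| no | Never restart automatically (the default) | One-off jobs, migrations, manual debugging |
| on-failure[:max-retries] | Restart only if the exit code is non-zero, optionally at most max-retries times | Workers that should stop once they finish successfully |
| always | Always restart, even after a clean exit; a container stopped with docker stop stays down until the daemon restarts or it is started again | Use with caution: a process that exits immediately restarts in an endless loop |
| unless-stopped | Like always, but a container stopped with docker stop stays stopped even after the Docker daemon restarts | Recommended default for long-running services |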
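The table can be read as a small decision function. This is a simplified Python model of the documented behavior, not a Docker API: should_restart is hypothetical, and it only covers a process exit plus the daemon-restart case that separates always from unless-stopped.

```python
VALID_POLICIES = {"no", "on-failure", "always", "unless-stopped"}


def should_restart(policy: str, exit_code: int, manually_stopped: bool,
                   daemon_restarting: bool = False, restarts_so_far: int = 0) -> bool:
    """Would Docker restart this container? (simplified, hypothetical model)"""
    name, _, limit = policy.partition(":")  # "on-failure:5" -> ("on-failure", "5")
    if name not in VALID_POLICIES or (limit and name != "on-failure"):
        raise ValueError(f"unknown restart policy: {policy}")
    if name == "no":
        return False
    if manually_stopped:
        # docker stop suppresses restarts; only "always" comes back on a daemon restart.
        return name == "always" and daemon_restarting
    if name == "on-failure":
        # Restart on non-zero exit, respecting the optional max-retries limit.
        return exit_code != 0 and (not limit or restarts_so_far < int(limit))
    return True  # "always" and "unless-stopped" restart on any exit
```

Keeping manual stops separate from process exits mirrors why unless-stopped is the safer default: a container you deliberately stopped does not come back after a host reboot.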

Setting a Restart Policy

```shell
# At creation time
docker run -d --restart unless-stopped --name api my-api:1.0.0

# Update an existing container
docker update --restart unless-stopped api
```
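Once a policy is active, Docker does not restart a crashing container instantly. Its documentation describes a delay that doubles before each restart, starting at 100 ms and capped at one minute. The Python sketch below just computes that schedule; restart_delays is a hypothetical helper modeling the documented behavior, not Docker's code.

```python
from typing import List


def restart_delays(restarts: int, base: float = 0.1, cap: float = 60.0) -> List[float]:
    """Delay in seconds before each of `restarts` back-to-back restarts."""
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= 2                      # double the delay for the next attempt
    return delays


schedule = restart_delays(12)
print(schedule)
print(f"total wait before restart 12: {sum(schedule):.1f} s")
```

This model never resets the delay. Docker resets it to 100 ms once a restarted container stays up for at least 10 seconds, so an app that crashes after a minute of uptime restarts almost immediately every time.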

Healthchecks + Restart Policies Together

A common misconception: "unhealthy automatically restarts the container." This is not true in plain Docker. Healthchecks report status, restart policies react to process exits. They are complementary:

| Layer | What It Does |
|---|---|
| Healthcheck | Detects application-level problems (deadlock, broken dependency) |
| Restart policy | Recovers from process crashes (exit code) |

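To restart a container automatically when it turns unhealthy, you need something that acts on health status. Docker Swarm replaces tasks whose containers become unhealthy; Kubernetes ignores the image's HEALTHCHECK and instead restarts containers that fail its own liveness probes; on a single host, an external watchdog can monitor health and restart unhealthy containers.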
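A single-host watchdog can be very small. The sketch below polls docker ps with the health=unhealthy filter and runs docker restart on each match. The helper names are hypothetical, the docker CLI is injected as a function so the logic can be tested without a daemon, and it has no backoff, so a container that never becomes healthy is restarted on every poll after its retries run out.

```python
import subprocess
import time
from typing import Callable, List

# A runner takes docker CLI arguments and returns stdout (injectable for tests).
Runner = Callable[[List[str]], str]


def docker_cli(args: List[str]) -> str:
    return subprocess.run(["docker", *args], check=True,
                          capture_output=True, text=True).stdout


def restart_unhealthy(run: Runner = docker_cli) -> List[str]:
    """Restart every running container Docker currently reports as unhealthy."""
    names = run(["ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"]).split()
    for name in names:
        run(["restart", name])
    return names


def watch(poll_seconds: float = 30, run: Runner = docker_cli) -> None:
    """Poll forever; intended to run as its own long-lived process."""
    while True:
        for name in restart_unhealthy(run):
            print(f"restarted unhealthy container {name}")
        time.sleep(poll_seconds)
```

A restarted container goes back to the starting state, so its start_period and retries apply again before the watchdog can act on it a second time.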

Detecting Restart Storms

If a container keeps crashing and restarting, it creates a restart storm -- the container never stays up long enough to be useful:

```shell
# Check how many times it has restarted
docker inspect -f '{{.RestartCount}}' my-container

# Check recent logs for the crash reason
docker logs --tail 200 my-container
```

Common causes: bad environment variable, unreachable dependency, wrong entrypoint, or resource limits too tight.
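Some of these causes leave traces in docker inspect. The sketch below reads RestartCount, State.OOMKilled, and State.ExitCode and turns them into hints; storm_hints and the threshold of 5 restarts are hypothetical choices, and the hints are starting points for reading the logs, not a diagnosis.

```python
import json
import subprocess


def docker_inspect(container: str) -> dict:
    """Full docker inspect output for one container (requires the docker CLI)."""
    out = subprocess.run(["docker", "inspect", container], check=True,
                         capture_output=True, text=True).stdout
    return json.loads(out)[0]  # docker inspect prints a JSON array


def storm_hints(info: dict, threshold: int = 5) -> list:
    """Turn inspect fields into human-readable hints (hypothetical heuristics)."""
    hints = []
    if info.get("RestartCount", 0) >= threshold:
        hints.append(f"restart storm: {info['RestartCount']} restarts")
    state = info.get("State", {})
    if state.get("OOMKilled"):
        hints.append("last exit was an OOM kill: the memory limit may be too tight")
    if state.get("ExitCode"):
        hints.append(f"last exit code {state['ExitCode']}: check docker logs for the crash reason")
    return hints
```

Usage: for hint in storm_hints(docker_inspect("my-container")): print(hint).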

Key Takeaways

  • Running does not mean healthy. Healthchecks detect application-level failures that "process is alive" cannot.
  • Set start_period for applications that take time to boot; otherwise failed checks during startup count toward retries and can mark the container unhealthy before it is ready.
  • Use unless-stopped as the default restart policy for long-running services.
  • Healthchecks and restart policies are complementary, not redundant -- one reports status, the other handles process crashes.
  • Monitor restart count to catch restart storms early.

What's Next

  • Continue to Resource Limits to protect your host from runaway containers.