Healthchecks and Restart Policies

A container can be "running" but completely broken -- the process is alive but the application has deadlocked, lost its database connection, or entered an error loop. Healthchecks let Docker detect this difference. Combined with restart policies, they give your containers basic self-healing behavior.

How Healthchecks Work

A healthcheck runs a command inside the container at regular intervals. The container begins in the `starting` state. After that, each check's exit code decides whether Docker marks it `healthy` or, after enough consecutive failures, `unhealthy`:

flowchart TD
    A["Container Starts"] --> B["Status: starting"]
    B -->|"interval elapses"| C["Run health command"]
    C -->|"Exit 0"| D["Status: healthy"]
    C -->|"Exit non-zero"| E["Fail count + 1 (not counted during start_period)"]
    D -->|"Next interval"| C
    E -->|"Below retry limit"| C
    E -->|"Retries exceeded"| F["Status: unhealthy"]

    style D fill:#e8f5e9,stroke:#2e7d32
    style F fill:#ffebee,stroke:#c62828
    style B fill:#fff3e0,stroke:#ef6c00

Healthcheck Parameters

Parameter	Default	What It Controls
`test`	(none)	The command to run (exit 0 = healthy, non-zero = unhealthy; use exit 1 for failure, since 2 is reserved)
`interval`	30s	Time between health checks
`timeout`	30s	How long to wait for the check to complete
`retries`	3	Consecutive failures before marking unhealthy
`start_period`	0s	Grace period after startup: failures during it don't count toward `retries`, but a passing check marks the container healthy immediately
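
To see how `interval`, `retries`, and `start_period` interact, here is a simplified Python model of the rules above. It is a sketch, not Docker's implementation: it assumes the first probe runs one `interval` after start, that every probe finishes instantly, and that a timeout is just another failure. The names `HealthState`, `apply_check`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    status: str = "starting"  # every container with a healthcheck begins here
    failing_streak: int = 0


def apply_check(state: HealthState, passed: bool, seconds_since_start: float,
                retries: int = 3, start_period: float = 0.0) -> None:
    """Update state with one probe result (simplified model, not Docker's code)."""
    if passed:
        # Any success resets the streak and marks the container healthy,
        # even inside start_period.
        state.failing_streak = 0
        state.status = "healthy"
        return
    if state.status == "starting" and seconds_since_start < start_period:
        # Grace period: failures are not counted while still starting.
        return
    state.failing_streak += 1
    if state.failing_streak >= retries:
        state.status = "unhealthy"


def simulate(results: list[bool], interval: float = 30.0, retries: int = 3,
             start_period: float = 0.0) -> list[str]:
    """Probe at interval, 2*interval, ... and record the status after each probe."""
    state = HealthState()
    history = []
    for n, passed in enumerate(results, start=1):
        apply_check(state, passed, n * interval, retries, start_period)
        history.append(state.status)
    return history


# An app that never comes up, probed every 10s with retries=3:
print(simulate([False] * 4, interval=10, retries=3))
# → ['starting', 'starting', 'unhealthy', 'unhealthy']
print(simulate([False] * 4, interval=10, retries=3, start_period=20))
# → ['starting', 'starting', 'starting', 'unhealthy']
```

With a 20s grace period, the failure at 10s is ignored, so the container reaches `unhealthy` one interval later than it would without one.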

Configuring Healthchecks

In a Dockerfile

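FROM node:20-alpine
WORKDIR /app
# Install dependencies first so this layer stays cached until package*.json changes
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# BusyBox wget ships with Alpine, so the check needs no extra package
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=20s \
  CMD wget -qO- http://localhost:3000/health || exit 1

USER node
CMD ["node", "server.js"]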

In Docker Compose

services:
  api:
    image: my-api:1.0.0
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s

With `docker run`

docker run -d \
  --name api \
  --health-cmd="wget -qO- http://localhost:8080/health || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=20s \
  my-api:1.0.0

Writing Good Health Checks

Do	Don't
Check a lightweight `/health` endpoint	Run expensive database queries every check
Return quickly (well under the `timeout`)	Depend on external services that might be slow
Use tools already in the image (`wget`, `curl`)	Require tools not installed in the runtime image
Set `start_period` for slow-starting apps	Leave `start_period` at 0 for apps that take time to boot
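
If the runtime image has neither `wget` nor `curl` (slim Python images, for example, often ship without both), a Python image can probe the endpoint with the standard library instead. This is a minimal sketch assuming a hypothetical script saved as `/app/healthcheck.py` and a `/health` endpoint on port 8080; it maps any 2xx response to exit 0 and everything else to exit 1.

```python
import http.client
import sys
import urllib.request

# Ignore HTTP_PROXY/HTTPS_PROXY set in the container: the probe targets localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 3.0) -> int:
    """Return 0 if url answers 2xx within timeout, else 1 (Docker's healthy/unhealthy codes)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (OSError, http.client.HTTPException, ValueError):
        # HTTPError (4xx/5xx), refused connections and timeouts are OSError
        # subclasses; HTTPException covers malformed responses; ValueError
        # covers a malformed URL.
        return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 healthcheck.py http://localhost:8080/health
    sys.exit(probe(sys.argv[1]))
```

It would be wired in with `HEALTHCHECK --interval=30s --timeout=5s CMD ["python3", "/app/healthcheck.py", "http://localhost:8080/health"]`. The script only runs the check when given a URL argument, so keep the argument in the command. Bypassing proxies is deliberate: a container-wide `HTTP_PROXY` could otherwise route a localhost probe through the proxy.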

Common Health Commands by Stack

Stack	Health Command
HTTP services	`wget -qO- http://localhost:PORT/health || exit 1`
PostgreSQL	`pg_isready -U postgres`
Redis	`redis-cli ping`
MySQL	`mysqladmin ping -h localhost`
Generic TCP	`nc -z localhost PORT`
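
`nc` is not present in every image either. For a Python-based image, a TCP check equivalent to `nc -z` can be sketched with the standard `socket` module. `tcp_probe` is a hypothetical helper name.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 0 if a TCP connection to host:port opens within timeout, else 1 (like `nc -z`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return 1
```

Inside a `HEALTHCHECK`, the same idea fits in a one-liner such as `CMD ["python3", "-c", "import socket; socket.create_connection(('localhost', 8080), 3).close()"]` (port 8080 assumed). An uncaught connection error makes Python exit with status 1, which Docker counts as a failed check. An open port only shows that something is listening; unlike an HTTP `/health` check, it will not catch a deadlocked application.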

Checking Health Status

# Quick view -- shows (healthy) or (unhealthy) in status column
docker ps

# Detailed health log with timestamps and output
docker inspect -f '{{json .State.Health}}' my-container | python3 -m json.tool

The health log keeps the five most recent check results, including each command's output, which is useful for understanding why a check is failing.
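
When the raw JSON gets long, a small script can condense it. The sketch below assumes the `.State.Health` fields that `docker inspect` reports: `Status`, `FailingStreak`, and `Log`, with `Start`, `ExitCode`, and `Output` on each log entry. `summarize_health` is a hypothetical name.

```python
import json


def summarize_health(raw: str, max_output: int = 80) -> list[str]:
    """Condense `docker inspect -f '{{json .State.Health}}' NAME` output into readable lines."""
    health = json.loads(raw)
    if health is None:
        # The template prints `null` when the container has no healthcheck configured.
        return ["no healthcheck configured"]
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for entry in health.get("Log") or []:
        # Collapse multi-line probe output so each check fits on one line.
        output = " ".join(entry.get("Output", "").split())
        lines.append(f"{entry['Start']} exit={entry['ExitCode']} {output[:max_output]}")
    return lines
```

To use it in a pipeline after the `docker inspect` command, read the JSON from `sys.stdin` and print `"\n".join(summarize_health(sys.stdin.read()))`.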

Restart Policies

Restart policies control what Docker does when a container's main process exits. They operate at the process level -- they react to the process exiting, not to health status.

flowchart TD
    A["Container Process Exits"] --> B{"Restart Policy?"}
    B -->|"no"| C["Stay stopped"]
    B -->|"on-failure"| D{"Exit code != 0?"}
    D -->|"Yes"| E["Restart"]
    D -->|"No"| C
    B -->|"always"| E
    B -->|"unless-stopped"| F{"Manually stopped?"}
    F -->|"Yes"| C
    F -->|"No"| E

    style C fill:#f5f5f5,stroke:#9e9e9e
    style E fill:#e8f5e9,stroke:#2e7d32

Policy	Behavior	Best For
`no`	Never restart automatically (the default)	One-off jobs, migrations, manual debugging
`on-failure[:N]`	Restart only if exit code is non-zero, optionally at most N times	Workers that should stop when done successfully
`always`	Always restart, even after clean exit; a container stopped with `docker stop` also comes back when the Docker daemon restarts	Use with caution -- can create infinite restart loops
`unless-stopped`	Like `always`, but a container stopped with `docker stop` stays stopped, even across daemon restarts	Recommended default for long-running services
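
The decision in the flowchart can be written as a small function. This is a simplified model, not Docker's code: it covers only a main process that exits on its own (a `docker stop` never triggers a restart) and treats `on-failure:N` as a cap on restart attempts. `restarts_after_exit` is a hypothetical name.

```python
def restarts_after_exit(policy: str, exit_code: int, restart_count: int = 0) -> bool:
    """Would Docker restart a container whose main process exited on its own?"""
    name, _, max_retries = policy.partition(":")
    if name == "no":
        return False
    if name == "on-failure":
        if exit_code == 0:
            return False  # clean exit: the work is done
        # Without ":N" there is no cap; with it, at most N restart attempts.
        return not max_retries or restart_count < int(max_retries)
    if name in ("always", "unless-stopped"):
        # Identical on process exit; they differ only when the Docker daemon restarts.
        return True
    raise ValueError(f"unknown restart policy: {policy!r}")


print(restarts_after_exit("on-failure", 0))                     # → False
print(restarts_after_exit("on-failure:3", 1, restart_count=3))  # → False
print(restarts_after_exit("unless-stopped", 0))                 # → True
```

Capping `on-failure` is one way to keep a worker with a persistent fault from looping forever.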

Setting a Restart Policy

# At creation time
docker run -d --restart unless-stopped --name api my-api:1.0.0

# Update an existing container
docker update --restart unless-stopped api

Healthchecks + Restart Policies Together

A common misconception: "unhealthy automatically restarts the container." This is not true in plain Docker. Healthchecks report status, restart policies react to process exits. They are complementary:

Layer	What It Does
Healthcheck	Detects application-level problems (deadlock, broken dependency)
Restart policy	Recovers from process crashes (exit code)

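To get automatic restart on unhealthy status, you need something that acts on it: Docker Swarm replaces tasks whose containers become unhealthy, Kubernetes uses its own liveness probes (it ignores the image's `HEALTHCHECK`), or an external watchdog can monitor health and restart unhealthy containers.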
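
For plain Docker, an external watchdog can be as small as a polling loop. This is a minimal sketch assuming the `docker` CLI is available where it runs. The function names are hypothetical, and it has no logging or backoff, so a container that turns unhealthy after every start would be restarted on every poll.

```python
import json
import subprocess
import time
from typing import Callable, Iterable


def docker_health_status(name: str) -> str:
    """Return 'starting', 'healthy', 'unhealthy', or 'none' (no healthcheck)."""
    out = subprocess.run(
        ["docker", "inspect", "-f", "{{json .State.Health}}", name],
        capture_output=True, text=True, check=True,
    ).stdout
    health = json.loads(out)  # the template prints `null` without a healthcheck
    return health["Status"] if health else "none"


def docker_restart(name: str) -> None:
    subprocess.run(["docker", "restart", name], check=True)


def restart_unhealthy(
    names: Iterable[str],
    get_status: Callable[[str], str] = docker_health_status,
    restart: Callable[[str], None] = docker_restart,
) -> list[str]:
    """Restart every container currently reported as unhealthy; return their names."""
    restarted = []
    for name in names:
        if get_status(name) == "unhealthy":
            restart(name)
            restarted.append(name)
    return restarted


def watch(names: list[str], poll_seconds: float = 30.0) -> None:
    """Poll forever; stop with Ctrl+C."""
    while True:
        restart_unhealthy(names)
        time.sleep(poll_seconds)
```

Calling `watch(["api"])` would check the `api` container every 30 seconds. The status and restart functions are parameters so the decision logic can be exercised without a Docker daemon.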

Detecting Restart Storms

If a container keeps crashing and restarting, it creates a restart storm -- the container never stays up long enough to be useful:

# Check how many times it has restarted
docker inspect -f '{{.RestartCount}}' my-container

# Check recent logs for the crash reason
docker logs --tail 200 my-container

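Common causes: a bad environment variable, an unreachable dependency, a wrong entrypoint, or resource limits that are too tight (a container killed for exceeding its memory limit reports `true` for `docker inspect -f '{{.State.OOMKilled}}' my-container`).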
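
A single `RestartCount` value says little on its own; what signals a storm is how fast it grows. The sketch below samples the same `docker inspect` field over time and flags more than `max_restarts` restarts inside a sliding window. The thresholds (3 restarts in 300 seconds) and the function names are hypothetical choices, not Docker defaults.

```python
import subprocess


def read_restart_count(name: str) -> int:
    """Read RestartCount for one container via the docker CLI."""
    out = subprocess.run(
        ["docker", "inspect", "-f", "{{.RestartCount}}", name],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def is_restart_storm(samples: list[tuple[float, int]], window_seconds: float = 300.0,
                     max_restarts: int = 3) -> bool:
    """Return True if RestartCount grew by more than max_restarts within the window.

    samples: (unix_time, restart_count) pairs, oldest first, e.g. collected by
    calling read_restart_count() on a timer.
    """
    if not samples:
        return False
    latest_time, latest_count = samples[-1]
    # Baseline is the oldest sample still inside the window; the latest
    # sample always qualifies, so next() cannot run out.
    baseline = next(count for t, count in samples if latest_time - t <= window_seconds)
    return latest_count - baseline > max_restarts
```

Comparing against the oldest in-window sample, rather than the previous one, keeps a steady trickle of restarts from slipping under the threshold one poll at a time.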

Key Takeaways

Running does not mean healthy. Healthchecks detect application-level failures that "process is alive" cannot.
Set `start_period` for applications that take time to boot; otherwise failed checks during startup count toward `retries` and can mark the container unhealthy before it is ready.
Use `unless-stopped` as the default restart policy for long-running services.
Healthchecks and restart policies are complementary, not redundant -- one reports status, the other handles process crashes.
Monitor restart count to catch restart storms early.

What's Next

Continue to Resource Limits to protect your host from runaway containers.
