Healthchecks and Restart Policies
A container can be "running" but completely broken -- the process is alive but the application has deadlocked, lost its database connection, or entered an error loop. Healthchecks let Docker detect this difference. Combined with restart policies, they give your containers basic self-healing behavior.
How Healthchecks Work
A healthcheck runs a command inside the container at regular intervals. Based on the exit code, Docker marks the container as healthy, unhealthy, or starting:
flowchart TD
A["Container Starts"] --> B["Status: starting"]
B -->|"interval elapses"| C["Run health command"]
C -->|"Exit 0"| D["Status: healthy"]
C -->|"Exit non-zero during start_period"| B
C -->|"Exit non-zero after start_period"| E["Fail count + 1"]
D -->|"Next interval"| C
E -->|"Below retry limit"| C
E -->|"Retries exceeded"| F["Status: unhealthy"]
style D fill:#e8f5e9,stroke:#2e7d32
style F fill:#ffebee,stroke:#c62828
style B fill:#fff3e0,stroke:#ef6c00
Healthcheck Parameters
| Parameter | Default | What It Controls |
|---|---|---|
| test | (none) | The command to run (exit 0 = healthy, non-zero = unhealthy) |
| interval | 30s | Time between health checks |
| timeout | 30s | How long a single check may run before it counts as a failure |
| retries | 3 | Consecutive failures before marking unhealthy |
| start_period | 0s | Grace period after startup (failures during this time don't count toward retries) |
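To see how these parameters interact, the following Python sketch replays a list of check results through the rules in the table above. It is a simplified model, not Docker's implementation: the function name `simulate_health` and the `(seconds_since_start, passed)` tuple format are invented for illustration, and `timeout` is not modeled (a timed-out check is simply passed in as a failure).

```python
def simulate_health(checks, retries=3, start_period=0.0):
    """Replay (seconds_since_start, passed) results and return the final status.

    Simplified model of the parameter table, not Docker's own code.
    """
    status = "starting"
    failing_streak = 0
    for elapsed, passed in checks:
        if passed:
            # Any success marks the container healthy, even inside start_period.
            status = "healthy"
            failing_streak = 0
        elif status == "starting" and elapsed < start_period:
            # Failures during the grace period are not counted.
            continue
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
    return status


# Three consecutive failures with the default retries=3.
print(simulate_health([(30, False), (60, False), (90, False)]))
# The same failures inside a 100-second grace period.
print(simulate_health([(30, False), (60, False), (90, False)], start_period=100))
```

Under this model the first call ends in `unhealthy`, while the second stays in `starting` because every failure falls inside the grace period.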
Configuring Healthchecks
In a Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=20s \
  CMD wget -qO- http://localhost:3000/health || exit 1
USER node
CMD ["node", "server.js"]
In Docker Compose
services:
  api:
    image: my-api:1.0.0
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s
With docker run
docker run -d \
--name api \
--health-cmd="wget -qO- http://localhost:8080/health || exit 1" \
--health-interval=30s \
--health-timeout=5s \
--health-retries=3 \
--health-start-period=20s \
my-api:1.0.0
Writing Good Health Checks
| Do | Don't |
|---|---|
| Check a lightweight /health endpoint | Run expensive database queries every check |
| Return quickly (well under the timeout) | Depend on external services that might be slow |
| Use tools already in the image (wget, curl) | Require tools not installed in the runtime image |
| Set start_period for slow-starting apps | Leave start_period at 0 for apps that take time to boot |
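When the runtime image ships with neither wget nor curl (a slim Python image, for instance), the check can be a small script in a language the image already has. The sketch below is one way to do that with only the standard library. The names `probe` and `main`, the 2-second timeout, and the default URL are assumptions for illustration, not part of any Docker API.

```python
import http.client
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return 0 if the endpoint answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed URL.
        return 1


def main(argv):
    """Map the command line to an exit code for Docker to interpret."""
    # The default URL is an assumption; pass another one as the first argument.
    url = argv[1] if len(argv) > 1 else "http://localhost:8080/health"
    return probe(url)
```

Saved as a hypothetical `healthcheck.py` with a final `sys.exit(main(sys.argv))` line (plus `import sys`), this could serve as the health command, for example `test: ["CMD", "python", "healthcheck.py"]`. Keeping the timeout well under the healthcheck timeout follows the "return quickly" advice in the table.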
Common Health Commands by Stack
| Stack | Health Command |
|---|---|
| HTTP services | wget -qO- http://localhost:PORT/health \|\| exit 1 |
| PostgreSQL | pg_isready -U postgres |
| Redis | redis-cli ping |
| MySQL | mysqladmin ping -h localhost |
| Generic TCP | nc -z localhost PORT |
Checking Health Status
# Quick view -- shows (healthy) or (unhealthy) in status column
docker ps
# Detailed health log with timestamps and output
docker inspect -f '{{json .State.Health}}' my-container | python3 -m json.tool
The health log keeps the five most recent check results, including each command's exit code and output -- useful for understanding why a check is failing.
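If you would rather read the health log as a few plain lines than as raw JSON, a short script can condense it. The sketch below assumes the JSON comes from the `docker inspect` command shown above. The field names (`Status`, `FailingStreak`, `Log`, `Start`, `ExitCode`, `Output`) are the ones Docker emits, while the function name `summarize_health` and the sample document are illustrative, with made-up timestamps and output.

```python
import json


def summarize_health(raw_json):
    """Turn the JSON from `docker inspect -f '{{json .State.Health}}'` into text lines."""
    health = json.loads(raw_json)
    if health is None:
        # Containers without a healthcheck report `null` here.
        return ["no healthcheck configured"]
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for entry in health.get("Log") or []:
        verdict = "ok" if entry["ExitCode"] == 0 else f"exit {entry['ExitCode']}"
        output = entry["Output"].strip() or "(no output)"
        lines.append(f"{entry['Start']} {verdict}: {output}")
    return lines


# Illustrative sample only; timestamps and output are made up.
SAMPLE = """
{"Status": "healthy", "FailingStreak": 1, "Log": [
  {"Start": "2024-01-01T10:00:00Z", "End": "2024-01-01T10:00:01Z",
   "ExitCode": 0, "Output": "ok\\n"},
  {"Start": "2024-01-01T10:00:30Z", "End": "2024-01-01T10:00:31Z",
   "ExitCode": 1, "Output": "wget: can't connect to remote host: Connection refused\\n"}
]}
"""

for line in summarize_health(SAMPLE):
    print(line)
```

In the sample, one failed check has raised the failing streak to 1, but the status is still `healthy` because the retry limit has not been reached.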
Restart Policies
Restart policies control what Docker does when a container's main process exits. They operate at the process level -- they react to the process exiting, not to health status.
flowchart TD
A["Container Process Exits"] --> B{"Restart Policy?"}
B -->|"no"| C["Stay stopped"]
B -->|"on-failure"| D{"Exit code != 0?"}
D -->|"Yes"| E["Restart"]
D -->|"No"| C
B -->|"always"| E
B -->|"unless-stopped"| F{"Manually stopped?"}
F -->|"Yes"| C
F -->|"No"| E
style C fill:#f5f5f5,stroke:#9e9e9e
style E fill:#e8f5e9,stroke:#2e7d32
| Policy | Behavior | Best For |
|---|---|---|
| no | Never restart automatically | One-off jobs, migrations, manual debugging |
| on-failure | Restart only if the exit code is non-zero | Workers that should stop when done successfully |
| always | Always restart, even after a clean exit; a manually stopped container starts again when the Docker daemon restarts | Use with caution -- can create infinite restart loops |
| unless-stopped | Like always, but a container you stopped manually stays stopped when the Docker daemon restarts | Recommended default for long-running services |
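The decision can also be written as a small function. This is a sketch of the rules as described here, not Docker's code, and the name `should_restart` and its parameters are invented for illustration. It covers only what happens when the main process exits. A manual `docker stop` keeps a container stopped under every policy for as long as the daemon keeps running; what happens after a daemon restart, which is where `always` and `unless-stopped` differ, is outside this sketch.

```python
def should_restart(policy, exit_code, stopped_by_user=False):
    """Decide whether a container restarts after its main process exits."""
    if stopped_by_user:
        # An explicit `docker stop` is honored while the daemon keeps running.
        return False
    if policy == "no":
        return False
    if policy == "on-failure":
        return exit_code != 0
    if policy in ("always", "unless-stopped"):
        return True
    raise ValueError(f"unknown restart policy: {policy!r}")


# A worker that finishes cleanly is left alone under on-failure...
print(should_restart("on-failure", 0))
# ...but a clean exit still triggers a restart under always.
print(should_restart("always", 0))
```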
Setting a Restart Policy
# At creation time
docker run -d --restart unless-stopped --name api my-api:1.0.0
# Update an existing container
docker update --restart unless-stopped api
Healthchecks + Restart Policies Together
A common misconception: "unhealthy automatically restarts the container." This is not true in plain Docker. Healthchecks report status; restart policies react to process exits. They are complementary:
| Layer | What It Does |
|---|---|
| Healthcheck | Detects application-level problems (deadlock, broken dependency) |
| Restart policy | Recovers from process crashes (exit code) |
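To get automatic restart on unhealthy status, you need an orchestrator or an external watchdog. Docker Swarm replaces tasks whose containers become unhealthy. Kubernetes ignores Docker's HEALTHCHECK and relies on its own liveness probes instead. An external watchdog monitors health status and restarts unhealthy containers itself.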
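An external watchdog can be very small. The sketch below polls `docker ps` with its `health=unhealthy` filter and restarts whatever it finds. It assumes it runs on the host with permission to use the Docker CLI. The function names, the 30-second poll interval, and the injectable `run` parameter (there only so the logic can be exercised without Docker) are all illustrative choices.

```python
import subprocess
import time


def unhealthy_containers(run=subprocess.run):
    """Return the names of containers whose health status is `unhealthy`."""
    result = run(
        ["docker", "ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [name.strip() for name in result.stdout.splitlines() if name.strip()]


def restart_unhealthy(run=subprocess.run):
    """Restart every unhealthy container once and return the names it touched."""
    names = unhealthy_containers(run)
    for name in names:
        run(["docker", "restart", name], check=True)
    return names


def watch(poll_seconds=30, run=subprocess.run):
    """Poll forever; meant to run on the host, not inside the container."""
    while True:
        for name in restart_unhealthy(run):
            print(f"restarted unhealthy container: {name}")
        time.sleep(poll_seconds)
```

Calling `watch()` from a supervised host process would keep it running. Blind restarts can hide the root cause, so it is worth logging each restart and reading the health log before relying on a loop like this.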
Detecting Restart Storms
If a container keeps crashing and restarting, it creates a restart storm -- the container never stays up long enough to be useful:
# Check how many times it has restarted
docker inspect -f '{{.RestartCount}}' my-container
# Check recent logs for the crash reason
docker logs --tail 200 my-container
Common causes: bad environment variable, unreachable dependency, wrong entrypoint, or resource limits too tight.
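One way to catch a storm automatically is to sample the restart count over time and flag a sharp rise. The sketch below assumes you collect `(unix_time, restart_count)` pairs yourself, for example by running the `docker inspect` command above on a schedule. The function name `is_restart_storm` and the thresholds (more than 5 restarts within 300 seconds) are arbitrary illustrative values, not Docker defaults.

```python
def is_restart_storm(samples, max_restarts=5, window_seconds=300):
    """Return True if the restart count rose by more than max_restarts in the window.

    samples: list of (unix_time, restart_count) pairs, oldest first.
    """
    if not samples:
        return False
    latest_time, latest_count = samples[-1]
    # Keep only the counts observed inside the window that ends at the latest sample.
    in_window = [count for ts, count in samples if latest_time - ts <= window_seconds]
    return latest_count - min(in_window) > max_restarts


# Nine restarts within two minutes exceeds the illustrative threshold.
print(is_restart_storm([(0, 0), (60, 4), (120, 9)]))
```

Comparing against the lowest count inside the window, rather than the first sample overall, keeps an old burst of restarts from triggering alerts long after the container has stabilized.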
Key Takeaways
- Running does not mean healthy. Healthchecks detect application-level failures that "process is alive" cannot.
- Set start_period for applications that take time to boot -- otherwise health checks fail during startup.
- Use unless-stopped as the default restart policy for long-running services.
- Healthchecks and restart policies are complementary, not redundant -- one reports status, the other handles process crashes.
- Monitor restart count to catch restart storms early.
What's Next
- Continue to Resource Limits to protect your host from runaway containers.