Dependency Order and Healthchecks

In a multi-service stack, "container started" does not mean "service ready." If your API tries to connect to PostgreSQL before the database has finished initializing, the connection is refused and the API crashes unless it retries. Compose handles this with health checks and `depends_on` conditions.

The Problem

flowchart LR
    subgraph Without["Without Health Checks"]
        A1["API starts"] -->|"Connects immediately"| B1["DB still initializing"]
        B1 -->|"Connection refused"| C1["API crashes"]
    end

    subgraph With["With Health Checks"]
        A2["DB starts, becomes healthy"] -->|"condition: service_healthy"| B2["API starts"]
        B2 -->|"Connects"| C2["DB is ready ✓"]
    end

    style Without fill:#ffebee,stroke:#c62828
    style With fill:#e8f5e9,stroke:#2e7d32

`depends_on` Without Health Checks (Not Enough)

By default, depends_on only controls startup order, not readiness:

services:
  api:
    image: my-api:1.0.0
    depends_on:
      - db    # Only waits for db container to START, not be READY

  db:
    image: postgres:16

The API container starts as soon as the db container is running, but PostgreSQL inside it may still be initializing and refusing connections.

`depends_on` With Health Checks (The Right Way)

Add a health check to the dependency and use condition: service_healthy:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s

  api:
    image: my-api:1.0.0
    depends_on:
      db:
        condition: service_healthy

Now the API will not start until PostgreSQL is actually accepting connections.
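
The difference between the two forms can be pictured as a small gate that Compose evaluates before starting a dependent service. The Python sketch below is a simplified model, not Compose's actual implementation. `DepState`, `condition_met`, and `can_start` are hypothetical names, and only the two conditions discussed on this page are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DepState:
    running: bool
    health: Optional[str] = None  # "starting", "healthy", "unhealthy"; None if no healthcheck


def condition_met(condition: str, dep: DepState) -> bool:
    if condition == "service_started":
        # Short-form depends_on (- db): the container only has to be running
        return dep.running
    if condition == "service_healthy":
        # Long form with condition: service_healthy: the healthcheck must report healthy
        return dep.running and dep.health == "healthy"
    raise ValueError(f"condition not modeled in this sketch: {condition}")


def can_start(depends_on: Dict[str, str], states: Dict[str, DepState]) -> bool:
    """True once every dependency satisfies the condition declared for it."""
    return all(condition_met(cond, states[name]) for name, cond in depends_on.items())
```

While PostgreSQL is still initializing (`DepState(running=True, health="starting")`), a dependent declared with `service_started` is already allowed to start, but one declared with `service_healthy` is not.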

Health Check Parameters

Parameter	What It Does	Recommendation
`test`	Command that returns 0 (healthy) or 1 (unhealthy)	Use the service's native check tool
`interval`	Time between checks	10-30s for most services
`timeout`	Max time for a single check	3-5s
`retries`	Failures before marking unhealthy	3-5
`start_period`	Grace period during startup (failures don't count)	Set to expected initialization time
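
To see how these parameters interact, here is a rough Python model of the status transitions (starting, healthy, unhealthy) as the Docker engine applies them. It is a sketch under simplifying assumptions: probe i runs at exactly i * interval seconds after the container starts, probes take no time, and `simulate_health` is a hypothetical helper rather than any Docker API.

```python
from typing import List, Tuple


def simulate_health(
    results: List[bool], interval: int, retries: int, start_period: int
) -> List[Tuple[int, str]]:
    """Return (time, status) after each probe, given each probe's pass/fail result."""
    status = "starting"
    failing_streak = 0
    history = []
    for i, passed in enumerate(results, start=1):
        t = i * interval  # simplification: probe i runs at i * interval, with zero duration
        if passed:
            # Any success resets the streak, even inside the start period
            failing_streak = 0
            status = "healthy"
        elif status == "starting" and t < start_period:
            pass  # failure inside the start period is not counted
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        history.append((t, status))
    return history
```

With the db settings from the earlier example (interval 10s, retries 5, start_period 20s), `simulate_health([False] * 6, interval=10, retries=5, start_period=20)[-1]` is `(60, 'unhealthy')` in this model: the first failure falls inside the start period and is ignored, and the next five count. A single passing probe reports healthy immediately, so start_period does not delay a service that starts quickly.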

Common Health Check Commands

Service	Health Check
PostgreSQL	`pg_isready -U postgres`
MySQL	`mysqladmin ping -h localhost`
Redis	`redis-cli ping`
HTTP app	`curl -f http://localhost:8080/health`
MongoDB	`mongosh --eval "db.runCommand('ping')"`
RabbitMQ	`rabbitmq-diagnostics -q check_running`
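
The `curl` check only works if the image ships curl, which many minimal images do not. If the image has Python, a small script can stand in for `curl -f`. This is a sketch: `check_http` and the `/healthcheck.py` path mentioned below are hypothetical, and the port and `/health` endpoint are the same placeholders as in the table.

```python
import urllib.request


def check_http(url: str, timeout: float = 3.0) -> int:
    """Return 0 if url answers with a non-error HTTP status, else 1 (mirrors curl -f)."""
    # Bypass any proxy settings: the health check targets a port inside the same container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 0 if resp.status < 400 else 1
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts all derive from OSError
        return 1
```

A script saved in the image as `/healthcheck.py` could end with `raise SystemExit(check_http("http://localhost:8080/health"))` and be wired up as `test: ["CMD", "python3", "/healthcheck.py"]`.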

Dependency Graph

In complex stacks, draw the dependency graph to avoid circular dependencies:

flowchart TD
    proxy["proxy"] --> api
    api["api"] --> db["db"]
    api --> cache["cache"]
    worker["worker"] --> queue["queue"]
    worker --> db

    style db fill:#e8f5e9,stroke:#2e7d32
    style cache fill:#e3f2fd,stroke:#1565c0
    style queue fill:#fff3e0,stroke:#ef6c00

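# Fragment: only depends_on is shown. Each service used with service_healthy
# (api, db, cache, queue) also needs its own healthcheck, plus an image.
services: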
  proxy:
    depends_on:
      api: { condition: service_healthy }

  api:
    depends_on:
      db: { condition: service_healthy }
      cache: { condition: service_healthy }

  worker:
    depends_on:
      queue: { condition: service_healthy }
      db: { condition: service_healthy }
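
Once the graph has more than a handful of services, it can help to check it mechanically. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive a valid start order from the `depends_on` relationships above and to report a cycle if one exists. The `DEPENDS_ON` dict is transcribed by hand from the example. Parsing a real compose file is left out, and `start_order` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hand-transcribed from the compose fragment above:
# each service maps to the services it lists under depends_on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "proxy": {"api"},
    "api": {"db", "cache"},
    "worker": {"queue", "db"},
}


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one valid start order (dependencies first), or raise ValueError on a cycle."""
    try:
        # TopologicalSorter takes {node: predecessors}, which matches depends_on exactly.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] lists the nodes in the cycle; the first and last entries are the same node.
        cycle = err.args[1]
        raise ValueError("circular dependency: " + " -> ".join(cycle)) from err
```

Any order it returns starts every dependency before its dependents (db and cache before api, api before proxy). Compose may start independent services in parallel, so treat the result as one valid order rather than the exact order Compose will use.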

App-Level Retries Are Still Required

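Even with health checks, failures can happen after startup: the database may restart or the network may drop, and `depends_on` only gates the initial start. Your application should implement connection retry logic: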

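# Python example: retry the database connection before giving up
import time

import psycopg2

MAX_ATTEMPTS = 10

for attempt in range(1, MAX_ATTEMPTS + 1):
    try:
        # libpq reads the password from PGPASSWORD if it is set in the environment
        conn = psycopg2.connect("host=db dbname=myapp user=postgres")
        break
    except psycopg2.OperationalError:
        print(f"DB not ready, retry {attempt}/{MAX_ATTEMPTS}...")
        time.sleep(2)
else:
    # Every attempt failed: stop here instead of continuing without a connection
    raise RuntimeError(f"Could not connect to the database after {MAX_ATTEMPTS} attempts")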

Health checks solve startup sequencing. Retries solve transient failures.
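
The retry loop waits a fixed 2 seconds between attempts. A common refinement, swapped in here rather than taken from the original, is capped exponential backoff with random jitter, so that many replicas reconnecting after a database restart do not retry in lockstep. `retry_with_backoff` is a hypothetical helper, and its default delays are illustrative rather than tuned values.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 10,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, sleeping with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Upper bound doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # a random delay below that bound spreads out reconnecting clients.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

With psycopg2 it would be called as `conn = retry_with_backoff(lambda: psycopg2.connect("host=db dbname=myapp user=postgres"), retry_on=(psycopg2.OperationalError,))`. The `sleep` parameter exists only so tests can record delays instead of waiting.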

Verifying Health Status

# Check health status of all services
docker compose ps

# Detailed health info for one container (Compose v2 default name: <project>-<service>-<index>)
docker inspect --format='{{json .State.Health}}' myproject-db-1 | python3 -m json.tool
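
For scripts or CI jobs, the same `docker inspect` output can be read programmatically. The sketch below assumes the `docker` CLI is on PATH and relies on the `Status`, `FailingStreak`, and `Log` (with `ExitCode` and `Output`) fields of `.State.Health`. `read_health` and `summarize` are hypothetical helpers. A container without a health check has no `.State.Health`, which the `json` template function should print as `null`; the sketch treats that as "no healthcheck defined".

```python
import json
import subprocess
from typing import Any, Dict, Optional


def read_health(container: str) -> Optional[Dict[str, Any]]:
    """Return the parsed .State.Health object, or None if the container has no healthcheck."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", container],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if docker inspect fails
    ).stdout
    return json.loads(out)  # "null" decodes to None


def summarize(health: Optional[Dict[str, Any]]) -> str:
    """One-line summary: status, failing streak, and the most recent probe result."""
    if health is None:
        return "no healthcheck defined"
    summary = f"{health['Status']} (failing streak: {health.get('FailingStreak', 0)})"
    log = health.get("Log") or []
    if log:
        last = log[-1]  # the most recent probe is the last entry
        summary += f"; last exit code {last['ExitCode']}: {last['Output'].strip()}"
    return summary
```

Usage: `print(summarize(read_health("myproject-db-1")))`, where `myproject-db-1` is the same placeholder container name as in the command above.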

Key Takeaways

depends_on alone only controls start order, not readiness. For every dependency that defines a health check, add condition: service_healthy.
Define health checks for every service that other services depend on (databases, caches, queues).
Use start_period to give slow-starting services time to initialize without being marked unhealthy.
Health checks solve startup sequencing. App-level retries solve transient runtime failures. You need both.
Draw your dependency graph to catch circular dependencies early.

What's Next

Continue to Multi-Environment Overrides to manage dev, staging, and production configurations.
