# Optimization Strategy Framework
Random optimization leads to random results. This lesson provides a structured framework for identifying bottlenecks, making controlled changes, and proving improvement with data.
## The Four-Step Framework

```mermaid
flowchart LR
    A["1. Baseline<br/>Measure current state"] --> B["2. Identify<br/>Find the bottleneck"]
    B --> C["3. Optimize<br/>Apply one change"]
    C --> D["4. Validate<br/>Prove improvement"]
    D -->|"Better"| E["Standardize"]
    D -->|"Worse/Same"| F["Rollback"]
    style A fill:#e3f2fd,stroke:#1565c0
    style B fill:#fff3e0,stroke:#ef6c00
    style C fill:#e8f5e9,stroke:#2e7d32
    style D fill:#f3e5f5,stroke:#7b1fa2
    style F fill:#ffebee,stroke:#c62828
```
## Step 1: Baseline

Before changing anything, capture metrics:

```bash
# Image sizes
docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'

# Disk usage
docker system df

# Container resource usage
docker stats --no-stream

# Build time
time docker build -t app:baseline .

# Startup time (from docker compose up to the health check passing);
# the subshell makes `time` cover the wait loop, not just the up command
time ( docker compose up -d && \
  while ! docker inspect app --format '{{.State.Health.Status}}' | grep -q healthy; do sleep 1; done )
```

Save these numbers. Without a baseline, you cannot prove improvement.
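Step 1 can be made repeatable with a small capture script. A minimal sketch, assuming the Docker CLI is on PATH (the file-naming scheme is illustrative):

```bash
#!/usr/bin/env sh
# Snapshot baseline metrics into a dated file so a later run can be diffed.
OUT="baseline-$(date +%Y%m%d-%H%M%S).txt"
{
  echo "== image sizes =="
  docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'
  echo "== disk usage =="
  docker system df
  echo "== container resource usage =="
  docker stats --no-stream
} > "$OUT" 2>&1
echo "Baseline saved to $OUT"
```

Keep the file with the ticket or branch so the Step 4 comparison has a fixed reference point.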
## Step 2: Identify the Bottleneck
| Symptom | Bottleneck Area | Investigate With |
|---|---|---|
| Deploys take minutes | Image size / pull time | docker images, docker system df |
| CI builds are slow | Layer cache invalidation | Build logs, docker history |
| Containers restart randomly | Memory limits / OOM | docker inspect --format '{{.State.OOMKilled}}' |
| Host disk fills up | Unmanaged storage growth | docker system df -v, log file sizes |
| Services fail at startup | Dependency readiness | Health check status, docker compose logs |
Focus on the one thing that causes the most pain.
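For the "containers restart randomly" row, a quick sweep can confirm or rule out OOM kills. A sketch, assuming the Docker CLI (the container list comes from your host):

```bash
# Ask Docker whether the kernel OOM-killed each container (running or exited).
oom_report=$(
  for c in $(docker ps -aq 2>/dev/null); do
    docker inspect --format '{{.Name}} OOMKilled={{.State.OOMKilled}}' "$c"
  done | grep 'OOMKilled=true'
) || oom_report="No OOM-killed containers found"
echo "$oom_report"
```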
## Step 3: Optimize (One Change at a Time)

Apply exactly one optimization, then measure:

❌ Bad: "I switched to Alpine, added cache mounts, rewrote the Dockerfile, and enabled BuildKit all at once."

✓ Good: "I switched the base image from node:20 to node:20-alpine."
Why one at a time? If the result is worse, you know exactly what caused it. If you change five things, you cannot attribute the result.
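Concretely, the "good" single change above is a one-line Dockerfile edit (the service and versions are illustrative):

```diff
-FROM node:20
+FROM node:20-alpine
```

Note that Alpine images use musl libc rather than glibc, so re-run your test suite after this change: native modules built against glibc are a common source of regressions here.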
## Step 4: Validate

Run the same measurements from Step 1 and compare:

```bash
# Compare image sizes
docker images app:baseline app:optimized

# Compare build times (--no-cache for a fair comparison)
time docker build --no-cache -t app:optimized .

# Compare resource usage
docker stats --no-stream app-baseline app-optimized
```
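To report the comparison consistently, a tiny helper (a sketch; the example numbers are hypothetical) turns two measurements into a percent change:

```bash
# Percent change from a baseline measurement to an optimized one,
# e.g. image size in MB or build time in seconds.
pct_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", (b - a) / a * 100 }'
}

pct_change 1100 140   # hypothetical image sizes in MB -> prints -87.3%
```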
## Decision Table
| Result | Action |
|---|---|
| Clear improvement, no regressions | ✓ Keep -- standardize the change |
| Marginal improvement, no regressions | Consider keeping, document tradeoffs |
| No measurable improvement | Roll back -- not worth the complexity |
| Any regression (reliability, security) | Roll back immediately |
## Standardize Successful Changes
When an optimization proves successful:
- Update Dockerfile templates so new services get the optimization automatically
- Document the change -- what was done, why, and the measured result
- Add CI checks if possible (e.g., max image size threshold)
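The CI-check idea can be sketched as a size gate (the 200 MB limit and the image name are assumptions; tune them per service):

```bash
# Fail the pipeline if an image exceeds a size budget.
check_size() {  # $1 = image size in bytes, $2 = limit in MB
  mb=$(( $1 / 1000000 ))
  if [ "$mb" -gt "$2" ]; then
    echo "FAIL: image is ${mb}MB (limit ${2}MB)"
    return 1
  fi
  echo "OK: ${mb}MB within ${2}MB limit"
}

# In CI, feed it the real size, e.g.:
#   check_size "$(docker image inspect app:latest --format '{{.Size}}')" 200
check_size 157000000 200   # illustrative size
```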
## Optimization Priority Guide
Start with the highest-impact, lowest-risk optimizations:
| Priority | Optimization | Impact | Risk |
|---|---|---|---|
| 1 | Multi-stage builds | High (image size) | Low |
| 2 | .dockerignore | Medium (build speed) | None |
| 3 | Dependency-first layer ordering | High (build speed) | None |
| 4 | Log rotation in daemon.json | High (disk) | Low |
| 5 | Resource limits (--memory, --cpus) | Medium (stability) | Low |
| 6 | BuildKit cache mounts | Medium (build speed) | Low |
| 7 | Network segmentation | Medium (security) | Low |
| 8 | Health checks with depends_on | Medium (reliability) | None |
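Priorities 1 and 3 often land together in one Dockerfile rewrite. An illustrative sketch for a Node.js service (paths and npm scripts are assumptions):

```dockerfile
# Stage 1: build with dev dependencies; manifests are copied first so the
# npm ci layer stays cached until package*.json actually changes.
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: the runtime image carries only what the app needs to run.
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```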
## Key Takeaways
- Always baseline before optimizing. No baseline = no proof.
- Change one thing at a time and measure the result.
- Roll back anything that causes a regression in reliability or security.
- Standardize successful optimizations into templates and CI checks.
- Prioritize high-impact, low-risk optimizations first.
## What's Next
- Continue to Bash Automation Blueprint.