Safe Prune Orchestration
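Run without safeguards, docker system prune is risky: with -a it deletes every image that no container uses, including the ones you need for rollback, and with --volumes it can delete volumes that hold production data. This lesson provides a structured approach to cleanup with safety gates.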
The Problem with Blind Pruning
flowchart LR
Blind["docker system prune -a --volumes"] --> I["Deleted rollback images"]
Blind --> V["Deleted database volumes"]
Blind --> D["Production downtime"]
style Blind fill:#ffebee,stroke:#c62828
style D fill:#ffebee,stroke:#c62828
Safe Prune Workflow
flowchart TD
A["1. Inventory<br/>docker system df -v"] --> B["2. Pre-Check<br/>Verify all critical containers running"]
B --> C["3. Safe Prune<br/>Containers → Images → Networks → Cache"]
C --> D["4. Post-Check<br/>Verify services still healthy"]
D -->|"Healthy"| E["Done ✓"]
D -->|"Issues"| F["Investigate & restore"]
style A fill:#e3f2fd,stroke:#1565c0
style C fill:#e8f5e9,stroke:#2e7d32
style F fill:#ffebee,stroke:#c62828
Implementation: Safe Prune Script
#!/usr/bin/env bash
set -euo pipefail
# === Configuration ===
# Images to NEVER prune (exact tags)
PROTECTED_IMAGES=(
"postgres:16"
"redis:7-alpine"
"nginx:1.25-alpine"
)
# Containers that MUST be running before we prune
CRITICAL_CONTAINERS=(
"db"
"api"
"proxy"
)
DRY_RUN=${1:-"--dry-run"}
echo "=== Safe Prune: $(date) ==="
echo "Mode: $DRY_RUN"
echo ""
# Step 1: Pre-flight checks
echo " Pre-flight Checks "
# Verify critical containers are running
FAILED=0
for container in "${CRITICAL_CONTAINERS[@]}"; do
STATUS=$(docker inspect "$container" --format '{{.State.Status}}' 2>/dev/null || echo "missing")
if [[ "$STATUS" != "running" ]]; then
echo "FAIL: $container is $STATUS (must be running)"
FAILED=1
else
echo "OK: $container is running"
fi
done
if [[ "$FAILED" -eq 1 ]]; then
echo ""
echo "ABORT: Critical containers not running. Fix before pruning."
exit 1
fi
# Step 2: Show current state
echo ""
echo " Current Disk Usage "
docker system df
echo ""
# Step 3: Preview or execute
if [[ "$DRY_RUN" == "--execute" ]]; then
echo " Executing Safe Prune "
# Stopped containers (safe)
echo "Removing stopped containers..."
docker container prune -f
# Dangling images (safe)
echo "Removing dangling images..."
docker image prune -f
# Unused images EXCEPT protected ones
echo "Removing unused images (respecting allowlist)..."
for img_id in $(docker images -q --filter "dangling=false"); do
TAGS=$(docker inspect "$img_id" --format '{{range .RepoTags}}{{.}} {{end}}' 2>/dev/null || echo "")
PROTECTED=false
for protected in "${PROTECTED_IMAGES[@]}"; do
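# Exact tag match: TAGS is space-separated with a trailing space, so pad the
# front and compare against " tag " to avoid matching e.g. postgres:16.1
if [[ " $TAGS" == *" $protected "* ]]; then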
PROTECTED=true
break
fi
done
if [[ "$PROTECTED" == "false" ]]; then
# Only remove if not used by any container
USED=$(docker ps -a -q --filter "ancestor=$img_id" | wc -l)
if [[ "$USED" -eq 0 ]]; then
docker rmi "$img_id" 2>/dev/null || true
fi
fi
done
# Unused networks (safe)
echo "Removing unused networks..."
docker network prune -f
# Build cache (safe)
echo "Removing build cache..."
docker builder prune -f
# Step 4: Post-check
echo ""
echo " Post-Prune Verification "
for container in "${CRITICAL_CONTAINERS[@]}"; do
STATUS=$(docker inspect "$container" --format '{{.State.Status}}' 2>/dev/null || echo "missing")
HEALTH=$(docker inspect "$container" --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}' 2>/dev/null || echo "unknown")
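echo "$container: status=$STATUS health=$HEALTH"
if [[ "$STATUS" != "running" || "$HEALTH" == "unhealthy" ]]; then
echo "WARN: $container needs attention; investigate and restore"
fi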
done
echo ""
echo " Disk Usage After "
docker system df
else
echo " Preview "
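# Count every state that docker container prune removes, not only exited
echo "Stopped containers: $(docker ps -aq -f status=exited -f status=created -f status=dead | wc -l)"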
echo "Dangling images: $(docker images -f dangling=true -q | wc -l)"
echo "Protected images: ${PROTECTED_IMAGES[*]}"
echo ""
echo "Run with --execute to perform cleanup"
echo "NOTE: Volumes are NEVER pruned by this script"
fi
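
The allowlist comparison is the part of the script most worth testing on its own, because a loose match can protect or expose the wrong image. Below is a minimal sketch of an exact-match check. The function name is_protected is hypothetical, and it assumes the tags arrive as one space-separated string, which is what the script's docker inspect format produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the script above).
# $1           = space-separated tags of ONE image, e.g. "postgres:16 postgres:latest "
# remaining $@ = allowlist entries
# Succeeds (exit 0) only if some tag is exactly equal to an allowlist entry.
is_protected() {
  local tags=" $1 "   # pad with spaces so every tag is delimited on both sides
  shift
  local entry
  for entry in "$@"; do
    case "$tags" in
      *" $entry "*) return 0 ;;   # quoted, so the entry is matched literally
    esac
  done
  return 1
}

# Demo with literal values; in the script this would be called as
#   is_protected "$TAGS" "${PROTECTED_IMAGES[@]}"
is_protected "postgres:16 postgres:latest " "postgres:16" "redis:7-alpine" \
  && echo "postgres:16 -> keep"
is_protected "postgres:16.1 " "postgres:16" "redis:7-alpine" \
  || echo "postgres:16.1 -> removable"
```

Padding both sides with a space keeps postgres:16 from matching postgres:16.1 or mypostgres:16. A tag that carries a registry prefix would need its own allowlist entry under this scheme.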
Key Safety Rules
| Rule | Why |
|---|---|
| Never auto-prune volumes | Deleted volume data cannot be recovered without a backup |
| Check critical containers first | Ensure nothing is in a stopped/restarting state |
| Protect rollback images | You need the previous version to roll back |
| Post-check after pruning | Verify services survived the cleanup |
| Dry-run by default | Prevent accidental execution |
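
The post-check rule only works as a gate if a bad result changes the exit status, so that a wrapper or scheduler can react to it. One possible sketch follows. The helper name post_check is hypothetical, and it assumes the same docker inspect format strings the script uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns non-zero if any named container is not running
# or reports an "unhealthy" healthcheck. Containers without a healthcheck
# are judged on their run state alone.
post_check() {
  local failed=0
  local name status health
  for name in "$@"; do
    status=$(docker inspect "$name" --format '{{.State.Status}}' 2>/dev/null || echo "missing")
    health=$(docker inspect "$name" --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}' 2>/dev/null || echo "unknown")
    if [ "$status" != "running" ] || [ "$health" = "unhealthy" ]; then
      echo "FAIL: $name status=$status health=$health"
      failed=1
    else
      echo "OK: $name status=$status health=$health"
    fi
  done
  return "$failed"
}

# Usage, with the container names from the script's CRITICAL_CONTAINERS:
#   post_check db api proxy || { echo "Investigate and restore"; exit 1; }
```

A healthcheck that is still in its starting state passes here. Whether to wait and retry in that case is a design choice this sketch leaves open.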
Key Takeaways
- Never blindly prune -- always verify critical services are running first.
- Use an allowlist to protect images needed for rollback.
- Run in dry-run mode by default, require explicit --execute flag.
- Never auto-prune volumes -- handle volume cleanup manually with explicit review.
- Always post-check service health after pruning.
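
Manual volume review can start from a read-only listing of volumes that no container references. The sketch below is one way that review might look, not a prescribed procedure. The function names are hypothetical, and nothing in it deletes a volume.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reviewing volumes by hand.

# Print the names of volumes not referenced by any container (read-only).
list_dangling_volumes() {
  docker volume ls --quiet --filter dangling=true
}

# Print each candidate with the command a reviewer could run afterwards.
# Deletion itself is left to the human on purpose.
review_volumes() {
  local vol
  local count=0
  for vol in $(list_dangling_volumes); do
    count=$((count + 1))
    echo "candidate: $vol"
    echo "  after review, remove with: docker volume rm $vol"
  done
  echo "$count dangling volume(s) found; none were removed"
}

# Usage:
#   review_volumes
```

A dangling volume is not necessarily disposable: a database volume is dangling whenever its container has been removed, which is why this step stops at a list.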
What's Next
- Continue to Compose Deployment with Rollback.