Data Persistence Strategies
Knowing how volumes, bind mounts, and tmpfs work (covered in Volume Management) is the foundation. This lesson focuses on when to use each type and how to design storage patterns that handle permissions, multi-container sharing, and environment differences correctly.
Choosing the Right Storage Type
flowchart TD
A["Need to persist data?"] -->|"No"| B["Container writable layer\n(default, ephemeral)"]
A -->|"Yes"| C{"Data sensitivity?"}
C -->|"Secrets / tokens"| D["tmpfs mount\n(RAM only)"]
C -->|"Normal data"| E{"Who manages the path?"}
E -->|"Docker manages it"| F["Named volume"]
E -->|"I need a specific host path"| G{"Read-only?"}
G -->|"Yes"| H["Bind mount (:ro)"]
G -->|"No"| I["Bind mount"]
style B fill:#f5f5f5,stroke:#9e9e9e
style D fill:#fff3e0,stroke:#ef6c00
style F fill:#e8f5e9,stroke:#2e7d32
style H fill:#e3f2fd,stroke:#1565c0
style I fill:#e3f2fd,stroke:#1565c0
Quick Decision Table
| Use Case | Storage Type | Reason |
|---|---|---|
| Database data (PostgreSQL, MySQL) | Named volume | Managed by Docker, portable, survives container removal |
| Application uploads / user files | Named volume | Persists independently from the container lifecycle |
| Source code in development | Bind mount | Hot-reload requires host filesystem access |
| Configuration files | Bind mount (:ro) | Host-managed, container should not modify |
| TLS certificates | Bind mount (:ro) | Host-managed by cert tool (certbot, etc.) |
| Secrets / API keys (runtime copies) | tmpfs | Kept in memory, not written to the container's writable layer |
| Temporary build artifacts | tmpfs or writable layer | Discarded after use |
| Log files (external aggregation) | Bind mount or volume | Depends on log collection strategy |
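Each storage type in the table corresponds to a `type=` value of the `--mount` flag on `docker run`. The sketch below builds that value for the three types. The helper name `mount_spec` and the example names and paths are made up for illustration, and the tmpfs size is assumed to be given in bytes.

```shell
# mount_spec KIND SOURCE TARGET [ro]
# Prints the value for `docker run --mount ...`. Hypothetical helper, illustration only.
mount_spec() {
    kind=$1
    source=$2
    target=$3
    mode=${4:-rw}
    case "$kind" in
        volume) spec="type=volume,source=$source,target=$target" ;;
        bind)   spec="type=bind,source=$source,target=$target" ;;
        # tmpfs has no source; in this sketch SOURCE carries the size limit in bytes
        tmpfs)  spec="type=tmpfs,target=$target,tmpfs-size=$source" ;;
        *)      echo "unknown storage type: $kind" >&2; return 1 ;;
    esac
    if [ "$mode" = ro ]; then
        spec="$spec,readonly"
    fi
    printf '%s\n' "$spec"
}
```

A call such as `docker run --rm --mount "$(mount_spec bind "$PWD/config" /app/config ro)" my-api:1.0.0` would then mount the host directory read-only.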
Database Persistence
Database files are usually the most critical data to persist correctly.
PostgreSQL
services:
  db:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init-scripts:/docker-entrypoint-initdb.d:ro
    secrets:
      - db_password

volumes:
  pgdata:

secrets:
  db_password:
    file: ./secrets/db_password.txt
MySQL / MariaDB
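services:
  db:
    image: mysql:8
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - mysqldata:/var/lib/mysql
      - ./my-custom.cnf:/etc/mysql/conf.d/custom.cnf:ro
    # The _FILE variable only works if the secret is actually mounted
    secrets:
      - db_password

volumes:
  mysqldata:

secrets:
  db_password:
    file: ./secrets/db_password.txt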
Redis (with persistence)
services:
  cache:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redisdata:/data

volumes:
  redisdata:
Always use a named volume for database data directories. Bind mounts can cause permission issues, especially on macOS and Windows, and are not portable.
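A quick way to convince yourself that a named volume outlives its container is to start the database, tear the stack down, and list the volume afterwards. The function below is a minimal sketch of that check. It assumes the PostgreSQL Compose file from this section is in the current directory, and the function name is hypothetical.

```shell
# verify_volume_survives: hypothetical check that database data outlives its container.
verify_volume_survives() {
    docker compose up -d db || return 1
    # `down` removes containers and networks but keeps named volumes
    # (only `down --volumes` would delete them)
    docker compose down || return 1
    # The name filter matches substrings, so the project-prefixed name is found too
    docker volume ls --quiet --filter "name=pgdata"
}
```

If the volume survived, the last command prints its name, which Compose normally prefixes with the project name.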
Permission and Ownership Patterns
Permission mismatches are one of the most common Docker storage problems.
The Problem
flowchart LR
A["Host: file owned by uid 1000"] -->|"Bind mount"| B["Container: process runs as uid 999"]
B --> C["Permission denied!"]
style C fill:#ffebee,stroke:#c62828
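Before changing anything, it helps to confirm that a UID mismatch really is the cause. The sketch below compares the numeric owner of a host path with a UID you supply. It assumes GNU or BusyBox `stat` (the `-c` option, as found on most Linux hosts), and both function names are hypothetical.

```shell
# owner_of PATH: print "uid:gid" of a file or directory (GNU/BusyBox stat syntax).
owner_of() {
    stat -c '%u:%g' "$1"
}

# uid_matches PATH UID: succeed when PATH is owned by the given numeric UID.
uid_matches() {
    [ "$(owner_of "$1" | cut -d: -f1)" = "$2" ]
}
```

For a running container with the hypothetical name `my-container`, `uid_matches ./data "$(docker exec my-container id -u)"` returns non-zero when the bind-mounted directory belongs to a different user than the container process.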
Solution 1: Match UIDs
Ensure the container process runs as the same UID as the host file owner:
# Create a user whose UID matches the host file owner (Alpine/BusyBox syntax)
RUN addgroup -g 1000 appgroup && \
    adduser -u 1000 -G appgroup -s /bin/sh -D appuser
USER appuser
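When rebuilding the image is not practical, the same matching can be done at run time with the `--user` flag of `docker run`. The wrapper below is a sketch under that assumption. The function name and the `/work` target path are hypothetical, and the image must tolerate running as a UID that has no entry in its `/etc/passwd`.

```shell
# run_as_host_user IMAGE [COMMAND...]
# Runs a one-off container as the calling host user, with the current
# directory bind-mounted at /work (hypothetical path).
run_as_host_user() {
    image=$1
    shift
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        --mount "type=bind,source=$PWD,target=/work" \
        --workdir /work \
        "$image" "$@"
}
```

Files the container creates under `/work` are then owned by the host user rather than by root or an image-specific UID.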
Solution 2: Fix Ownership at Runtime
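#!/bin/sh
# Entrypoint script: must start as root, otherwise chown fails
set -e
chown -R app:app /data
# Hand off to the image's CMD; drop root here with a tool such as gosu or su-exec
exec "$@"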
Solution 3: Use Named Volumes
When an empty named volume is mounted for the first time, Docker copies the files that already exist at the mount path in the image, including their ownership and permissions. Set the ownership in the Dockerfile and the volume inherits it:
# The /data directory will have correct ownership in the volume
RUN mkdir -p /data && chown -R app:app /data
VOLUME /data
Permission Reference
| Scenario | Solution |
|---|---|
| Bind mount, host user ≠ container user | Match UIDs or use chown in entrypoint |
| Named volume, first use | Docker copies ownership from image -- usually works correctly |
| Read-only config files | Mount with :ro, ensure host file is readable |
| Container runs as root | Works but insecure -- use a non-root user |
Multi-Container Shared Data
Shared Volume Between Services
services:
  # Writer uploads files
  api:
    image: my-api:1.0.0
    volumes:
      - uploads:/app/uploads

  # Reader serves files
  cdn:
    image: nginx:alpine
    volumes:
      - uploads:/usr/share/nginx/html/uploads:ro

volumes:
  uploads:
File Lock Considerations
When multiple containers write to the same volume, you risk file corruption if they modify the same files simultaneously. Use application-level locking or a shared-nothing architecture where each container writes to its own subdirectory.
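One way to apply the shared-nothing idea is to give every writer its own subdirectory and to publish each file with a rename, so readers never see a half-written file. The function below is a sketch of that pattern rather than a complete locking scheme. `publish_file` and the `WRITER_ID` variable are hypothetical names, and the rename is only atomic when the temporary file and the final file live on the same filesystem.

```shell
# publish_file ROOT NAME: read content from stdin and store it as ROOT/<writer-id>/NAME.
# WRITER_ID is a hypothetical variable; it falls back to the container hostname.
publish_file() {
    root=$1
    name=$2
    dir="$root/${WRITER_ID:-$(hostname)}"
    mkdir -p "$dir" || return 1
    # Write to a hidden temp file in the same directory first
    tmp=$(mktemp "$dir/.tmp.XXXXXX") || return 1
    cat > "$tmp" || return 1
    # mktemp creates owner-only files; widen so a read-only consumer can serve them
    chmod 644 "$tmp" || return 1
    # Renaming within one filesystem replaces the target in a single step
    mv "$tmp" "$dir/$name"
}
```

Inside the `api` container this could be called as `publish_file /app/uploads report.csv < /tmp/report.csv`, leaving other writers' directories untouched.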
Environment-Specific Storage
Development
services:
  api:
    image: my-api:1.0.0
    volumes:
      # Source code: bind mount for hot-reload
      - ./src:/app/src
      # Node modules: named volume to avoid overwriting
      - node_modules:/app/node_modules
      # Config: bind mount, read-only
      - ./config/dev.json:/app/config.json:ro

volumes:
  node_modules:
Production
services:
  api:
    image: my-api:1.0.0
    volumes:
      # Data: named volume only
      - uploads:/app/uploads
      # Config: bind mount, read-only
      - ./config/prod.json:/app/config.json:ro
    # Scratch data: tmpfs (in memory, capped at 50 MB)
    tmpfs:
      - /tmp:size=50m

volumes:
  uploads:
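If the development and production definitions are kept in separate files, the environment can be chosen with the `-f` option of `docker compose`. The sketch below assumes the two snippets are saved as `compose.dev.yaml` and `compose.prod.yaml`. Those file names and the function name are hypothetical.

```shell
# compose_up ENV: start the stack with the storage settings for ENV (dev or prod).
compose_up() {
    case "$1" in
        dev|prod)
            docker compose -f "compose.$1.yaml" up -d
            ;;
        *)
            echo "usage: compose_up dev|prod" >&2
            return 2
            ;;
    esac
}
```

Rejecting unknown names keeps a typo from silently starting the stack with the wrong mounts.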
The node_modules Trick
In development, bind-mounting the project root (./:/app) hides the image's /app/node_modules: the container sees the host's copy instead, or no dependencies at all if the host has none. Fix this with a named volume overlay:
services:
  api:
    image: my-api:1.0.0
    volumes:
      - ./:/app                          # Bind mount project root
      - node_modules:/app/node_modules   # Named volume "covers" the bind mount at this path

volumes:
  node_modules:
The named volume at /app/node_modules takes precedence over the bind mount at that specific path.
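One consequence of this setup is that dependencies live in the volume, not on the host, so installs have to run inside a container. The sketch below does that with a one-off Compose container. It assumes a service named `api` whose image ships `npm` and uses `/app` as its working directory, and the function name is hypothetical.

```shell
# refresh_node_modules [SERVICE]: reinstall dependencies into the named volume.
refresh_node_modules() {
    service=${1:-api}
    # Runs in a one-off container, so packages land in the volume mounted
    # at /app/node_modules rather than in the host's project directory.
    docker compose run --rm "$service" npm install
}
```

Running it after `package.json` changes keeps the volume from serving stale dependencies.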
Storage Anti-Patterns
| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Storing data in writable layer | Lost on docker rm | Use named volumes |
| Bind-mounting database data dir | Permission issues, not portable | Use named volumes |
| Committing data into images | Image size bloated, stale data | Mount data at runtime |
| Using anonymous volumes | Hard to find, easy to accidentally prune | Use named volumes |
| Same volume, multiple writers, no locking | File corruption | Application-level locking or separate directories |
| Not setting :ro on config mounts | Container can modify host configs | Always mount configs as :ro |
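To see whether anonymous volumes have piled up, you can list volumes that no container references and keep only the ones with generated names. The sketch below relies on anonymous volumes being named with a 64-character hexadecimal ID. The function name is hypothetical.

```shell
# list_anonymous_dangling: print unreferenced volumes whose name looks auto-generated.
list_anonymous_dangling() {
    docker volume ls --quiet --filter dangling=true |
        grep -E '^[0-9a-f]{64}$'
}
```

Review the output before passing any name to `docker volume rm`, since a dangling volume may still hold data you want.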
Key Takeaways
- Named volumes for database data and application state -- Docker-managed, portable, correct permissions.
- Bind mounts for source code (development) and config files (read-only).
- tmpfs for secrets and scratch data that should never touch disk.
- Match UIDs between host and container to avoid permission issues with bind mounts.
- Use the node_modules named volume trick to prevent bind mounts from overwriting dependency directories.
- Mount configuration files as read-only (:ro) to prevent containers from modifying host files.
What's Next
- Continue to Backup and Restore to learn how to protect your persistent data with backup scripts and strategies.