Data Persistence Strategies

Knowing how volumes, bind mounts, and tmpfs work (covered in Volume Management) is the foundation. This lesson focuses on when to use each type and how to design storage patterns that handle permissions, multi-container sharing, and environment differences correctly.

Choosing the Right Storage Type

flowchart TD
    A["Need to persist data?"] -->|"No"| B["Container writable layer\n(default, ephemeral)"]
    A -->|"Yes"| C{"Data sensitivity?"}
    C -->|"Secrets / tokens"| D["tmpfs mount\n(RAM only)"]
    C -->|"Normal data"| E{"Who manages the path?"}
    E -->|"Docker manages it"| F["Named volume"]
    E -->|"I need a specific host path"| G{"Read-only?"}
    G -->|"Yes"| H["Bind mount (:ro)"]
    G -->|"No"| I["Bind mount"]

    style B fill:#f5f5f5,stroke:#9e9e9e
    style D fill:#fff3e0,stroke:#ef6c00
    style F fill:#e8f5e9,stroke:#2e7d32
    style H fill:#e3f2fd,stroke:#1565c0
    style I fill:#e3f2fd,stroke:#1565c0

Quick Decision Table

Use Case	Storage Type	Reason
Database data (PostgreSQL, MySQL)	Named volume	Managed by Docker, portable, survives container removal
Application uploads / user files	Named volume	Persists independently from the container lifecycle
Source code in development	Bind mount	Hot-reload requires host filesystem access
Configuration files	Bind mount (`:ro`)	Host-managed, container should not modify
TLS certificates	Bind mount (`:ro`)	Host-managed by cert tool (certbot, etc.)
Secrets / API keys (runtime only)	tmpfs	Held in RAM, never written to disk
Temporary build artifacts	tmpfs or writable layer	Discarded after use
Log files (external aggregation)	Bind mount or volume	Depends on log collection strategy
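
For containers started with `docker run` instead of Compose, each storage type in the table maps to a `--mount` flag. The helper below is a minimal sketch of that mapping; the function name `mount_flag` and its argument order are made up for illustration, and it assumes paths contain no commas or spaces.

```shell
#!/bin/sh
# mount_flag KIND TARGET [SOURCE]   (hypothetical helper, not part of Docker)
# Prints the `docker run --mount` flag for one storage type.
# For bind mounts, SOURCE must be an absolute host path.
mount_flag() {
  kind=$1 target=$2 source=${3:-}
  case "$kind" in
    volume)  printf '%s\n' "--mount type=volume,source=$source,target=$target" ;;
    bind)    printf '%s\n' "--mount type=bind,source=$source,target=$target" ;;
    bind-ro) printf '%s\n' "--mount type=bind,source=$source,target=$target,readonly" ;;
    tmpfs)   printf '%s\n' "--mount type=tmpfs,target=$target" ;;
    *)       echo "unknown storage kind: $kind" >&2; return 1 ;;
  esac
}
```

For example, `mount_flag volume /var/lib/postgresql/data pgdata` prints the flag for the PostgreSQL data volume used later in this lesson.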

Database Persistence

Database data directories are the most critical paths to persist correctly.

PostgreSQL

services:
  db:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init-scripts:/docker-entrypoint-initdb.d:ro
    secrets:
      - db_password

volumes:
  pgdata:

secrets:
  db_password:
    file: ./secrets/db_password.txt

MySQL / MariaDB

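services:
  db:
    image: mysql:8
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - mysqldata:/var/lib/mysql
      - ./my-custom.cnf:/etc/mysql/conf.d/custom.cnf:ro
    # The secret must be granted to the service, or the password file will not exist
    secrets:
      - db_password

volumes:
  mysqldata:

secrets:
  db_password:
    file: ./secrets/db_password.txt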

Redis (with persistence)

services:
  cache:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redisdata:/data

volumes:
  redisdata:

tip

Always use a named volume for database data directories. Bind mounts can cause permission issues, especially on macOS and Windows, and are not portable.

Permission and Ownership Patterns

Permission mismatches are one of the most common Docker storage problems.

The Problem

flowchart LR
    A["Host: file owned by uid 1000"] -->|"Bind mount"| B["Container: process runs as uid 999"]
    B --> C["Permission denied!"]

    style C fill:#ffebee,stroke:#c62828

Solution 1: Match UIDs

Ensure the container process runs as the same UID as the host file owner:

# Create a user with a specific UID matching the host (Alpine/BusyBox syntax)
RUN addgroup -g 1000 appgroup && \
    adduser -u 1000 -G appgroup -s /bin/sh -D appuser
USER appuser
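
When rebuilding the image is not an option, the same matching can be done at run time by passing the host owner's IDs to `docker run --user`. The sketch below reads the numeric owner of a host path and prints the flag; the function name `user_flag_for` and the `./data` path in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# user_flag_for PATH   (hypothetical helper)
# Prints a `docker run` --user flag matching the numeric owner and group of PATH.
user_flag_for() {
  [ -e "$1" ] || { echo "no such path: $1" >&2; return 1; }
  # `ls -nd` prints numeric IDs; fields 3 and 4 are the UID and GID
  ls -nd -- "$1" | awk '{ print "--user " $3 ":" $4 }'
}

# Usage (left unquoted on purpose so the flag splits into two words):
#   docker run --rm $(user_flag_for ./data) -v "$PWD/data:/data" my-api:1.0.0
```

This only works if the application inside the image tolerates running as an arbitrary UID that may have no entry in the container's /etc/passwd.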

Solution 2: Fix Ownership at Runtime

#!/bin/sh
# Entrypoint script: the container must start as root for chown to succeed
set -e
chown -R app:app /data
# Hand off to the container's main command
exec "$@"
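
A recursive `chown` on every start can be slow on large volumes. One possible refinement, sketched below, is to change ownership only when the directory's owner differs from the expected UID. The function name `needs_chown` is hypothetical, and the check looks at the top-level directory only, not at its contents.

```shell
#!/bin/sh
# needs_chown PATH UID   (hypothetical helper)
# Succeeds (exit 0) when PATH is owned by a UID other than the expected one.
needs_chown() {
  actual=$(ls -nd -- "$1" | awk '{ print $3 }')
  [ "$actual" != "$2" ]
}

# Possible entrypoint usage, assuming the container starts as root
# and the image defines an `app` user:
#   if needs_chown /data "$(id -u app)"; then chown -R app:app /data; fi
#   exec "$@"
```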

Solution 3: Use Named Volumes

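When an empty named volume is first mounted at a path that already exists in the image, Docker copies that path's contents, ownership, and permissions into the volume. This copy happens only while the volume is empty; a volume that already holds data keeps its existing ownership. Set the ownership in the Dockerfile and a new volume inherits it: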

# Assumes an `app` user and group were created earlier in the Dockerfile.
# The /data directory will have correct ownership in the volume
RUN mkdir -p /data && chown -R app:app /data
VOLUME /data

Permission Reference

Scenario	Solution
Bind mount, host user ≠ container user	Match UIDs or use `chown` in entrypoint
Named volume, first use	Docker copies ownership from image -- usually works correctly
Read-only config files	Mount with `:ro`, ensure host file is readable
Container runs as root	Works but insecure -- use a non-root user

Multi-Container Shared Data

Shared Volume Between Services

services:
  # Writer uploads files
  api:
    image: my-api:1.0.0
    volumes:
      - uploads:/app/uploads

  # Reader serves files
  cdn:
    image: nginx:alpine
    volumes:
      - uploads:/usr/share/nginx/html/uploads:ro

volumes:
  uploads:

File Lock Considerations

warning

When multiple containers write to the same volume, you risk file corruption if they modify the same files simultaneously. Use application-level locking or a shared-nothing architecture where each container writes to its own subdirectory.
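
One simple form of application-level locking is a lock directory on the shared volume: `mkdir` either creates the directory or fails, so only one writer can hold the lock at a time. The sketch below assumes a local volume (not a network filesystem), and the function name `with_lock`, the `LOCK_TRIES` variable, and the exit code 75 are choices made for this example. A writer that is killed while holding the lock leaves the directory behind, so real deployments need a cleanup strategy.

```shell
#!/bin/sh
# with_lock LOCKDIR COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND while holding a lock represented by the directory LOCKDIR.
with_lock() {
  lockdir=$1
  shift
  max_tries=${LOCK_TRIES:-10}   # attempts before giving up, one second apart
  tries=0
  until mkdir "$lockdir" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "could not acquire $lockdir" >&2
      return 75
    fi
    sleep 1
  done
  rc=0
  "$@" || rc=$?      # run the command, remembering its exit status
  rmdir "$lockdir"   # release the lock even if the command failed
  return "$rc"
}
```

A writer container could then wrap each update, for example `with_lock /app/uploads/.lock cp report.csv /app/uploads/`, where the file name is only a placeholder.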

Environment-Specific Storage

Development

services:
  api:
    image: my-api:1.0.0
    volumes:
      # Source code: bind mount for hot-reload
      - ./src:/app/src
      # Node modules: named volume to avoid overwriting
      - node_modules:/app/node_modules
      # Config: bind mount, read-only
      - ./config/dev.json:/app/config.json:ro

volumes:
  node_modules:

Production

services:
  api:
    image: my-api:1.0.0
    volumes:
      # Data: named volume only
      - uploads:/app/uploads
      # Config: bind mount, read-only
      - ./config/prod.json:/app/config.json:ro
    # Scratch space: tmpfs (RAM only, capped at 50 MB)
    tmpfs:
      - /tmp:size=50m

volumes:
  uploads:

The `node_modules` Trick

In development, bind-mounting the project root (./:/app) hides the image's /app/node_modules behind the host's copy, or behind nothing at all if the host has no node_modules directory. Fix this by mounting a named volume over that path:

services:
  api:
    image: my-api:1.0.0
    volumes:
      - ./:/app                         # Bind mount project root
      - node_modules:/app/node_modules  # Named volume "covers" bind mount at this path

volumes:
  node_modules:

Because /app/node_modules is the more specific mount path, the named volume takes precedence over the bind mount there. On first use the empty volume is populated from the image's /app/node_modules, so the dependencies installed at build time stay available.

Storage Anti-Patterns

Anti-Pattern	Problem	Better Approach
Storing data in writable layer	Lost on `docker rm`	Use named volumes
Bind-mounting database data dir	Permission issues, not portable	Use named volumes
Committing data into images	Image size bloated, stale data	Mount data at runtime
Using anonymous volumes	Hard to find, easy to accidentally prune	Use named volumes
Same volume, multiple writers, no locking	File corruption	Application-level locking or separate directories
Not setting `:ro` on config mounts	Container can modify host configs	Always mount configs as `:ro`
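
Several of these anti-patterns can be spotted by looking at the source part of a Compose short-syntax volume entry: a source that starts with `/`, `.`, or `~` is a host path (bind mount), any other source is a volume name, and an entry with no source creates an anonymous volume. The sketch below applies that rule; the function name `classify_volume_entry` is hypothetical, and Windows drive-letter paths are not handled.

```shell
#!/bin/sh
# classify_volume_entry ENTRY   (hypothetical helper)
# Prints "bind", "named", or "anonymous" for a Compose short-syntax volume entry.
classify_volume_entry() {
  entry=$1
  case "$entry" in
    *:*) source=${entry%%:*} ;;          # text before the first colon
    *)   echo anonymous; return 0 ;;     # container path only, no source
  esac
  case "$source" in
    /* | ./* | ../* | . | .. | "~" | "~/"*) echo bind ;;
    *) echo named ;;
  esac
}
```

For instance, `classify_volume_entry ./pgdata:/var/lib/postgresql/data` reports a bind mount on a database data directory, which the table above advises against.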

Key Takeaways

Named volumes for database data and application state -- Docker-managed, portable, correct permissions.
Bind mounts for source code (development) and config files (read-only).
tmpfs for secrets and scratch data that should never touch disk.
Match UIDs between host and container to avoid permission issues with bind mounts.
Use the node_modules named volume trick to prevent bind mounts from overwriting dependency directories.
Mount configuration files as read-only (:ro) to prevent containers from modifying host files.

What's Next

Continue to Backup and Restore to learn how to protect your persistent data with backup scripts and strategies.
