Dockerfile Fundamentals

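A Dockerfile is a text file that tells Docker how to build an image. Each instruction is one build step: instructions that change the filesystem (such as RUN, COPY, and ADD) add a layer to the image, while others (such as ENV, EXPOSE, and CMD) only record metadata. The quality of your Dockerfile directly determines the size, security, build speed, and reliability of every container you run from it.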

How a Docker Build Works

When you run docker build, the Docker client sends your project files (the build context) to the Docker daemon. The daemon then executes the instructions in the Dockerfile in order and records the result of each step in the image.

flowchart TD
A["Dockerfile + Project Files"] -->|"docker build"| B["Docker Daemon"]
B --> C["Step 1: FROM base image"]
C --> D["Step 2: COPY files"]
D --> E["Step 3: RUN install deps"]
E --> F["Step 4: COPY source"]
F --> G["Step 5: CMD start app"]
G --> H["Final Image"]

style A fill:#f0f4ff,stroke:#4a6fa5
style H fill:#e8f5e9,stroke:#2e7d32

Docker caches the result of each step, so a step is rebuilt only when its instruction or its input files change. Once one step has to be rebuilt, every step after it is rebuilt too. This is why instruction order matters -- more on that in the next lesson.
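As a rough illustration of which steps add files and which only record metadata, here is an annotated sketch. It assumes a Node.js project with package.json, package-lock.json, and server.js in the build context. Note that Dockerfile comments must sit on their own lines.

```dockerfile
FROM node:20-alpine

# Metadata only: stored in the image configuration, no files added
ENV NODE_ENV=production

WORKDIR /app

# Filesystem change: adds a layer holding the two manifest files
COPY package.json package-lock.json ./

# Filesystem change: adds a layer holding node_modules
RUN npm ci --omit=dev

# Filesystem change: adds a layer holding the application source
COPY . .

# Metadata only: records the default command
CMD ["node", "server.js"]
```

After building, docker history <image> lists one row per instruction. The metadata-only rows should show a size of 0B.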

Instruction Reference

| Instruction | What It Does | Example |
| --- | --- | --- |
| FROM | Sets the base image (starting filesystem and runtime) | FROM node:20-alpine |
| WORKDIR | Sets the working directory for all following instructions | WORKDIR /app |
| COPY | Copies files from your project into the image | COPY package.json ./ |
| ADD | Like COPY, but also extracts archives and fetches URLs | ADD archive.tar.gz /app/ |
| RUN | Executes a command during build (install packages, compile, etc.) | RUN npm ci --omit=dev |
| ENV | Sets an environment variable that persists in the image | ENV NODE_ENV=production |
| ARG | Defines a build-time variable (not available at runtime) | ARG VERSION=1.0.0 |
| EXPOSE | Documents which port the container listens on (does not publish it) | EXPOSE 3000 |
| USER | Sets the user for RUN, CMD, and ENTRYPOINT instructions | USER node |
| ENTRYPOINT | Defines the main executable (always runs) | ENTRYPOINT ["node"] |
| CMD | Provides default arguments to ENTRYPOINT, or a default command | CMD ["server.js"] |

COPY vs ADD

Use COPY for everything unless you specifically need ADD's extra features (archive extraction or URL fetching). COPY is more explicit and predictable.
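The difference is easiest to see side by side. The sketch below assumes a local archive.tar.gz in the build context. The target paths are illustrative only.

```dockerfile
FROM alpine:3.20
WORKDIR /app

# COPY leaves the archive untouched: /app/archive.tar.gz is still a tarball
COPY archive.tar.gz /app/archive.tar.gz

# ADD unpacks a local tar archive: its contents land under /app/extracted/
ADD archive.tar.gz /app/extracted/
```

If you only wanted the file itself, the implicit unpacking done by ADD is a surprise, which is why COPY is the safer default.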

The order of instructions in a Dockerfile affects both cache efficiency and readability. Follow this pattern:

flowchart TD
A["1. FROM - Base image"] --> B["2. WORKDIR - Set directory"]
B --> C["3. COPY - Dependency manifests only"]
C --> D["4. RUN - Install dependencies"]
D --> E["5. COPY - Application source code"]
E --> F["6. USER - Switch to non-root"]
F --> G["7. CMD / ENTRYPOINT - Start command"]

style A fill:#e3f2fd,stroke:#1565c0
style D fill:#fff3e0,stroke:#ef6c00
style G fill:#e8f5e9,stroke:#2e7d32

The key insight: copy dependency files first, install dependencies, then copy source code. This way, Docker can reuse the cached dependency layer when only your source code changes (which happens far more often than dependency changes).

Complete Examples

Node.js

FROM node:20-alpine
WORKDIR /app

# Copy dependency manifests first (cache-friendly)
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Then copy application source
COPY . .

USER node
CMD ["node", "server.js"]

Python

FROM python:3.12-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

USER 10001
CMD ["python", "app.py"]

Go (with Multi-Stage)

# Build stage: compile the binary
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o app .

# Runtime stage: only the compiled binary
FROM alpine:3.20
COPY --from=build /src/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]

Java

FROM eclipse-temurin:21-jre
WORKDIR /app
COPY target/app.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]

Base Image Selection

Your base image determines the starting size, available packages, and security surface of your image. Choose carefully:

| Base Image Type | Size | When to Use |
| --- | --- | --- |
| alpine variants (e.g., node:20-alpine) | ~5-50 MB | Most server workloads. Smallest common option |
| slim variants (e.g., python:3.12-slim) | ~50-150 MB | When Alpine's musl libc causes compatibility issues |
| Full images (e.g., node:20) | ~300-1000 MB | Development only, or when you need many system packages |
| distroless (e.g., gcr.io/.../distroless) | ~2-20 MB | Maximum security. No shell, no package manager |
| scratch | 0 MB | Statically compiled binaries (Go, Rust) |

Pin your versions

Always use a specific version tag like node:20-alpine instead of node:latest. The latest tag changes without warning and can break your builds. For maximum reproducibility, pin to a specific digest.
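A digest pin might look like the sketch below. The <digest> part is a placeholder, not a real value, and must be replaced with the digest you look up for the image you have tested.

```dockerfile
# Tag only: readable, but the tag can later be pointed at a different image
# FROM node:20-alpine

# Tag plus digest: the digest selects the exact image, the tag stays for readability.
# <digest> is a placeholder. One way to look up the real value:
#   docker buildx imagetools inspect node:20-alpine
FROM node:20-alpine@sha256:<digest>
WORKDIR /app
```

The trade-off is that a digest never picks up security patches on its own, so pinned digests need to be updated deliberately.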

The .dockerignore File

Just like .gitignore prevents files from being tracked by Git, .dockerignore prevents files from being sent to the Docker daemon during builds. This makes builds faster and prevents sensitive files from accidentally ending up in your image.

Create a .dockerignore file in your project root:

.git
node_modules
dist
build
coverage
*.log
.env
.DS_Store
tmp

Without this file, your entire project directory (including node_modules, .git history, and local secrets) is part of the build context, and a broad instruction such as COPY . . copies all of it into the image.

ENTRYPOINT vs CMD

These two instructions are often confused. Here is how they work together:

| Dockerfile | docker run app | docker run app --help |
| --- | --- | --- |
| CMD ["node", "server.js"] | Runs node server.js | Tries to run --help as the command and fails (CMD is replaced) |
| ENTRYPOINT ["node", "server.js"] | Runs node server.js | Runs node server.js --help (args appended) |
| ENTRYPOINT ["node"] + CMD ["server.js"] | Runs node server.js | Runs node --help (CMD is replaced) |

Rule of thumb: Use CMD for simple applications. Use ENTRYPOINT + CMD when you want a fixed executable with configurable arguments.
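Here is a minimal sketch of the ENTRYPOINT + CMD pattern, with the resulting commands noted in comments. It assumes an image tagged app and a server.js in the build context.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY server.js ./
USER node

# Fixed executable
ENTRYPOINT ["node"]
# Default argument, replaced by anything passed after the image name
CMD ["server.js"]

# docker run app                      runs: node server.js
# docker run app --version            runs: node --version
# docker run -it --entrypoint sh app  runs: sh (ENTRYPOINT replaced, default CMD dropped)
```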

Always use the exec form (JSON array syntax) for reliable signal handling:

# Good: exec form - process receives SIGTERM directly
CMD ["node", "server.js"]

# Bad: shell form - runs through /bin/sh, signals may not reach your process
CMD node server.js
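Exec form does not expand environment variables, which is a common reason people fall back to shell form. One possible workaround, sketched below, is to call the shell explicitly and use exec so that node replaces the shell and receives signals itself. The --port flag is a hypothetical option of server.js, used only for illustration.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY server.js ./
ENV PORT=3000
USER node

# sh expands $PORT, then exec replaces the shell with node,
# so node becomes the container's main process and receives SIGTERM
CMD ["sh", "-c", "exec node server.js --port \"$PORT\""]
```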

Common Pitfalls

| Mistake | Why It Hurts | Fix |
| --- | --- | --- |
| Using latest as base image | Builds break unpredictably | Pin to a specific version tag |
| COPY . . before RUN npm install | Every code change re-installs all dependencies | Copy package*.json first, install, then copy source |
| Storing secrets in ENV or COPY | Secrets are baked into image layers forever | Inject at runtime via env vars or mounted secrets |
| Running as root | Compromised app has full system access | Add USER node or USER 10001 |
| RUN apt-get update alone | Package cache becomes stale | Chain: RUN apt-get update && apt-get install -y ... |
| No .dockerignore | Slow builds, secrets sent to daemon | Create .dockerignore with standard exclusions |
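The apt-get fix from the table could look like the sketch below. curl is only a placeholder package. The cleanup of /var/lib/apt/lists is an extra step commonly added to keep the layer small, not something the table requires.

```dockerfile
FROM python:3.12-slim

# Update and install in a single RUN so the package index is always
# refreshed together with the install, never reused from a stale cached layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```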

Building and Running Your Image

# Build the image and tag it
docker build -t my-app:1.0.0 .

# Check the image was created
docker images my-app

# Run a container from the image
docker run --rm -p 3000:3000 my-app:1.0.0

# Run with a custom command (overrides CMD)
docker run --rm my-app:1.0.0 node --version

Adding Metadata with Labels

Labels add metadata to your image that helps with auditing and traceability. Use the OCI standard label names:

FROM node:20-alpine
LABEL org.opencontainers.image.source="https://github.com/example/repo"
LABEL org.opencontainers.image.revision="abc123"
LABEL org.opencontainers.image.created="2026-02-13T00:00:00Z"

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]

You can inspect labels on any image with docker inspect <image>.
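Hard-coded revision and timestamp values go stale quickly. One option, sketched here, is to pass them in as build arguments. GIT_SHA and BUILD_DATE are hypothetical argument names chosen for this example.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Hypothetical build arguments; the defaults apply when no --build-arg is given.
# Declared late so that changing them does not invalidate the dependency layer.
ARG GIT_SHA=unknown
ARG BUILD_DATE=unknown
LABEL org.opencontainers.image.revision="${GIT_SHA}" \
      org.opencontainers.image.created="${BUILD_DATE}"

USER node
CMD ["node", "server.js"]

# Example build:
#   docker build \
#     --build-arg GIT_SHA="$(git rev-parse HEAD)" \
#     --build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
#     -t my-app:1.0.0 .
# Show only the labels:
#   docker inspect --format '{{json .Config.Labels}}' my-app:1.0.0
```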

Key Takeaways

  • A Dockerfile is a build contract -- it determines the size, security, and reliability of your containers.
  • Instruction order matters: copy dependency manifests first, install dependencies, then copy source code.
  • Always use a .dockerignore to keep builds fast and prevent secret leakage.
  • Pin your base image versions -- never rely on latest for production builds.
  • Run as a non-root user whenever possible.
  • Use exec form (["node", "server.js"]) for CMD and ENTRYPOINT to ensure proper signal handling.
