Dockerfile Fundamentals
A Dockerfile is a text file that tells Docker how to build an image. Each instruction describes one build step, and instructions that change the filesystem (such as RUN, COPY, and ADD) add a layer to the image. The quality of your Dockerfile directly determines the size, security, speed, and reliability of every container you run from it.
How a Docker Build Works
When you run docker build, the Docker client sends your project files (the build context) to the Docker daemon. The daemon then executes the instructions in the Dockerfile in order, adding a new layer for each step that changes the filesystem.
flowchart TD
A["Dockerfile + Project Files"] -->|"docker build"| B["Docker Daemon"]
B --> C["Step 1: FROM base image"]
C --> D["Step 2: COPY files"]
D --> E["Step 3: RUN install deps"]
E --> F["Step 4: COPY source"]
F --> G["Step 5: CMD start app"]
G --> H["Final Image"]
style A fill:#f0f4ff,stroke:#4a6fa5
style H fill:#e8f5e9,stroke:#2e7d32
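Each step that changes the filesystem produces a layer, while instructions such as ENV, EXPOSE, and CMD only update image metadata. Docker caches the result of each step, so unchanged steps do not need to rebuild. This is why instruction order matters -- more on that in the next lesson.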
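To see how individual steps end up in an image, a small throwaway Dockerfile can help. The sketch below is illustrative only: the image name `layer-demo`, the variable `APP_MODE`, and the file `/hello.txt` are hypothetical and not part of this lesson's project.

```dockerfile
# Hypothetical demo image used only to inspect build steps.
FROM alpine:3.20
# ENV records a variable in the image configuration; it adds no files.
ENV APP_MODE=demo
# RUN executes a command and stores the resulting filesystem change.
RUN echo "hello" > /hello.txt
# CMD records the default command; it adds no files either.
CMD ["cat", "/hello.txt"]
```

After `docker build -t layer-demo .`, running `docker history layer-demo` lists one row per step. The ENV and CMD rows should show a size of 0B, while the RUN row carries the new file.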
Instruction Reference
| Instruction | What It Does | Example |
|---|---|---|
| FROM | Sets the base image (starting filesystem and runtime) | FROM node:20-alpine |
| WORKDIR | Sets the working directory for all following instructions | WORKDIR /app |
| COPY | Copies files from your project into the image | COPY package.json ./ |
| ADD | Like COPY, but also extracts archives and fetches URLs | ADD archive.tar.gz /app/ |
| RUN | Executes a command during build (install packages, compile, etc.) | RUN npm ci --omit=dev |
| ENV | Sets an environment variable that persists in the image | ENV NODE_ENV=production |
| ARG | Defines a build-time variable (not available at runtime) | ARG VERSION=1.0.0 |
| EXPOSE | Documents which port the container listens on (does not publish it) | EXPOSE 3000 |
| USER | Sets the user for RUN, CMD, and ENTRYPOINT instructions | USER node |
| ENTRYPOINT | Defines the main executable (always runs) | ENTRYPOINT ["node"] |
| CMD | Provides default arguments to ENTRYPOINT, or a default command | CMD ["server.js"] |
Use COPY for everything unless you specifically need ADD's extra features (archive extraction or URL fetching). COPY is more explicit and predictable.
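The difference is easiest to see side by side. This is a minimal sketch that assumes a local file named archive.tar.gz (the name from the table) exists in the build context. The two destination directories are hypothetical.

```dockerfile
FROM alpine:3.20
WORKDIR /app
# COPY places the archive in the image unchanged, as a single file.
COPY archive.tar.gz /app/downloads/
# ADD recognizes the local tar archive and unpacks it into the destination.
ADD archive.tar.gz /app/extracted/
# Lists both directories so the difference is visible at run time.
CMD ["ls", "-R", "/app"]
```

Running the resulting image should show archive.tar.gz itself under downloads and the unpacked contents under extracted, which is the implicit behavior that makes ADD less predictable than COPY.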
Recommended Instruction Order
The order of instructions in a Dockerfile affects both cache efficiency and readability. Follow this pattern:
flowchart TD
A["1. FROM - Base image"] --> B["2. WORKDIR - Set directory"]
B --> C["3. COPY - Dependency manifests only"]
C --> D["4. RUN - Install dependencies"]
D --> E["5. COPY - Application source code"]
E --> F["6. USER - Switch to non-root"]
F --> G["7. CMD / ENTRYPOINT - Start command"]
style A fill:#e3f2fd,stroke:#1565c0
style D fill:#fff3e0,stroke:#ef6c00
style G fill:#e8f5e9,stroke:#2e7d32
The key insight: copy dependency files first, install dependencies, then copy source code. This way, Docker can reuse the cached dependency layer when only your source code changes (which happens far more often than dependency changes).
Complete Examples
Node.js
FROM node:20-alpine
WORKDIR /app
# Copy dependency manifests first (cache-friendly)
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Then copy application source
COPY . .
USER node
CMD ["node", "server.js"]
Python
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
USER 10001
CMD ["python", "app.py"]
Go (with Multi-Stage)
# Build stage: compile the binary
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o app .
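# Runtime stage: only the compiled binary
FROM alpine:3.20
# --from=build takes the binary from the build stage, not from the build context
COPY --from=build /src/app /usr/local/bin/app
# Run as an unprivileged numeric UID
USER 10001
ENTRYPOINT ["/usr/local/bin/app"]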
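Because the binary is built with CGO_ENABLED=0, it is statically linked, so the runtime stage could in principle be scratch instead of Alpine. The variant below is a sketch under the assumption that the program needs no CA certificates, timezone data, or shell at runtime. The /out/app output path is a choice made for this example.

```dockerfile
# Build stage: compile a static binary
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Runtime stage: an empty filesystem containing only the binary
FROM scratch
COPY --from=build /out/app /app
# A numeric UID works without an /etc/passwd file
USER 10001
ENTRYPOINT ["/app"]
```

The trade-off is debuggability: with no shell in the image, `docker exec` into a running container is not available.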
Java
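FROM eclipse-temurin:21-jre
WORKDIR /app
# Assumes the JAR was already built on the host as target/app.jar
COPY target/app.jar app.jar
# Run as an unprivileged numeric UID
USER 10001
ENTRYPOINT ["java", "-jar", "app.jar"]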
Base Image Selection
Your base image determines the starting size, available packages, and security surface of your image. Choose carefully:
| Base Image Type | Size | When to Use |
|---|---|---|
| alpine variants (e.g., node:20-alpine) | ~5-50 MB | Most server workloads. Smallest common option |
| slim variants (e.g., python:3.12-slim) | ~50-150 MB | When Alpine's musl libc causes compatibility issues |
| Full images (e.g., node:20) | ~300-1000 MB | Development only, or when you need many system packages |
| distroless (e.g., gcr.io/.../distroless) | ~2-20 MB | Maximum security. No shell, no package manager |
| scratch | 0 MB | Statically compiled binaries (Go, Rust) |
Always use a specific version tag like node:20-alpine instead of node:latest. The latest tag changes without warning and can break your builds. For maximum reproducibility, pin to a specific digest.
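One way to make digest pinning practical is to pass the base image in as a build argument. In the sketch below, `BASE_IMAGE` is a hypothetical argument name, and `<digest>` in the comment is a placeholder that must be replaced with a real value, for example one reported by `docker buildx imagetools inspect node:20-alpine`.

```dockerfile
# BASE_IMAGE is a hypothetical build argument; the default is the tag used in this lesson.
# An ARG declared before FROM may be referenced in the FROM line.
ARG BASE_IMAGE=node:20-alpine
# Pinned form, passed at build time:
#   docker build --build-arg BASE_IMAGE=node:20-alpine@sha256:<digest> .
FROM ${BASE_IMAGE}
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]
```

A digest identifies one exact image, so a build pinned this way keeps using the same base even if the node:20-alpine tag is later moved to a newer image.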
The .dockerignore File
Just like .gitignore prevents files from being tracked by Git, .dockerignore prevents files from being sent to the Docker daemon during builds. This makes builds faster and prevents sensitive files from accidentally ending up in your image.
Create a .dockerignore file in your project root:
.git
node_modules
dist
build
coverage
*.log
.env
.DS_Store
tmp
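Without this file, your entire project directory (including node_modules, .git history, and local secrets) is part of the build context. A broad COPY . . then pulls all of it into the image, and the legacy builder uploads it to the daemon even if you never COPY it.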
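To check what actually gets past .dockerignore, one option is a throwaway Dockerfile that copies the whole context and lists it. This is only a sketch: the file name Dockerfile.context and the image name `context-check` are hypothetical.

```dockerfile
# Throwaway image for inspecting the build context; not meant to be shipped.
FROM alpine:3.20
WORKDIR /context
# Copies everything that survives .dockerignore filtering.
COPY . .
# Lists the top-level entries that made it into the image.
CMD ["ls", "-A", "/context"]
```

Build it with `docker build -f Dockerfile.context -t context-check .` and run it with `docker run --rm context-check`. Entries excluded by .dockerignore, such as node_modules or .env, should be missing from the listing.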
ENTRYPOINT vs CMD
These two instructions are often confused. Here is how they work together:
| Dockerfile | docker run app | docker run app --help |
|---|---|---|
| CMD ["node", "server.js"] | Runs node server.js | Runs --help (CMD is replaced) |
| ENTRYPOINT ["node", "server.js"] | Runs node server.js | Runs node server.js --help (args appended) |
| ENTRYPOINT ["node"] + CMD ["server.js"] | Runs node server.js | Runs node --help (CMD is replaced) |
Rule of thumb: Use CMD for simple applications. Use ENTRYPOINT + CMD when you want a fixed executable with configurable arguments.
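The ENTRYPOINT + CMD combination can be tried out with a tiny image that needs no application code. The image name `ep-demo` used in the commands below is hypothetical.

```dockerfile
FROM node:20-alpine
# Fixed executable: runs no matter which arguments are passed to docker run.
ENTRYPOINT ["node"]
# Default argument: replaced by anything written after the image name.
CMD ["--version"]
```

After `docker build -t ep-demo .`, running `docker run --rm ep-demo` prints the Node.js version, because the default CMD is appended to the entrypoint. Running `docker run --rm ep-demo -e "console.log(1 + 1)"` replaces only the CMD part, so the container executes `node -e "console.log(1 + 1)"` and prints 2.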
Always use the exec form (JSON array syntax) for reliable signal handling:
# Good: exec form - process receives SIGTERM directly
CMD ["node", "server.js"]
# Bad: shell form - runs through /bin/sh, signals may not reach your process
CMD node server.js
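Shell form is sometimes kept on purpose, for example to expand an environment variable when the container starts. In that case, prefixing the command with `exec` makes the shell replace itself with the application process. The sketch below assumes a server.js in the build context. The `PORT` variable and the `--port` flag are hypothetical and would have to be handled by the application itself.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY server.js ./
USER node
# Shell form so that ${PORT:-3000} is expanded by /bin/sh at container start.
# "exec" replaces the shell with node, so node receives SIGTERM directly.
CMD exec node server.js --port "${PORT:-3000}"
```

Without the `exec` prefix, /bin/sh stays in place as the parent process, and `docker stop` may end up waiting for its timeout before killing the container.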
Common Pitfalls
| Mistake | Why It Hurts | Fix |
|---|---|---|
| Using latest as base image | Builds break unpredictably | Pin to a specific version tag |
| COPY . . before RUN npm install | Every code change re-installs all dependencies | Copy package*.json first, install, then copy source |
| Storing secrets in ENV or COPY | Secrets are baked into image layers forever | Inject at runtime via env vars or mounted secrets |
| Running as root | Compromised app has full system access | Add USER node or USER 10001 |
| RUN apt-get update alone | Package cache becomes stale | Chain: RUN apt-get update && apt-get install -y ... |
| No .dockerignore | Slow builds, secrets sent to daemon | Create .dockerignore with standard exclusions |
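Several of these fixes can be combined in a single file. The sketch below reuses the requirements.txt and app.py names from the Python example. The curl package is only a stand-in for whatever system package an image might need.

```dockerfile
FROM python:3.12-slim
WORKDIR /app
# Update and install in the same RUN so a cached, stale package index is never reused.
# curl is a placeholder package chosen only for illustration.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
# Dependency manifest first, so source edits do not invalidate the install layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Drop root privileges before the application starts.
USER 10001
CMD ["python", "app.py"]
```

Removing /var/lib/apt/lists in the same RUN keeps the package index out of the layer, which is a common way to trim image size.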
Building and Running Your Image
# Build the image and tag it
docker build -t my-app:1.0.0 .
# Check the image was created
docker images my-app
# Run a container from the image
docker run --rm -p 3000:3000 my-app:1.0.0
# Run with a custom command (replaces CMD entirely, so name the executable too)
docker run --rm my-app:1.0.0 node --version
Adding Metadata with Labels
Labels add metadata to your image that helps with auditing and traceability. Use the OCI standard label names:
FROM node:20-alpine
LABEL org.opencontainers.image.source="https://github.com/example/repo"
LABEL org.opencontainers.image.revision="abc123"
LABEL org.opencontainers.image.created="2026-02-13T00:00:00Z"
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]
You can inspect labels on any image with docker inspect <image>.
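Hard-coded revision and timestamp values go stale quickly, so they are often injected at build time instead. The following is a sketch of one way to do that. `GIT_REVISION` and `BUILD_DATE` are hypothetical argument names.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Hypothetical build arguments, declared late so that changing them
# does not invalidate the cached dependency layers above.
ARG GIT_REVISION=unknown
ARG BUILD_DATE=unknown
LABEL org.opencontainers.image.revision="${GIT_REVISION}" \
      org.opencontainers.image.created="${BUILD_DATE}"
USER node
CMD ["node", "server.js"]
```

The values can be supplied with `docker build --build-arg GIT_REVISION="$(git rev-parse --short HEAD)" --build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" -t my-app:1.0.0 .`, and `docker inspect --format '{{json .Config.Labels}}' my-app:1.0.0` prints only the labels.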
Key Takeaways
- A Dockerfile is a build contract -- it determines the size, security, and reliability of your containers.
- Instruction order matters: copy dependency manifests first, install dependencies, then copy source code.
- Always use a .dockerignore to keep builds fast and prevent secret leakage.
- Pin your base image versions -- never rely on latest for production builds.
- Run as a non-root user whenever possible.
- Use exec form (["node", "server.js"]) for CMD and ENTRYPOINT to ensure proper signal handling.
What's Next
- Continue to Layer Cache and Build Context to understand how Docker caches layers and how to make your builds faster.