Container Security Basics
By default, processes in a Docker container run as root. If an attacker exploits a vulnerability in your application, they get root access inside the container and, if they can escape it, potentially on the host. This lesson covers the three most impactful hardening controls.
The Three Core Controls
flowchart LR
A["1. Non-Root User<br/>USER directive"] --> B["2. Drop Capabilities<br/>cap_drop: ALL"]
B --> C["3. Read-Only Filesystem<br/>read_only: true"]
style A fill:#e8f5e9,stroke:#2e7d32
style B fill:#e3f2fd,stroke:#1565c0
style C fill:#fff3e0,stroke:#ef6c00
Control 1: Run as Non-Root
In the Dockerfile
FROM node:18-alpine
# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Set working directory and copy app
WORKDIR /app
COPY . .
# Switch to non-root user
USER appuser
CMD ["node", "server.js"]
At Runtime (Override)
docker run --user 1000:1000 my-app:1.0.0
In Compose
services:
  api:
    image: my-api:1.0.0
    user: "1000:1000"
Verify
docker exec my-app whoami
# Should NOT output: root
docker inspect my-app --format '{{.Config.User}}'
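An empty result from the docker inspect command above is easy to misread: it means no user was configured, so the process runs as UID 0. The helper below is a minimal sketch of how that output could be interpreted. The function name is_root_user is hypothetical, and the check does not catch a named user that happens to map to UID 0.

```shell
# is_root_user VALUE
# VALUE is the string printed by:
#   docker inspect my-app --format '{{.Config.User}}'
# Returns 0 (true) when that value means the container runs as root.
is_root_user() {
  user=${1%%:*}               # strip an optional ":group" suffix
  case "$user" in
    ""|root|0) return 0 ;;    # empty means no user was set, so UID 0 is used
    *)         return 1 ;;
  esac
}

# Usage (requires a container named my-app):
#   if is_root_user "$(docker inspect my-app --format '{{.Config.User}}')"; then
#     echo "my-app runs as root" >&2
#   fi
```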
Control 2: Drop Linux Capabilities
Linux capabilities grant specific privileges. Docker adds several by default (e.g., NET_RAW, CHOWN, SETUID). Drop all capabilities and add back only what is needed:
services:
  api:
    image: my-api:1.0.0
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if binding to ports < 1024
| Capability | What It Permits | Usually Needed? |
|---|---|---|
| NET_BIND_SERVICE | Bind to ports below 1024 | Only for port 80/443 |
| CHOWN | Change file ownership | Rarely |
| SETUID / SETGID | Change user/group identity | Rarely |
| NET_RAW | Raw sockets (ping) | Rarely |
| SYS_ADMIN | Broad admin operations | Never (almost) |
Verify
docker inspect my-app --format '{{json .HostConfig.CapDrop}}'
docker inspect my-app --format '{{json .HostConfig.CapAdd}}'
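The inspect output shows what was requested. The kernel's own view is the CapBnd line in /proc/1/status, a hexadecimal bitmask of the container's capability bounding set. With cap_drop: ALL and nothing added back, the mask should be all zeros. The sketch below tests a single bit of such a mask. The helper name cap_in_mask is hypothetical, and the usage line assumes the image ships cat.

```shell
# cap_in_mask HEXMASK BIT
# Returns 0 (true) when capability number BIT is set in HEXMASK.
cap_in_mask() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_BIND_SERVICE=10   # capability number from linux/capability.h

# Usage (requires a running container named my-app):
#   mask=$(docker exec my-app cat /proc/1/status | awk '/^CapBnd:/ {print $2}')
#   cap_in_mask "$mask" "$CAP_NET_BIND_SERVICE" && echo "NET_BIND_SERVICE is available"
```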
Control 3: Read-Only Root Filesystem
Prevent attackers from writing malicious files to the container filesystem:
services:
  api:
    image: my-api:1.0.0
    read_only: true
    tmpfs:
      - /tmp  # App needs a writable temp directory
      - /run  # Some processes need /run
    volumes:
      - app-data:/data  # Persistent writable storage
The read_only: true flag makes the entire root filesystem immutable. Use tmpfs mounts for directories that need to be writable temporarily.
Verify
docker exec my-app touch /test-file
# Should fail: Read-only file system
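The touch test creates a file if the filesystem turns out to be writable. A non-destructive alternative is to read the mount table, where a read-only root shows the ro option on the entry for /. The following is a sketch only: the function name root_is_readonly is hypothetical, and the usage line assumes the image ships cat.

```shell
# root_is_readonly
# Reads /proc/mounts content on stdin; returns 0 (true) when the last
# entry mounted at "/" carries the "ro" option.
root_is_readonly() {
  awk '$2 == "/" {
         found = 0
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
       }
       END { exit(found ? 0 : 1) }'
}

# Usage (requires a running container named my-app):
#   docker exec my-app cat /proc/mounts | root_is_readonly \
#     && echo "root filesystem is read-only"
```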
Avoid --privileged
--privileged gives the container nearly full access to the host. It grants every capability, exposes the host's devices, and turns off the default seccomp and AppArmor confinement:
# NEVER do this unless absolutely necessary
docker run --privileged my-app
If a container needs specific kernel access, use --cap-add for the specific capability instead of --privileged.
Avoid Host Namespace Sharing
These flags weaken container isolation:
| Flag | What It Shares | Risk |
|---|---|---|
| --network host | Host network stack | Container sees all host traffic |
| --pid host | Host process namespace | Container can see/signal host processes |
| --ipc host | Host IPC namespace | Container can access host shared memory |
| -v /:/host | Entire host filesystem | Container has full host access |
Only use these when explicitly justified and with compensating controls.
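To find containers that already share a host namespace, the relevant settings can be read from docker inspect. The sketch below reports each namespace whose mode is host. The function name shared_namespaces is hypothetical. It does not cover bind mounts such as -v /:/host, which appear in the Mounts field instead.

```shell
# shared_namespaces NETWORK_MODE PID_MODE IPC_MODE
# Prints one line per namespace that is shared with the host.
shared_namespaces() {
  if [ "$1" = "host" ]; then echo "network namespace shared with host"; fi
  if [ "$2" = "host" ]; then echo "pid namespace shared with host"; fi
  if [ "$3" = "host" ]; then echo "ipc namespace shared with host"; fi
  return 0
}

# Usage (requires a container named my-app):
#   shared_namespaces \
#     "$(docker inspect my-app --format '{{.HostConfig.NetworkMode}}')" \
#     "$(docker inspect my-app --format '{{.HostConfig.PidMode}}')" \
#     "$(docker inspect my-app --format '{{.HostConfig.IpcMode}}')"
```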
Hardened Compose Template
Putting it all together:
services:
  api:
    image: my-api:1.0.0
    user: "1000:1000"
    read_only: true
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
    tmpfs:
      - /tmp
    restart: unless-stopped
    networks: [back]
networks:
  back: {}
The no-new-privileges flag prevents processes from gaining additional privileges through setuid binaries.
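For containers started without Compose, the same hardening settings map onto docker run flags. The wrapper below is a sketch of that mapping. The function name hardened_run and the DOCKER override variable are hypothetical conveniences, and the restart policy and network from the template are left out to keep the focus on the security flags.

```shell
# hardened_run IMAGE [ARGS...]
# Starts IMAGE with the hardening settings from the Compose template.
# DOCKER is a hypothetical override for dry runs (for example DOCKER=echo).
hardened_run() {
  "${DOCKER:-docker}" run -d \
    --user 1000:1000 \
    --read-only \
    --cap-drop ALL \
    --security-opt no-new-privileges:true \
    --tmpfs /tmp \
    "$@"
}

# Usage:
#   hardened_run my-api:1.0.0                # start the container
#   DOCKER=echo hardened_run my-api:1.0.0    # print the arguments instead
```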
Quick Audit Commands
# Check what user a container runs as
docker inspect CONTAINER --format '{{.Config.User}}'
# Check capabilities
docker inspect CONTAINER --format '{{json .HostConfig.CapDrop}}'
docker inspect CONTAINER --format '{{json .HostConfig.CapAdd}}'
# Check if privileged
docker inspect CONTAINER --format '{{.HostConfig.Privileged}}'
# Check mounts
docker inspect CONTAINER --format '{{json .Mounts}}'
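These checks can be combined into one pass over several containers. The sketch below reads one line per container and prints a warning for each setting that departs from the hardened template. The function name flag_risky and the pipe-separated line format are assumptions made for this example.

```shell
# flag_risky
# Reads "name|privileged|readonly|capdrop" lines on stdin and prints a
# warning for each risky setting.
flag_risky() {
  while IFS='|' read -r name privileged ro capdrop; do
    if [ "$privileged" = "true" ]; then echo "$name: privileged"; fi
    if [ "$ro" != "true" ]; then echo "$name: root filesystem is writable"; fi
    case "$capdrop" in
      *ALL*|*all*) ;;                              # all capabilities dropped
      *) echo "$name: capabilities not dropped" ;;
    esac
  done
  return 0
}

# Usage (requires at least one running container):
#   docker ps -q | xargs docker inspect --format \
#     '{{.Name}}|{{.HostConfig.Privileged}}|{{.HostConfig.ReadonlyRootfs}}|{{json .HostConfig.CapDrop}}' \
#     | flag_risky
```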
Key Takeaways
- Run as non-root by default. Use USER in Dockerfile or user: in Compose.
- Drop all capabilities with cap_drop: ALL and only add back what is needed.
- Use read-only root filesystem with read_only: true. Mount tmpfs for writable temp directories.
- Never use --privileged unless absolutely necessary. Use specific --cap-add instead.
- Add no-new-privileges to prevent privilege escalation via setuid binaries.
What's Next
- Continue to Image Security to learn how to secure your image supply chain.