Container Security Best Practices
A comprehensive guide to container security covering minimal base images, non-root users, image scanning, secret management, read-only filesystems, and Kubernetes security contexts.
Overview
Container security is not a feature you bolt on after deployment — it is a discipline that starts with your base image choice, runs through every line of your Dockerfile, and extends into your orchestration layer and CI/CD pipeline. Most container breaches exploit preventable misconfigurations: running as root, bloated images with known CVEs, leaked secrets in build layers, and wide-open network policies. This guide covers the full attack surface and gives you concrete, production-tested techniques for hardening Docker containers, with a focus on Node.js workloads.
Prerequisites
- Working knowledge of Docker (building images, running containers, docker-compose)
- Familiarity with Node.js and npm
- Basic understanding of Linux user permissions and file systems
- Docker Engine 20.10+ recommended
- Optional: Kubernetes cluster for the security context sections
- Optional: Trivy or Snyk CLI installed for scanning examples
The Container Attack Surface
Before you can defend your containers, you need to understand where attackers get in. The attack surface breaks down into three layers.
Image Vulnerabilities
Your base image is someone else's Linux distribution. When you pull node:20, you are pulling Debian Bookworm with over 900 packages — most of which your application never touches. Each one of those packages is a potential CVE. A scan of a stock node:20 image typically reveals 200+ known vulnerabilities, with a handful rated critical.
$ trivy image node:20
2026-02-08T10:15:32.441Z INFO Vulnerability scanning...
node:20 (debian 12.4)
=====================
Total: 287 (UNKNOWN: 2, LOW: 172, MEDIUM: 81, HIGH: 27, CRITICAL: 5)
┌──────────────┬────────────────┬──────────┬────────────────────┬──────────────────┬─────────────────────────────────────┐
│ Library │ Vulnerability │ Severity │ Installed Version │ Fixed Version │ Title │
├──────────────┼────────────────┼──────────┼────────────────────┼──────────────────┼─────────────────────────────────────┤
│ libssl3 │ CVE-2024-5535 │ CRITICAL │ 3.0.13-1~deb12u1 │ 3.0.14-1~deb12u1 │ openssl: SSL_select_next_proto... │
│ libexpat1 │ CVE-2024-45490 │ CRITICAL │ 2.5.0-1 │ 2.5.0-1+deb12u1 │ libexpat: Negative Length Parsing │
│ perl-base │ CVE-2023-47038 │ HIGH │ 5.36.0-7+deb12u1 │ │ perl: Write past buffer end via... │
│ zlib1g │ CVE-2023-45853 │ HIGH │ 1:1.2.13.dfsg-1 │ │ MiniZip: integer overflow in... │
└──────────────┴────────────────┴──────────┴────────────────────┴──────────────────┴─────────────────────────────────────┘
Every vulnerability you ship is a vulnerability an attacker can exploit. The single most effective mitigation is reducing the number of packages in your image.
Runtime Exploits
If your container runs as root (the default), a container escape vulnerability gives an attacker root on the host. Kernel exploits like CVE-2024-21626 (Leaky Vessels) demonstrated that container isolation is not absolute. Running as root also means a compromised application can modify binaries, install tools, and pivot to other containers on the same network.
Orchestration Misconfiguration
In Kubernetes, running pods with default security settings means privileged mode is one YAML typo away. Missing network policies allow lateral movement between pods. Unprotected API servers, overly permissive RBAC, and mounted service account tokens give attackers a clear path from a single compromised container to the entire cluster.
Minimal Base Images
The fewer packages in your image, the fewer vulnerabilities you ship. Here are your options, ranked from most to fewest packages.
Alpine Linux
Alpine uses musl libc instead of glibc and weighs in at about 5 MB. The node:20-alpine image is roughly 130 MB — an 85% reduction from node:20.
FROM node:20-alpine
Alpine is my default recommendation for most Node.js workloads. It has a package manager (apk) for when you need to install additional system libraries, and the community support is excellent. The musl libc compatibility issues that plagued earlier versions are largely resolved for Node.js.
Distroless
Google's distroless images strip out everything — no shell, no package manager, no coreutils. You get the language runtime and nothing else.
FROM gcr.io/distroless/nodejs20-debian12
Distroless images are roughly 120 MB for Node.js. The lack of a shell makes debugging harder, but it also makes exploitation harder. An attacker who gains code execution cannot spawn a shell, install tools, or run arbitrary commands. For high-security workloads, this is worth the trade-off.
Scratch
The scratch image is literally empty — zero bytes. You can use it for Go or Rust binaries compiled as static executables, but it is not practical for Node.js since you need the V8 runtime.
# Only viable for statically compiled languages
FROM scratch
COPY myapp /myapp
CMD ["/myapp"]
My recommendation: Use node:20-alpine for most Node.js applications. Switch to distroless if you are building security-critical services and can accept the debugging limitations.
Running as Non-Root
This is the single most important security hardening step, and I am consistently surprised by how many production containers still run as root.
The USER Directive
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Create a non-root user with a specific UID
RUN addgroup -g 1001 -S appgroup && \
adduser -u 1001 -S appuser -G appgroup
# Change ownership of the app directory
RUN chown -R appuser:appgroup /app
# Switch to the non-root user
USER appuser
EXPOSE 3000
CMD ["node", "server.js"]
Why Numeric IDs Matter
Always use numeric user and group IDs, not names. Kubernetes security policies match on numeric IDs, and some container runtimes resolve names differently. Using USER 1001 is more portable and more auditable than USER appuser.
# Prefer this
USER 1001
# Over this
USER appuser
What Breaks When You Drop Root
If your application tries to bind to port 80 or 443, it will fail without root privileges. The fix is simple: bind to a high port (3000, 8080) and let your load balancer or ingress controller handle the port mapping.
# This fails as non-root
Error: listen EACCES: permission denied 0.0.0.0:80
# Fix: bind to a high port instead
PORT=3000 node server.js
If your app writes to /tmp or creates log files, make sure those directories are writable by your non-root user before switching to it.
Multi-Stage Builds for Smaller Attack Surface
Multi-stage builds are a security tool, not just an optimization. By separating your build environment from your runtime environment, you ensure that build tools, source code, and development dependencies never ship to production.
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev
# Stage 2: Production
FROM node:20-alpine AS production
WORKDIR /app
RUN addgroup -g 1001 -S appgroup && \
adduser -u 1001 -S appuser -G appgroup
# Copy only production artifacts
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./
USER 1001
EXPOSE 3000
CMD ["node", "dist/server.js"]
The build stage had npm, TypeScript, webpack, and all your devDependencies. None of that makes it into the production image. Fewer binaries means fewer things an attacker can exploit.
Image Scanning with Trivy and Snyk
Scanning your images for known vulnerabilities should be as routine as running your test suite. Two tools dominate this space.
Trivy
Trivy is open source, fast, and scans images, filesystems, and IaC configs. I run it locally during development and in CI.
# Install Trivy
$ brew install trivy
# Scan an image
$ trivy image myapp:latest
# Scan with severity filter
$ trivy image --severity HIGH,CRITICAL myapp:latest
# Scan and fail the build on critical findings
$ trivy image --exit-code 1 --severity CRITICAL myapp:latest
# Scan a Dockerfile for misconfigurations
$ trivy config Dockerfile
Trivy also detects secrets accidentally baked into images:
$ trivy image --scanners vuln,secret myapp:latest
2026-02-08T10:30:15.221Z INFO Secret scanning enabled
myapp:latest (alpine 3.19.1)
============================
Total: 3 (HIGH: 2, CRITICAL: 1)
Secrets:
========
┌──────────┬───────────────────────┬────────────────────────────────────┐
│ Severity │ Rule │ Match │
├──────────┼───────────────────────┼────────────────────────────────────┤
│ CRITICAL │ AWS Access Key ID │ app/config.js:14 │
│ HIGH │ Private Key │ app/certs/server.key:1 │
└──────────┴───────────────────────┴────────────────────────────────────┘
Snyk
Snyk offers deeper vulnerability intelligence and fix suggestions, but requires an account. The free tier covers open source projects.
# Authenticate
$ snyk auth
# Scan a Docker image
$ snyk container test myapp:latest
# Scan with Dockerfile context for upgrade recommendations
$ snyk container test myapp:latest --file=Dockerfile
Testing myapp:latest...
✗ High severity vulnerability found in openssl/libssl3
Description: Buffer Overflow
Introduced through: openssl/libssl3@3.0.13-1~deb12u1
Fixed in: 3.0.14-1~deb12u1
Recommendation: Upgrade base image to node:20.11.1-alpine
Organization: my-org
Package manager: apk
Target file: Dockerfile
Project name: docker-image|myapp
Docker image: myapp:latest
Tested 45 dependencies for known issues.
Found 12 issues (3 critical, 4 high, 5 medium).
Pinning Image Versions
Never use :latest in production. It is non-deterministic, unreproducible, and makes it impossible to audit what you deployed.
# Bad - what version is this? Nobody knows.
FROM node:latest
# Better - pinned to major version
FROM node:20-alpine
# Best - pinned to exact version with SHA digest
FROM node:20.11.1-alpine3.19@sha256:a1b2c3d4e5f6...
Pin to the SHA256 digest in production. This guarantees byte-for-byte reproducibility. Even if someone pushes a malicious image to the same tag, the digest will not match and your build will fail.
# Get the digest for an image
$ docker inspect --format='{{index .RepoDigests 0}}' node:20-alpine
node@sha256:a1f1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef12345678
Update your pinned versions on a regular schedule — monthly at minimum — and re-scan after each update.
Managing Secrets
Secrets in containers are a minefield. Here is how to navigate it.
What NOT To Do
# NEVER do this - secrets are baked into image layers permanently
ENV DATABASE_URL=postgres://admin:password123@db:5432/myapp
ARG API_KEY=sk-1234567890abcdef
# NEVER do this - .env gets copied into the image
COPY . .
Even if you delete the environment variable in a later layer, docker history will reveal it. Build args are visible in the image metadata. Image layers are append-only; you cannot truly delete anything.
# Anyone with the image can see your secrets
$ docker history myapp:latest
IMAGE CREATED CREATED BY SIZE
a1b2c3d4e5f6 2 hours ago ARG API_KEY=sk-1234567890abcdef 0B
.dockerignore
The first line of defense. Create a .dockerignore that excludes sensitive files from the build context entirely.
.env
.env.*
*.pem
*.key
credentials.json
docker-compose*.yml
.git
node_modules
.npm
*.log
Runtime Secrets
Pass secrets at runtime via environment variables or mounted files. Never bake them into the image.
# Pass at runtime via environment variable
$ docker run -e DATABASE_URL="postgres://admin:pass@db:5432/myapp" myapp:latest
# Pass via env file (not baked into the image)
$ docker run --env-file .env myapp:latest
# Mount a secret file
$ docker run -v /run/secrets/db-password:/app/secrets/db-password:ro myapp:latest
Docker Secrets (Swarm / Compose)
Docker Swarm has built-in secret management. Secrets are encrypted at rest, transmitted over TLS, and mounted as tmpfs files inside the container.
# docker-compose.yml
version: "3.8"
services:
app:
image: myapp:latest
secrets:
- db_password
- api_key
secrets:
db_password:
file: ./secrets/db_password.txt
api_key:
file: ./secrets/api_key.txt
Access secrets in your Node.js application:
var fs = require("fs");
var path = require("path");
function readSecret(name) {
var secretPath = path.join("/run/secrets", name);
try {
return fs.readFileSync(secretPath, "utf8").trim();
} catch (err) {
console.error("Failed to read secret:", name, err.message);
return process.env[name.toUpperCase()] || null;
}
}
var dbPassword = readSecret("db_password");
var apiKey = readSecret("api_key");
Read-Only File Systems
If your application does not need to write to the filesystem at runtime, mount it read-only. This prevents attackers from modifying binaries, dropping malware, or altering configuration files.
# Run with read-only root filesystem
$ docker run --read-only myapp:latest
Most Node.js applications need to write to at least /tmp for temporary files. Mount a tmpfs volume for writable directories:
$ docker run --read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=64m \
--tmpfs /app/logs:rw,noexec,nosuid,size=32m \
myapp:latest
The noexec flag prevents executing binaries from the tmpfs mount, and nosuid prevents setuid binaries. The size flag caps how much data can be written.
In Docker Compose:
services:
app:
image: myapp:latest
read_only: true
tmpfs:
- /tmp:size=64m,noexec,nosuid
- /app/logs:size=32m,noexec,nosuid
Resource Limits
An unrestricted container can consume all available memory, CPU, and PIDs on the host — either through a bug or a deliberate denial-of-service attack. Always set limits.
# Set memory and CPU limits
$ docker run \
--memory=512m \
--memory-swap=512m \
--cpus=1.0 \
--pids-limit=256 \
myapp:latest
- --memory=512m — hard memory cap. The container is killed (OOMKilled) if it exceeds this.
- --memory-swap=512m — setting swap equal to memory effectively disables swap, preventing the container from using disk as memory.
- --cpus=1.0 — limits the container to one CPU core.
- --pids-limit=256 — prevents fork bombs. No container needs thousands of processes.
In Docker Compose:
services:
app:
image: myapp:latest
deploy:
resources:
limits:
memory: 512M
cpus: "1.0"
pids: 256
reservations:
memory: 256M
cpus: "0.5"
Network Security
Don't Expose Unnecessary Ports
Every exposed port is an attack vector. Only expose what your application actually needs.
# Only expose the application port
EXPOSE 3000
# Do NOT expose debugging ports, database ports, etc.
# EXPOSE 9229 <- Node.js debugger
# EXPOSE 5432 <- PostgreSQL
# EXPOSE 27017 <- MongoDB
Use Internal Networks
In Docker Compose, create internal networks that cannot reach the internet. Backend services like databases should never have external network access.
services:
app:
image: myapp:latest
networks:
- frontend
- backend
ports:
- "3000:3000"
db:
image: postgres:16-alpine
networks:
- backend
# No ports exposed to host
networks:
frontend:
driver: bridge
backend:
driver: bridge
internal: true # No internet access
The internal: true flag on the backend network means the database container cannot make outbound connections to the internet, preventing data exfiltration even if the database is compromised.
Docker Content Trust
Docker Content Trust (DCT) uses digital signatures to verify that images have not been tampered with. When enabled, Docker will only pull signed images.
# Enable content trust
$ export DOCKER_CONTENT_TRUST=1
# Now pulls will fail for unsigned images
$ docker pull unsigned-image:latest
Error: remote trust data does not exist for docker.io/library/unsigned-image
# Sign and push your own images
$ docker trust sign myregistry/myapp:1.0.0
# Inspect trust data
$ docker trust inspect --pretty myregistry/myapp:1.0.0
For CI/CD pipelines, set DOCKER_CONTENT_TRUST=1 as an environment variable. This ensures that every image pulled during the build is verified.
Security Scanning in CI/CD Pipelines
Scanning in CI is where security stops being optional and becomes enforced. Here is a GitHub Actions workflow that scans on every pull request and blocks merges with critical vulnerabilities.
# .github/workflows/security-scan.yml
name: Container Security Scan
on:
pull_request:
branches: [main, master]
push:
branches: [main, master]
jobs:
scan:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Build Docker image
run: docker build -t myapp:${{ github.sha }} .
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: myapp:${{ github.sha }}
format: table
exit-code: 1
severity: CRITICAL,HIGH
ignore-unfixed: true
- name: Run Trivy config scanner
uses: aquasecurity/trivy-action@master
with:
scan-type: config
scan-ref: .
exit-code: 1
severity: CRITICAL,HIGH
- name: Run Dockle linter
uses: erzz/dockle-action@v1
with:
image: myapp:${{ github.sha }}
exit-code: 1
failure-threshold: WARN
- name: Generate SBOM
uses: aquasecurity/trivy-action@master
with:
image-ref: myapp:${{ github.sha }}
format: cyclonedx
output: sbom.json
- name: Upload SBOM artifact
uses: actions/upload-artifact@v4
with:
name: sbom
path: sbom.json
This pipeline does four things: scans for known CVEs, checks Dockerfile best practices, lints the container configuration, and generates a Software Bill of Materials. Any critical or high vulnerability fails the build.
Kubernetes Security Contexts
In Kubernetes, the security context is where you enforce everything discussed above at the pod and container level. These settings override anything in the Dockerfile, which is exactly the point — you enforce security policy at the orchestration layer, not just the image layer.
Pod Security Context
apiVersion: v1
kind: Pod
metadata:
name: secure-app
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:1.0.0@sha256:a1b2c3d4e5f6...
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1001
resources:
limits:
memory: "512Mi"
cpu: "1000m"
ephemeral-storage: "128Mi"
requests:
memory: "256Mi"
cpu: "250m"
volumeMounts:
- name: tmp
mountPath: /tmp
- name: logs
mountPath: /app/logs
volumes:
- name: tmp
emptyDir:
medium: Memory
sizeLimit: 64Mi
- name: logs
emptyDir:
medium: Memory
sizeLimit: 32Mi
Key settings explained:
- runAsNonRoot: true — Kubernetes rejects the pod if the image tries to run as root.
- allowPrivilegeEscalation: false — prevents processes from gaining more privileges than their parent.
- readOnlyRootFilesystem: true — the container filesystem is read-only. Writable directories must be explicitly mounted.
- capabilities.drop: ALL — drops all Linux capabilities. Most applications need zero capabilities. If you need to bind to a low port, add NET_BIND_SERVICE back selectively.
- seccompProfile.type: RuntimeDefault — applies the container runtime's default seccomp profile, which blocks dangerous syscalls.
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: app-network-policy
spec:
podSelector:
matchLabels:
app: myapp
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: ingress-controller
ports:
- protocol: TCP
port: 3000
egress:
- to:
- podSelector:
matchLabels:
app: postgres
ports:
- protocol: TCP
port: 5432
- to: # Allow DNS resolution
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
This policy restricts the app pod to only accept traffic from the ingress controller on port 3000, and only connect outbound to the PostgreSQL pod on port 5432 and DNS. Everything else is denied.
Seccomp and AppArmor Profiles
Seccomp
Seccomp (Secure Computing Mode) filters which system calls a container can make. The default Docker seccomp profile blocks about 44 of the 300+ Linux syscalls, including potentially dangerous ones like reboot, mount, and clock_settime.
For high-security workloads, create a custom profile that only allows the syscalls your application actually uses:
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64"],
"syscalls": [
{
"names": [
"read", "write", "open", "close", "stat", "fstat",
"mmap", "mprotect", "munmap", "brk", "access",
"pipe", "select", "sched_yield", "clone", "execve",
"exit", "wait4", "kill", "fcntl", "getdents",
"getcwd", "chdir", "rename", "mkdir", "rmdir",
"socket", "connect", "accept", "sendto", "recvfrom",
"bind", "listen", "epoll_create", "epoll_ctl", "epoll_wait",
"futex", "set_robust_list", "nanosleep", "clock_gettime",
"getpid", "getuid", "getgid", "gettid", "arch_prctl",
"set_tid_address", "exit_group", "openat", "newfstatat",
"ioctl", "pread64", "pwrite64", "getrandom"
],
"action": "SCMP_ACT_ALLOW"
}
]
}
Apply it at runtime:
$ docker run --security-opt seccomp=./seccomp-profile.json myapp:latest
AppArmor
AppArmor provides mandatory access control, restricting what files and capabilities a container can access. Docker applies a default AppArmor profile (docker-default) automatically on systems that support it.
# Check if AppArmor is active
$ docker inspect --format='{{.AppArmorProfile}}' <container_id>
docker-default
# Run with a custom AppArmor profile
$ docker run --security-opt apparmor=my-custom-profile myapp:latest
Supply Chain Security
Verifying Base Images
Only pull base images from trusted registries. Use official images from Docker Hub, Google Container Registry, or Amazon ECR Public.
# Verify image provenance with cosign
$ cosign verify --key cosign.pub myregistry/myapp:1.0.0
Verification for myregistry/myapp:1.0.0 --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- The signatures were verified against the specified public key
[{"critical":{"identity":{"docker-reference":"myregistry/myapp"},...}]
SBOM Generation
A Software Bill of Materials documents every component in your image. This is increasingly required for compliance and makes vulnerability response faster — when the next Log4Shell drops, you can instantly check whether you are affected.
# Generate SBOM with Trivy
$ trivy image --format cyclonedx --output sbom.json myapp:latest
# Generate SBOM with Syft
$ syft myapp:latest -o cyclonedx-json > sbom.json
# Scan an existing SBOM for vulnerabilities
$ trivy sbom sbom.json
Store SBOMs alongside your container images in your registry. Most modern registries support attaching SBOMs as OCI artifacts.
Complete Working Example
Here is a hardened Dockerfile for a Node.js Express application, a docker-compose.yml with full security options, and a health check endpoint.
Application Code
// server.js
var express = require("express");
var fs = require("fs");
var path = require("path");
var os = require("os");
var app = express();
var PORT = process.env.PORT || 3000;
// Health check endpoint
app.get("/health", function(req, res) {
var health = {
status: "healthy",
uptime: process.uptime(),
timestamp: new Date().toISOString(),
pid: process.pid,
user: os.userInfo().username,
memoryUsage: process.memoryUsage().rss
};
res.status(200).json(health);
});
// Read secret from Docker secrets mount
function getSecret(name) {
var secretPath = path.join("/run/secrets", name);
try {
return fs.readFileSync(secretPath, "utf8").trim();
} catch (err) {
return process.env[name.toUpperCase().replace(/-/g, "_")] || null;
}
}
var dbPassword = getSecret("db-password");
app.get("/", function(req, res) {
res.json({ message: "Secure container running", user: os.userInfo().uid });
});
app.listen(PORT, "0.0.0.0", function() {
console.log("Server running on port " + PORT + " as UID " + process.getuid());
});
Hardened Dockerfile
# =============================================================================
# Stage 1: Build dependencies
# =============================================================================
FROM node:20.11.1-alpine3.19@sha256:bf77dc26e48ea95fca9d1aceb5acfa69d2e546b765ec2abfb502975f1a2bf7be AS builder
WORKDIR /app
# Copy dependency manifests first for layer caching
COPY package.json package-lock.json ./
# Install all dependencies (including devDependencies for build)
RUN npm ci --no-audit --no-fund
# Copy source code
COPY . .
# If you have a build step, run it here
# RUN npm run build
# Remove devDependencies for production
RUN npm prune --omit=dev
# =============================================================================
# Stage 2: Production image
# =============================================================================
FROM node:20.11.1-alpine3.19@sha256:bf77dc26e48ea95fca9d1aceb5acfa69d2e546b765ec2abfb502975f1a2bf7be AS production
# Install dumb-init for proper PID 1 signal handling
RUN apk add --no-cache dumb-init
# Set NODE_ENV
ENV NODE_ENV=production
WORKDIR /app
# Create non-root user with specific UID/GID
RUN addgroup -g 1001 -S appgroup && \
adduser -u 1001 -S appuser -G appgroup -h /app -s /sbin/nologin
# Copy production dependencies and application code
COPY --from=builder --chown=1001:1001 /app/node_modules ./node_modules
COPY --from=builder --chown=1001:1001 /app/package.json ./
COPY --from=builder --chown=1001:1001 /app/server.js ./
# Create writable directories needed at runtime
RUN mkdir -p /app/logs /tmp && \
chown -R 1001:1001 /app/logs /tmp
# Switch to non-root user
USER 1001
# Expose only the application port
EXPOSE 3000
# Health check - curl is not available in alpine by default, use node
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD ["node", "-e", "var http = require('http'); var options = { hostname: '127.0.0.1', port: 3000, path: '/health', timeout: 3000 }; var req = http.request(options, function(res) { process.exit(res.statusCode === 200 ? 0 : 1); }); req.on('error', function() { process.exit(1); }); req.end();"]
# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "server.js"]
Docker Compose with Security Options
# docker-compose.yml
version: "3.8"
services:
app:
build:
context: .
dockerfile: Dockerfile
image: myapp:latest
container_name: secure-app
restart: unless-stopped
# Read-only root filesystem
read_only: true
# Writable temp directories
tmpfs:
- /tmp:size=64m,noexec,nosuid,nodev
- /app/logs:size=32m,noexec,nosuid,nodev
# Security options
security_opt:
- no-new-privileges:true
- seccomp:./seccomp-profile.json
cap_drop:
- ALL
# Resource limits
deploy:
resources:
limits:
memory: 512M
cpus: "1.0"
pids: 128
reservations:
memory: 128M
cpus: "0.25"
# Network configuration
ports:
- "3000:3000"
networks:
- frontend
- backend
# Secrets
secrets:
- db-password
- api-key
# Environment variables (non-sensitive only)
environment:
- NODE_ENV=production
- PORT=3000
- LOG_LEVEL=info
# Health check
healthcheck:
test: ["CMD", "node", "-e", "var h=require('http');h.get('http://127.0.0.1:3000/health',function(r){process.exit(r.statusCode===200?0:1)}).on('error',function(){process.exit(1)})"]
interval: 30s
timeout: 5s
retries: 3
start_period: 15s
depends_on:
db:
condition: service_healthy
db:
image: postgres:16.2-alpine3.19@sha256:1234abcd...
container_name: secure-db
restart: unless-stopped
# Database on internal network only
networks:
- backend
# Security hardening
read_only: true
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
cap_add:
- CHOWN
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
tmpfs:
- /tmp:size=128m,noexec,nosuid,nodev
- /run/postgresql:size=16m,noexec,nosuid,nodev
volumes:
- pgdata:/var/lib/postgresql/data
secrets:
- db-password
environment:
- POSTGRES_DB=myapp
- POSTGRES_USER=appuser
- POSTGRES_PASSWORD_FILE=/run/secrets/db-password
deploy:
resources:
limits:
memory: 1G
cpus: "2.0"
pids: 256
healthcheck:
test: ["CMD-SHELL", "pg_isready -U appuser -d myapp"]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
networks:
frontend:
driver: bridge
backend:
driver: bridge
internal: true # No internet access for backend services
volumes:
pgdata:
driver: local
secrets:
db-password:
file: ./secrets/db-password.txt
api-key:
file: ./secrets/api-key.txt
CI Security Pipeline
# .github/workflows/container-security.yml
name: Container Security
on:
pull_request:
push:
branches: [main]
env:
IMAGE_NAME: myapp
DOCKER_CONTENT_TRUST: 1
jobs:
build-and-scan:
runs-on: ubuntu-latest
permissions:
security-events: write
contents: read
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build image
run: |
docker build \
--no-cache \
--tag ${{ env.IMAGE_NAME }}:${{ github.sha }} \
.
- name: Trivy vulnerability scan
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ env.IMAGE_NAME }}:${{ github.sha }}
format: sarif
output: trivy-results.sarif
severity: CRITICAL,HIGH
exit-code: 1
ignore-unfixed: true
- name: Upload scan results to GitHub Security
uses: github/codeql-action/upload-sarif@v3
if: always()
with:
sarif_file: trivy-results.sarif
- name: Dockerfile lint with Hadolint
uses: hadolint/hadolint-action@v3.1.0
with:
dockerfile: Dockerfile
failure-threshold: warning
- name: Check for secrets in image
run: |
docker run --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy:latest image \
--scanners secret \
--exit-code 1 \
${{ env.IMAGE_NAME }}:${{ github.sha }}
- name: Verify non-root user
run: |
USER=$(docker inspect --format='{{.Config.User}}' ${{ env.IMAGE_NAME }}:${{ github.sha }})
if [ "$USER" = "" ] || [ "$USER" = "root" ] || [ "$USER" = "0" ]; then
echo "ERROR: Container runs as root!"
exit 1
fi
echo "Container runs as user: $USER"
- name: Generate SBOM
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ env.IMAGE_NAME }}:${{ github.sha }}
format: cyclonedx
output: sbom.json
- name: Upload SBOM
uses: actions/upload-artifact@v4
with:
name: sbom-${{ github.sha }}
path: sbom.json
retention-days: 90
Common Issues & Troubleshooting
1. Permission Denied When Running as Non-Root
Error: EACCES: permission denied, open '/app/logs/app.log'
Cause: The application is trying to write to a directory owned by root. When you switch to USER 1001, the process cannot write to directories it does not own.
Fix: Set ownership before switching users:
RUN mkdir -p /app/logs && chown -R 1001:1001 /app/logs
USER 1001
Or mount the directory as a tmpfs volume in your compose file:
tmpfs:
- /app/logs:size=32m,noexec,nosuid
2. Read-Only Filesystem Breaks npm or Node.js Internals
Error: EROFS: read-only file system, mkdir '/app/.npm'
Error: EROFS: read-only file system, open '/app/node_modules/.cache/...'
Cause: Node.js and npm attempt to write cache files to the application directory. A read-only filesystem blocks these writes.
Fix: Mount tmpfs volumes for cache directories and set npm cache location:
ENV NPM_CONFIG_CACHE=/tmp/.npm
tmpfs:
- /tmp:size=64m,noexec,nosuid
3. Health Check Fails Because curl Is Not Installed
OCI runtime exec failed: exec failed: unable to start container process:
exec: "curl": executable file not found in $PATH: unknown
Cause: Alpine and distroless images do not include curl. A health check using curl will fail.
Fix: Use Node.js for health checks instead of curl:
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
CMD ["node", "-e", "var h=require('http');h.get('http://127.0.0.1:3000/health',function(r){process.exit(r.statusCode===200?0:1)}).on('error',function(){process.exit(1)})"]
4. Trivy Scan Reports Vulnerabilities in the Base Image You Cannot Fix
Total: 14 (HIGH: 3, CRITICAL: 1)
┌──────────┬───────────────┬──────────┬──────────────────┬──────────────┐
│ Library │ Vulnerability │ Severity │ Installed Version│ Fixed Version│
├──────────┼───────────────┼──────────┼──────────────────┼──────────────┤
│ busybox │ CVE-2023-XXXX │ HIGH │ 1.36.1-r2 │ │
└──────────┴───────────────┴──────────┴──────────────────┴──────────────┘
Cause: Some vulnerabilities in the base image have no fix available yet (the "Fixed Version" column is empty). Your CI pipeline fails even though there is nothing you can do about it.
Fix: Use --ignore-unfixed to skip vulnerabilities with no available fix, and maintain a .trivyignore file for acknowledged risks:
$ trivy image --ignore-unfixed --ignorefile .trivyignore myapp:latest
# .trivyignore
# Accepted risk: busybox CVE with no fix available, no exploit in the wild
CVE-2023-XXXX
# Accepted risk: low-severity zlib issue, mitigated by WAF
CVE-2023-45853
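The same flags carry over to CI. As a hedged sketch using the `aquasecurity/trivy-action` GitHub Action (the image name is illustrative, and you should pin the action to a specific release in practice):

```yaml
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan image, fail only on fixable HIGH/CRITICAL findings
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:latest
          exit-code: '1'          # fail the build on findings
          severity: HIGH,CRITICAL
          ignore-unfixed: true    # skip vulnerabilities with no available fix
```

This keeps the pipeline red only for findings you can actually act on, while the `.trivyignore` file documents the risks you have consciously accepted.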
5. Container Killed with OOMKilled After Setting Memory Limits
State: {"Status":"exited","ExitCode":137,"OOMKilled":true}
Cause: The memory limit is set too low for the workload. V8 derives its default heap ceiling from total system memory rather than the container's cgroup limit, so without an explicit --max-old-space-size the heap can grow past what the container is allowed, and the kernel kills the process.
Fix: Align Node.js heap size with your container memory limit. A good rule is to set --max-old-space-size to about 75% of the container memory limit:
# Container has 512MB limit, so set heap to ~384MB
CMD ["node", "--max-old-space-size=384", "server.js"]
6. Capabilities Dropped Too Aggressively, Application Fails to Start
Error: listen EACCES: permission denied 0.0.0.0:80
Cause: You dropped ALL capabilities but your application needs NET_BIND_SERVICE to bind to ports below 1024.
Fix: Either bind to a high port (recommended) or selectively add back the capability you need:
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Only if you must bind to ports < 1024
The better solution is to use a high port and let your reverse proxy or load balancer handle port 80/443.
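For that recommended path, the application listens on an unprivileged port and the port mapping happens outside the container, so no capability is required at all. A compose sketch (image name and ports are illustrative):

```yaml
services:
  app:
    image: myapp:latest
    ports:
      - "80:3000"   # host port 80 -> container port 3000, handled by the daemon
    cap_drop:
      - ALL          # the process itself binds only to 3000
```

The Docker daemon performs the privileged bind on the host side, so the container keeps an empty capability set.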
Best Practices
- Use minimal base images. Start with Alpine or distroless. Every package you do not ship is a vulnerability you do not ship. A stock `node:20` image has 287 known vulnerabilities; `node:20-alpine` typically has fewer than 10.
- Never run as root. Use the `USER` directive with a numeric UID. Enforce `runAsNonRoot: true` in Kubernetes security contexts. This is non-negotiable for production workloads.
- Pin image versions with SHA digests. Tags are mutable. Digests are not. Pin your base images to SHA256 digests and update them on a monthly schedule with fresh vulnerability scans.
- Scan images in CI, not just locally. Integrate Trivy, Snyk, or Grype into your CI pipeline with `exit-code: 1` on critical findings. If the scan fails, the build fails. No exceptions.
- Keep secrets out of the image. Never use `ENV` or `ARG` for secrets in Dockerfiles. Use Docker secrets, mounted files, or a secrets manager like HashiCorp Vault. Always have a `.dockerignore` that excludes `.env`, `*.pem`, and `credentials.json`.
- Enable read-only filesystems. Mount the root filesystem as read-only and provide tmpfs volumes for the specific directories that need write access. Use `noexec` and `nosuid` flags on writable mounts.
- Set resource limits on every container. Memory, CPU, and PID limits prevent denial-of-service attacks and contain the blast radius of runaway processes. A fork bomb in an unlimited container takes down the entire host.
- Drop all Linux capabilities, then add back selectively. Most applications need zero capabilities. Start with `cap_drop: ALL` and only add back what fails. Document why each capability is needed.
- Generate and store SBOMs. When the next critical vulnerability is disclosed, you need to know within minutes whether any of your deployed containers are affected. SBOMs make this possible.
- Apply network policies. In Kubernetes, use NetworkPolicy resources to restrict pod-to-pod communication. In Docker Compose, use internal networks for backend services. Default-deny is the only sane network policy.
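As a starting point for the default-deny stance, a minimal Kubernetes NetworkPolicy sketch (the namespace name is illustrative; allow rules for legitimate traffic are added as separate, more specific policies):

```yaml
# Default-deny: pods in this namespace accept no ingress traffic
# unless a more specific NetworkPolicy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production   # illustrative namespace
spec:
  podSelector: {}          # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```

An analogous policy with `policyTypes: [Egress]` locks down outbound traffic as well, which also blunts data exfiltration from a compromised pod.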
References
- Docker Security Best Practices — Official Docker documentation on security
- CIS Docker Benchmark — Center for Internet Security benchmark for Docker
- Trivy Documentation — Open-source vulnerability scanner
- Snyk Container Security — Commercial container scanning
- Kubernetes Pod Security Standards — Official Kubernetes security context documentation
- Google Distroless Images — Minimal container images
- Sigstore/Cosign — Container image signing and verification
- Hadolint — Dockerfile linter
- Docker Content Trust — Image signing with Notary
- OWASP Container Security Cheat Sheet — OWASP guidance on container hardening
