01
Mental Model: What Docker Actually Is
Docker is a containerization platform that packages three things together:
Application code (Python, Node.js, etc.)
Runtime (interpreter, dependencies)
System libraries (everything your app needs)
→ All bundled into a portable, isolated unit (an image, run as a container) that behaves the same on your laptop, in CI/CD, and on production servers.
Docker solves: "It works on my machine"
Without Docker:
Dev Machine → Works ✓
CI Server → Fails ✗ (missing library X)
Production → Fails ✗ (wrong version Y)
With Docker:
Dev, CI, Prod → Same image → Works ✓✓✓
Key Analogy
Without Docker
Ship source code + hope the server has the right setup
With Docker
Ship a complete, sealed box. Open anywhere, it works.
02
Core Architecture
Docker system stack
┌─────────────────────────────────┐
│ Docker CLI │
│ (docker run, build, push...) │
└────────────────┬────────────────┘
│ (API calls)
┌────────────────▼────────────────┐
│ Docker Daemon │
│ (dockerd process) │
└────────────────┬────────────────┘
│ (manages)
┌────────────────▼────────────────┐
│ Docker Objects │
├─────────────────────────────────┤
│ • Images (blueprints) │
│ • Containers (running tasks) │
│ • Networks (communication) │
│ • Volumes (persistent data) │
└─────────────────────────────────┘
How it works
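- You type a command: docker run myapp
- CLI contacts daemon: "Start a container from the myapp image" (sent as a REST API request, by default over the Unix socket /var/run/docker.sock on Linux)
- Daemon creates: an isolated process, filesystem, and network interface (pulling the image from a registry first if it isn't available locally)
- Container runs: Your app executes inside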
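Because the CLI is just an API client, any program can make the same calls. The sketch below is a minimal, stdlib-only illustration, assuming a Linux host whose daemon listens on the default /var/run/docker.sock and a user allowed to access it (root or the docker group). It sends GET /containers/json, the Docker Engine API endpoint behind a plain docker ps; the class and function names are made up for this example.

```python
import http.client
import json
import socket

# Default daemon socket on Linux (assumption: not overridden via DOCKER_HOST)
DOCKER_SOCKET = "/var/run/docker.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; the socket path is what matters
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_running_containers(socket_path: str = DOCKER_SOCKET) -> list:
    """Ask the daemon for running containers, roughly what `docker ps` does."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"daemon returned HTTP {resp.status}: {body[:200]!r}")
        return json.loads(body)
    finally:
        conn.close()
```

Calling list_running_containers() and printing c["Id"][:12], c["Image"] and c["Names"] for each entry gives a bare-bones docker ps. For real tooling, the official Docker SDK for Python (the docker package) wraps the same API.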
03
Main Components
A. Images (Blueprint Layer)
Immutable templates used to create and run containers
A read-only snapshot of your entire application stack. Think: "snapshot of a hard drive".
Key properties
- Layered — Built incrementally (cached for speed)
- Versioned — Tagged with names like myapp:1.0, myapp:latest
- Portable — Run anywhere Docker exists, as long as the image matches the host's CPU architecture (or is built for several)
Image structure (layers)
┌─────────────────────────────┐
│ Layer 5: Your app code      │ ← Copied last
├─────────────────────────────┤
│ Layer 4: Dependencies       │
├─────────────────────────────┤
│ Layer 3: Python 3.11        │
├─────────────────────────────┤
│ Layer 2: Linux base         │
├─────────────────────────────┤
│ Layer 1: Base image         │ ← Cached, reused
└─────────────────────────────┘
Each layer = cached & reusable
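The caching rule behind this diagram is that a layer is reused only if its own step and every step beneath it are unchanged, so one change rebuilds that layer and everything above it. Here is a deliberately simplified Python model of that rule: each layer's key is a SHA-256 over its parent's key plus its instruction and inputs. Real BuildKit cache keys also cover file metadata, build args and more, and a RUN step is keyed on its command text rather than on what it downloads, which is why its input is left empty below. The step lists are hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple

Step = Tuple[str, str]  # (Dockerfile instruction, content it depends on)


def layer_keys(steps: Sequence[Step]) -> List[str]:
    """Give each layer a key that depends on its own step AND its parent's key."""
    keys: List[str] = []
    parent = ""
    for instruction, content in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        digest.update(content.encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def layers_to_rebuild(cached_keys: Sequence[str], steps: Sequence[Step]) -> List[str]:
    """Return the instructions whose layer key is missing from the cache."""
    cache = set(cached_keys)
    return [instr for (instr, _), key in zip(steps, layer_keys(steps)) if key not in cache]


v1 = [
    ("FROM python:3.11-slim", ""),
    ("COPY requirements.txt .", "fastapi==0.104.1"),
    ("RUN pip install -r requirements.txt", ""),
    ("COPY . .", "print('v1')"),
]
cached = layer_keys(v1)

# Only app code changed: just the final COPY layer is rebuilt
code_change = v1[:3] + [("COPY . .", "print('v2')")]
print(layers_to_rebuild(cached, code_change))  # → ['COPY . .']

# requirements.txt changed: that COPY, pip install, and everything after rebuild
deps_change = [v1[0], ("COPY requirements.txt .", "fastapi==0.110.0")] + v1[2:]
print(layers_to_rebuild(cached, deps_change))
```

This is exactly why the Dockerfile patterns in this guide copy requirements.txt and install dependencies before copying the rest of the code.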
Commands
Pull a public image
$ docker pull python:3.11
List images on your machine
$ docker images
Build an image from Dockerfile
$ docker build -t myapp:1.0 .
B. Containers (Execution Layer)
Running instance of an image. Isolated, ephemeral by default.
Image → Class definition (blueprint)
Container → Object instance (running process)
Key properties
- Isolated — Own filesystem, processes, network
- Lightweight — Shares host OS kernel (not a full VM)
- Ephemeral — Data written inside the container is lost when the container is removed (unless using volumes)
Essential commands
Start a container in background
$ docker run -d -p 8000:8000 myapp
List running containers
$ docker ps
View logs from a container
$ docker logs <container-id>
Stop a container
$ docker stop <container-id>
Open a shell inside a running container (use /bin/sh if the image has no bash, e.g. Alpine)
$ docker exec -it <container-id> /bin/bash
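docker stop sends SIGTERM to the container's main process and, if it hasn't exited after a grace period (10 seconds by default), follows with SIGKILL. With the exec-form CMD used in this guide, your app is that main process (PID 1), and PID 1 gets no default "terminate on SIGTERM" behavior, so a Python app without a handler typically just waits until it is killed. A minimal sketch of a worker loop that exits cleanly; ShutdownFlag, run_worker and process_one are hypothetical names.

```python
import signal
import time
from typing import Callable


class ShutdownFlag:
    """Flipped by the SIGTERM handler; the main loop polls it."""

    def __init__(self) -> None:
        self.requested = False

    def handle(self, signum, frame) -> None:
        # Keep the handler tiny: just record that `docker stop` asked us to exit
        self.requested = True


def run_worker(process_one: Callable[[], None], flag: ShutdownFlag,
               poll_interval: float = 1.0) -> int:
    """Run `process_one` repeatedly until SIGTERM arrives; return the iteration count.

    signal.signal() only works in the main thread, so call this from there.
    """
    signal.signal(signal.SIGTERM, flag.handle)
    iterations = 0
    while not flag.requested:
        process_one()
        iterations += 1
        time.sleep(poll_interval)  # SIGTERM is noticed within about one interval
    return iterations
```

Keeping the handler to a single assignment avoids doing real work inside a signal handler; the loop finishes its current item and returns, so the process exits well before the SIGKILL deadline.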
C. Volumes (Persistence Layer)
Persistent storage that survives container removal and recreation.
Containers are ephemeral. When a container is removed, everything written inside it is deleted.
If you run PostgreSQL in a container without a volume, the database lives only with that container: remove or recreate it (for example to upgrade the image) and the new container starts with an empty database.
Named volumes
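Managed by Docker. Data is stored in Docker's own directory on the host (/var/lib/docker/volumes/ on Linux).
Create and use a named volume (Docker creates mydata on first use; the official postgres image also needs POSTGRES_PASSWORD to initialize a new database)
$ docker run -e POSTGRES_PASSWORD=secret -v mydata:/var/lib/postgresql/data postgres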
Bind mounts
Direct host directory ↔ container directory mapping.
Mount a host folder into container
$ docker run -v /host/path:/container/path myapp
Volumes keep data safe
Without volume: postgres container removed → database ✗ deleted
With volume: postgres container removed → database ✓ persisted
new postgres container → same database ✓
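To see the difference from the application side, here is a minimal Python sketch of a hypothetical app that keeps a counter under a data directory. DATA_DIR and the /data default are assumptions for illustration: run the image with -v counterdata:/data and the count keeps climbing across new containers; without the volume, each new container starts again at 1.

```python
import os
from pathlib import Path
from typing import Optional


def bump_visit_counter(data_dir: Optional[str] = None) -> int:
    """Increment and return a counter persisted in <data_dir>/visits.txt."""
    # DATA_DIR and /data are illustrative; point them at the volume's mount path
    base = Path(data_dir or os.environ.get("DATA_DIR", "/data"))
    base.mkdir(parents=True, exist_ok=True)
    counter_file = base / "visits.txt"
    count = int(counter_file.read_text()) if counter_file.exists() else 0
    count += 1
    counter_file.write_text(str(count))
    return count
```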
D. Networks (Communication)
Enables containers to communicate with each other and the host.
Network types
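- bridge — Default. Containers on the same bridge network can talk to each other; on a user-defined bridge (like mynet below) they can also reach each other by container name, which the built-in default bridge doesn't support.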
- host — Container shares host's network (no isolation)
- overlay — Multi-host networking (Docker Swarm)
Example: Two containers talking
Step 1: Create a network
$ docker network create mynet
Step 2: Start Redis on the network
$ docker run -d --name redis --network=mynet redis
Step 3: Start app on the same network
$ docker run -d --name myapp --network=mynet myapp
Magic: Inside the myapp container, you can reach Redis at redis:6379 (by container name), because Docker's embedded DNS resolves container names on user-defined networks.
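A minimal stdlib-only sketch of the myapp side, sending a Redis PING (encoded in Redis's RESP protocol) to whatever address the container name resolves to. REDIS_HOST is a hypothetical environment variable that defaults to the container name redis; a real app would normally use a client library such as redis-py instead.

```python
import os
import socket
from typing import Optional


def redis_ping(host: Optional[str] = None, port: int = 6379, timeout: float = 2.0) -> bool:
    """Send PING to Redis and return True if it answers +PONG."""
    # Hypothetical REDIS_HOST variable; "redis" is the container name on mynet
    host = host or os.environ.get("REDIS_HOST", "redis")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # RESP encoding of the one-element command array ["PING"]
        sock.sendall(b"*1\r\n$4\r\nPING\r\n")
        reply = sock.recv(64)
    return reply.startswith(b"+PONG")
```

Inside myapp, redis_ping() returns True once Redis is up; from a container on a different network, the name redis simply doesn't resolve and create_connection raises socket.gaierror.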
E. Docker Compose (Multi-Container Orchestration)
A tool that reads a YAML file (docker-compose.yml) defining several containers and runs them together.
Instead of running 5+ docker commands manually, define everything in one file and start it with docker-compose up (Compose V2 spells it docker compose up).
Core concepts
- Services — Each container definition
- Volumes — Persistent storage definitions
- Networks — Communication channels (auto-created)
- Env files — Environment variables
04
Dockerfile Best Practices
Most Important Instructions
| Instruction | Purpose |
|---|---|
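| FROM | Base image |
| WORKDIR | Set working directory |
| COPY | Copy files from the build context into the image |
| RUN | Execute commands at build time (each RUN creates a layer) |
| CMD | Default command when the container starts (overridable at docker run) |
| ENTRYPOINT | Fixed executable; CMD then supplies its default arguments |
| ENV | Environment variables |
| EXPOSE | Document ports (does not publish them; use -p for that) |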
Pattern: Python FastAPI app
Dockerfile (optimized for caching)
FROM python:3.11-slim
WORKDIR /app
# Copy only requirements first, so the pip install layer stays cached when only app code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PORT=8000
EXPOSE 8000
# Health check (important in production)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
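# Don't run as root: create an unprivileged user and switch to it
RUN useradd --create-home appuser
USER appuser
# Run the application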
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
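The inline python -c probe works, but a small script is easier to read and to test. A minimal sketch, assuming it is saved as a hypothetical healthcheck.py next to main.py and that the app serves GET /health on the port in the PORT variable set above. main() returns an exit code instead of calling sys.exit, so the Dockerfile line could read HEALTHCHECK CMD ["python", "-c", "import sys, healthcheck; sys.exit(healthcheck.main())"].

```python
import os
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and 4xx/5xx responses (HTTPError)
        return False


def main() -> int:
    """Exit code for Docker: 0 = healthy, 1 = unhealthy."""
    port = os.environ.get("PORT", "8000")
    return 0 if is_healthy(f"http://localhost:{port}/health") else 1
```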
Pattern: Node.js app with multi-stage build
Dockerfile (multi-stage reduces final size)
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
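# Stage 2: Production (runtime dependencies only)
FROM node:18-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package.json package-lock.json ./
# Install only production dependencies instead of copying the builder's node_modules (which includes devDependencies)
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
# Official Node images include an unprivileged "node" user
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]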
Critical rules
1. Order matters: Dependencies first, code last. Code changes often, dependencies don't.
2. Use slim images: python:3.11-slim vs python:3.11 saves ~800 MB
3. Don't run as root: Create a user for security
4. Multi-stage for smaller images: Build stage + runtime stage = smaller final image
5. Add health checks: Production containers need to report their status
05
Step-by-Step Implementation
1. Create a Dockerfile: define the image recipe
2. Build the image: docker build -t myapp:1.0 .
3. Run a container locally: docker run -p 8000:8000 myapp:1.0
4. Push to a registry (optional for local dev): tag the image with the registry name, then push it
docker tag myapp:1.0 myregistry/myapp:1.0
docker push myregistry/myapp:1.0
Full example: RAG system with FastAPI
Project structure
my-rag-app/
├── Dockerfile
├── docker-compose.yml
├── main.py (FastAPI app)
├── requirements.txt
└── .dockerignore
requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
langchain==0.0.340
pgvector==0.2.4
psycopg2-binary==2.9.9
pydantic==2.5.0
Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install PostgreSQL client tools (psql, pg_isready); psycopg2-binary already bundles libpq
RUN apt-get update && apt-get install -y --no-install-recommends \
postgresql-client \
&& rm -rf /var/lib/apt/lists/*
# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
ENV PYTHONUNBUFFERED=1
# POSTGRES_URL contains credentials, so pass it at runtime (docker-compose.yml
# environment:, or docker run -e) instead of baking it into the image
ENV REDIS_URL=redis://redis:6379
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
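On the application side, the matching pattern is to read these values at startup and fail fast when a required one is missing. A minimal sketch; load_settings and the returned keys are hypothetical, and the REDIS_URL fallback mirrors the ENV default above.

```python
import os
from typing import Dict, Mapping, Union
from urllib.parse import urlparse


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int]]:
    """Read connection settings injected at runtime; fail fast if one is missing."""
    postgres_url = env.get("POSTGRES_URL")
    if not postgres_url:
        raise RuntimeError("POSTGRES_URL is not set; pass it via docker-compose.yml or docker run -e")
    parsed = urlparse(postgres_url)
    return {
        # Full URL includes the password: hand it to the DB driver, never log it
        "postgres_url": postgres_url,
        "postgres_host": parsed.hostname or "",
        "postgres_port": parsed.port or 5432,
        "postgres_db": parsed.path.lstrip("/"),
        "redis_url": env.get("REDIS_URL", "redis://redis:6379"),
    }
```

Failing at startup turns a missing variable into an immediate, visible container exit (docker logs shows the message) rather than a confusing error on the first request.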
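docker-compose.yml, backend + React frontend variant (recommended for local dev + staging). This file lives in a Docker/ subfolder of its project, hence context: .. and the Docker/Dockerfile.* paths; it does not match the my-rag-app layout above.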
version: "3.9"
services:
  backend:
    build:
      context: ..
      dockerfile: Docker/Dockerfile.backend
    container_name: partpricingtool-backend-test
    ports:
      - "8001:8001"
    env_file:
      - .env
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      SQL_TEST_USERNAME: ${SQL_TEST_USERNAME}
      SQL_TEST_PASSWORD: ${SQL_TEST_PASSWORD}
      SQL_TEST_HOST: ${SQL_TEST_HOST}
      SQL_TEST_DATABASE: ${SQL_TEST_DATABASE}
    volumes:
      - ../:/app # Allows live reload of backend code
    command: uvicorn api.main:app --host 0.0.0.0 --port 8001 --reload
  frontend:
    build:
      context: ..
      dockerfile: Docker/Dockerfile.frontend
      args:
        REACT_APP_API_URL: "http://192.168.1.23:8001"
    container_name: partpricingtool-frontend-test
    ports:
      - "3001:80" # Nginx serves React on port 80 -> exposed to 3001
    depends_on:
      - backend
networks:
  default:
    driver: bridge
Example: Complete RAG Stack docker-compose.yml file
docker-compose.yml (API + Postgres + Redis)
version: "3.9"
services:
  api:
    build: .
    container_name: rag_api
    ports:
      - "8000:8000"
    environment:
      POSTGRES_URL: postgresql://postgres:password@postgres:5432/rag_db
      REDIS_URL: redis://redis:6379
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    # --reload restarts uvicorn when the bind-mounted code below changes
    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
    volumes:
      - ./:/app # Hot reload for development
    networks:
      - rag_network
  postgres:
    image: pgvector/pgvector:pg15
    container_name: rag_postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: rag_db
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - rag_network
  redis:
    image: redis:7-alpine
    container_name: rag_redis
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - rag_network
volumes:
  postgres_data:
  redis_data:
networks:
  rag_network:
    driver: bridge
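depends_on with condition: service_healthy only orders startup; if Postgres or Redis restarts later, the API has to cope on its own. A common complement is a retry loop around the first connection. A minimal stdlib sketch; wait_for_port is a hypothetical helper, and it only checks that the TCP port accepts connections, which is a weaker signal than pg_isready or redis-cli ping.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Retry a TCP connect until it succeeds (True) or `timeout` seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, DNS not ready yet, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the api service this would look like wait_for_port("postgres", 5432) before opening the connection pool, using the service names from the compose file.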
Run everything
Start all services
$ docker-compose up -d
View logs
$ docker-compose logs -f api
Stop and remove all containers and networks (named volumes are kept)
$ docker-compose down
Stop and also remove named volumes (deletes the database data)
$ docker-compose down -v
06
Docker for AI/ML Engineers
Your use cases → Docker patterns
| Use Case | Docker Role | Key Pattern |
|---|---|---|
| LLM API (FastAPI) | Package API + dependencies | FROM python:3.11 + FastAPI app |
| RAG pipeline | Isolate embedding service + DB | Multi-service compose: app + pgvector + redis |
| Batch jobs (e.g., embedding) | Scheduled containers (cron) | docker run or Kubernetes CronJob |
| GPU models (LLaMA, etc.) | CUDA base image + GPU access | FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04 |
| Reproducibility | Lock environment versions | requirements.txt pinned + Dockerfile |
GPU example (LLM inference)
Dockerfile (GPU-enabled)
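FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
WORKDIR /app
# Ubuntu 22.04's python3 is 3.10; installing python3.11 alongside it would not change what pip and python3 use
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN python3 -m pip install --no-cache-dir -r requirements.txt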
COPY . .
CMD ["python3", "inference_server.py"]
Run with GPU access (the host needs the NVIDIA driver and the NVIDIA Container Toolkit)
$ docker run --gpus all -p 8000:8000 llm-server
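A quick way to confirm the --gpus flag took effect is to list GPUs from inside the container at startup. A minimal stdlib-only sketch, assuming nvidia-smi is available in the container (the NVIDIA Container Toolkit usually mounts it when GPUs are requested) and that nvidia-smi -L prints lines like GPU 0: <name> (UUID: ...). If the image already has PyTorch, torch.cuda.is_available() is the more direct check.

```python
import shutil
import subprocess
from typing import List


def parse_gpu_list(output: str) -> List[str]:
    """Extract GPU names from `nvidia-smi -L` lines like 'GPU 0: <name> (UUID: ...)'."""
    names = []
    for line in output.splitlines():
        if line.startswith("GPU ") and ": " in line:
            name = line.split(": ", 1)[1]
            names.append(name.split(" (UUID", 1)[0].strip())
    return names


def visible_gpus() -> List[str]:
    """GPU names visible in this container, or [] if none are exposed."""
    if shutil.which("nvidia-smi") is None:
        return []  # typically the case when the container was started without --gpus
    try:
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                                text=True, timeout=10, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_gpu_list(result.stdout) if result.returncode == 0 else []
```

An inference server could call visible_gpus() on startup and exit with a clear message when it returns an empty list, instead of silently falling back to a much slower CPU path.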
RAG system architecture (with Docker)
Three-tier RAG with Docker Compose
┌─────────────────────────────┐
│ FastAPI (LLM + retrieval)   │ ← docker-compose service
├─────────────────────────────┤
│ pgvector + PostgreSQL       │ ← docker-compose service
├─────────────────────────────┤
│ Redis (caching)             │ ← docker-compose service
└─────────────────────────────┘
All networked. All defined in one docker-compose.yml
• Package once, run anywhere — Your LLM pipeline works identically in CI/CD and production
• Version your environment — Lock specific versions of transformers, torch, etc.
• Separate concerns — API service ≠ vector DB ≠ cache
• GPU support — Docker can expose NVIDIA GPUs to containers through the NVIDIA Container Toolkit
• Batch jobs — Run embedding jobs as containers on a schedule
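For scheduled batch jobs, the container's exit code is the contract: docker run returns it, and a Kubernetes CronJob uses it to decide whether the Job's pod succeeded. A minimal sketch of an embedding job body; embed_batch stands in for whatever embedding model or API you call (hypothetical), and batch_size=32 is an arbitrary default.

```python
import sys
from typing import Callable, List, Sequence

Vector = List[float]


def run_embedding_job(texts: Sequence[str],
                      embed_batch: Callable[[Sequence[str]], List[Vector]],
                      batch_size: int = 32) -> int:
    """Embed `texts` in fixed-size batches; return 0 on success, 1 on failure."""
    done = 0
    try:
        for start in range(0, len(texts), batch_size):
            batch = texts[start:start + batch_size]
            vectors = embed_batch(batch)
            if len(vectors) != len(batch):
                raise ValueError(f"expected {len(batch)} vectors, got {len(vectors)}")
            # A real job would store `vectors` (e.g. in pgvector) here
            done += len(batch)
            print(f"embedded {done}/{len(texts)}", flush=True)  # shows up in docker logs
    except Exception as exc:
        print(f"job failed after {done} items: {exc}", file=sys.stderr, flush=True)
        return 1
    return 0
```

The job's entrypoint would end with sys.exit(run_embedding_job(texts, embed_batch)), so a cron line such as docker run --rm my-embed-job (a hypothetical image name) reports failure through its own exit status.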
07
Best Practices & Common Mistakes
Do This
Do
Order dependencies before code
Install dependencies → copy code. Code changes often, dependencies don't. Exploit caching.
Do
Use volumes for databases
Never store critical data inside a container without a volume. Data in the container's own filesystem is lost when the container is removed or recreated.
Do
Add health checks
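Docker uses HEALTHCHECK to mark containers healthy or unhealthy, and Compose's depends_on with condition: service_healthy waits on it. Kubernetes ignores HEALTHCHECK and uses its own liveness/readiness probes, which can call the same /health endpoint.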
Do
Use .dockerignore
Prevent copying __pycache__, node_modules, .git into the image.
Do
Pin base image versions
Use python:3.11, not python:latest. Prevents surprise breakage. For fully reproducible builds, pin the exact patch version or the image digest.
Avoid This
Don't
Running multiple apps in one container
One container = one service. Use docker-compose for multi-service apps.
Don't
Hardcoding secrets in Dockerfile
Don't bake them into the image (ENV values, build args, and copied .env files all end up in it). Pass them at runtime as environment variables, or use a secret manager or Compose/Swarm secrets.
Don't
Forgetting volumes for databases
No volume = data loss when the container is removed or recreated. Always use volumes for stateful services.
Don't
Using :latest in production
Unpredictable. Tag images with specific versions.
Don't
Running containers as root
Create a non-root user for security.
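For the secrets advice above: with Compose or Swarm secrets, each secret shows up inside the container as a file under /run/secrets/<name>, which keeps it out of docker inspect output, unlike an environment variable. A minimal sketch of a loader that prefers that file and falls back to an environment variable; read_secret and the example names (openai_api_key, OPENAI_API_KEY) are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, env_var: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return a secret from a mounted file if present, else from an env var, else None."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Secret files often end with a newline; strip surrounding whitespace
        return secret_file.read_text().strip()
    return os.environ.get(env_var)
```

Usage would be api_key = read_secret("openai_api_key", "OPENAI_API_KEY"), so the same code works in local development (env var) and in a deployment that mounts secrets as files.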
.dockerignore (prevent bloat)
.dockerignore
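# Patterns match from the build-context root; prefix with **/ to match at any depth
.git
.gitignore
**/__pycache__
**/.pytest_cache
node_modules
dist
build
**/*.pyc
.env
.env.local
**/.DS_Store
.vscode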
08
Complete End-to-End Flow
From code to running container
Step 1: Write code
↓
Step 2: Create Dockerfile
↓
Step 3: docker build -t myapp:1.0 .
↓
Step 4: Image created (layers cached)
↓
Step 5: docker run -p 8000:8000 myapp:1.0
↓
Step 6: Container running (isolated process)
↓
Step 7: Push to registry (optional)
↓
Step 8: Deploy to cloud (Railway, Azure, K8s)
For multi-service apps (recommended)
Docker Compose flow
docker-compose.yml (defines all services)
↓
docker-compose up -d (start everything)
↓
Services:
├── API (FastAPI)
├── Database (PostgreSQL + pgvector)
├── Cache (Redis)
└── All networked automatically
↓
docker-compose down (stop and remove containers and networks, keep volumes)
docker-compose down -v (same, and also delete named volumes)
Quick reference commands
| Command | Purpose |
|---|---|
| docker build -t name:tag . | Build image from Dockerfile |
| docker run -d -p 8000:8000 name:tag | Start container (background) |
| docker ps | List running containers |
| docker logs <id> | View container logs |
| docker exec -it <id> /bin/bash | Shell into running container (/bin/sh on Alpine) |
| docker stop <id> | Stop container gracefully (SIGTERM, then SIGKILL after a timeout) |
| docker rm <id> | Delete a stopped container |
| docker images | List all images |
| docker-compose up -d | Start all services defined in compose file |
| docker-compose down | Stop and remove services and networks (keep volumes) |