01 Mental Model: What Docker Actually Is

Docker is a containerization platform that packages three things together:

  • Application code (Python, Node.js, etc.)
  • Runtime (interpreter, dependencies)
  • System libraries (everything your app needs)

→ All bundled into a portable, isolated unit (container) that runs the same way on your laptop, in CI/CD, and on production servers, as long as the image was built for the host's CPU architecture (e.g., amd64 vs arm64).
Docker solves: "It works on my machine"
Without Docker:
Dev Machine → Works ✓
CI Server    → Fails ✗  (missing library X)
Production   → Fails ✗  (wrong version Y)

With Docker:
Dev, CI, Prod → Same container → Works ✓✓✓

Key Analogy

Without Docker: Ship source code + hope the server has the right setup
With Docker:    Ship a complete, sealed box. Open it anywhere and it works.
02 Core Architecture
Docker system stack
┌─────────────────────────────────┐
│         Docker CLI              │
│  (docker run, build, push...)   │
└────────────────┬────────────────┘
                 │ (API calls)
┌────────────────▼────────────────┐
│        Docker Daemon            │
│        (dockerd process)        │
└────────────────┬────────────────┘
                 │ (manages)
┌────────────────▼────────────────┐
│        Docker Objects           │
├─────────────────────────────────┤
│ • Images     (blueprints)       │
│ • Containers (running tasks)    │
│ • Networks   (communication)    │
│ • Volumes    (persistent data)  │
└─────────────────────────────────┘

How it works

  1. You type a command: docker run myapp
  2. CLI contacts daemon: "Start a container from myapp image"
  3. Daemon creates: Isolated process, filesystem, network interface (pulling the myapp image first if it is not available locally)
  4. Container runs: Your app executes inside
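Step 2 is an ordinary HTTP call. On Linux, the CLI talks to dockerd through a REST API exposed by default on the Unix socket /var/run/docker.sock. The sketch below makes the same kind of call using only the Python standard library. It assumes a Linux host and a user allowed to open that socket (root or the docker group). GET /version is a real Engine API endpoint. The helper names (UnixHTTPConnection, docker_api_get) are our own.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; the socket path decides where we connect.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_api_get(path: str, socket_path: str = "/var/run/docker.sock") -> dict:
    """Send GET <path> to the Docker daemon and return the decoded JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"Docker API returned {resp.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()
```

Usage: `info = docker_api_get("/version")` followed by `print(info["Version"], info["ApiVersion"])` asks the daemon roughly what `docker version` asks it.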
03 Main Components
A. Images (Blueprint Layer)
Immutable templates used to create and run containers

A read-only snapshot of your entire application stack. Think: "snapshot of a hard drive".

Key properties

  • Layered — Built incrementally (cached for speed)
  • Versioned — Tagged with names like myapp:1.0, myapp:latest
  • Portable — Run anywhere Docker exists
Image structure (layers)
┌─────────────────────────────┐
│ Layer 4: Your app code      │ ← Copied last, changes most often
├─────────────────────────────┤
│ Layer 3: Dependencies       │
├─────────────────────────────┤
│ Layer 2: Python 3.11        │
├─────────────────────────────┤
│ Layer 1: Linux base         │ ← From the base image, cached and reused
└─────────────────────────────┘
Each layer = cached & reusable. If a layer changes, it and every layer above it are rebuilt.
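To see why layer order matters, consider a simplified model of the build cache. Each layer's key is derived from its parent's key, its instruction text, and the contents of any files it copies. Once one key changes, every key above it changes too. This is a sketch of the idea, not BuildKit's actual cache-key algorithm. The function names and the four-line Dockerfile are illustrative.

```python
import hashlib


def layer_keys(instructions, files):
    """Return one cache key per instruction.

    instructions: list of (instruction_text, list_of_copied_paths)
    files: dict mapping path -> file contents (bytes)
    """
    keys = []
    parent = b""
    for text, copied in instructions:
        h = hashlib.sha256(parent)          # chain on the parent layer's key
        h.update(text.encode())             # the instruction itself
        for path in sorted(copied):         # contents of files this layer copies
            h.update(path.encode())
            h.update(hashlib.sha256(files[path]).digest())
        parent = h.hexdigest().encode()
        keys.append(parent.decode())
    return keys


def rebuilt_layers(old_keys, new_keys):
    """Indexes of layers that miss the cache: everything from the first change up."""
    for i, (old, new) in enumerate(zip(old_keys, new_keys)):
        if old != new:
            return list(range(i, len(new_keys)))
    return []


DOCKERFILE = [
    ("FROM python:3.11-slim", []),
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "main.py"]),
]

before = {"requirements.txt": b"fastapi==0.104.1\n", "main.py": b"print('v1')\n"}
after = dict(before, **{"main.py": b"print('v2')\n"})

# Editing only main.py invalidates only the final COPY layer; pip install stays cached.
print(rebuilt_layers(layer_keys(DOCKERFILE, before), layer_keys(DOCKERFILE, after)))  # → [3]
```

If you put `COPY . .` before the `pip install` step, the same one-line edit to main.py would also force the dependency install to rerun.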

Commands

Pull a public image $ docker pull python:3.11
List images on your machine $ docker images
Build an image from Dockerfile $ docker build -t myapp:1.0 .
B. Containers (Execution Layer)
Running instance of an image. Isolated, ephemeral by default.
Image      → Class definition (blueprint)
Container  → Object instance (running process)

Key properties

  • Isolated: Own filesystem, processes, network
  • Lightweight: Shares the host OS kernel (not a full VM)
  • Ephemeral: Data written inside a container survives a stop/restart but is lost when the container is removed (unless it is stored in a volume)

Essential commands

Start a container in the background (-p maps host port 8000 to container port 8000) $ docker run -d -p 8000:8000 myapp
List running containers $ docker ps
View logs from a container $ docker logs <container-id>
Stop a container $ docker stop <container-id>
Execute a command inside a running container (use /bin/sh on Alpine-based images, which have no bash) $ docker exec -it <container-id> /bin/bash
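`docker stop` sends SIGTERM to the container's main process, waits a grace period (10 seconds by default), and then sends SIGKILL. A process running as PID 1 ignores SIGTERM unless it installs a handler. The exec-form (JSON array) CMD used in this guide's Dockerfiles makes your program that PID 1. Without a handler, the container simply waits out the grace period and is then killed. Below is a minimal stdlib sketch of a worker that stops cleanly between jobs. The job loop and its sleep are hypothetical stand-ins for real work.

```python
import signal
import threading
import time

# Set by the SIGTERM handler; the main loop checks it between units of work.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Don't exit here: ask the loop to stop after the current job finishes.
    stop_requested.set()


def install_handlers() -> None:
    """Register the handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(jobs):
    """Process jobs until done or until SIGTERM arrives; return the finished ones."""
    done = []
    for job in jobs:
        if stop_requested.is_set():
            break
        time.sleep(0.01)  # stand-in for real work (hypothetical)
        done.append(job)
    return done
```

Servers such as uvicorn install their own SIGTERM handling. A hand-written loop like this one is where a handler matters most.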
C. Volumes (Persistence Layer)
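Persistent storage that outlives individual containers.
Containers are ephemeral. Anything written to a container's own filesystem is deleted when the container is removed (docker rm, docker compose down, or recreating it for an image upgrade).
If you run PostgreSQL in a container without a volume, your database is gone as soon as that container is removed or replaced.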

Named volumes

Managed by Docker. Data is stored in Docker's own storage area on the host (/var/lib/docker/volumes on Linux).

Create and use a named volume (the postgres image will not initialize without POSTGRES_PASSWORD) $ docker run -d -e POSTGRES_PASSWORD=secret -v mydata:/var/lib/postgresql/data postgres

Bind mounts

Direct host directory ↔ container directory mapping.

Mount a host folder into container $ docker run -v /host/path:/container/path myapp
Volumes keep data safe
Without volume:
postgres container removed → database ✗ lost

With volume:
postgres container removed → database ✓ kept in the volume
new postgres container     → same database ✓
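A small script makes the difference visible. It keeps a counter in a file under a data directory. Assume a hypothetical image, counter-demo, whose CMD runs this script with DATA_DIR=/data. Then `docker run --rm -v counter_data:/data counter-demo` prints 1 and then 2 on the next run, because the file lives in the named volume. Plain `docker run --rm counter-demo` prints 1 every time, because the file dies with each removed container. The DATA_DIR variable and file name are illustrative.

```python
import os
from pathlib import Path


def bump_counter(data_dir: str) -> int:
    """Increment a counter stored in <data_dir>/counter.txt and return the new value."""
    path = Path(data_dir) / "counter.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    count = int(path.read_text()) + 1 if path.exists() else 1
    # Write to a temp file, then rename, so a crash never leaves a half-written counter.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(str(count))
    os.replace(tmp, path)
    return count


# Inside the container (hypothetical): DATA_DIR points at the mounted volume.
# print(bump_counter(os.environ.get("DATA_DIR", "/data")))
```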
D. Networks (Communication)
Enables containers to communicate with each other and the host.

Network types

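  • bridge: Default driver. On a user-defined bridge network, containers reach each other by container name; on the built-in default bridge, only by IP address.
  • host: Container shares the host's network stack (no network isolation)
  • overlay: Multi-host networking (Docker Swarm)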

Example: Two containers talking

Step 1: Create a network $ docker network create mynet
Step 2: Start Redis on the network in the background $ docker run -d --name redis --network=mynet redis
Step 3: Start app on the same network $ docker run --name myapp --network=mynet myapp
Magic: Inside the myapp container, you can reach Redis at redis:6379. Docker's embedded DNS resolves container names on user-defined networks such as mynet (but not on the default bridge).
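The "magic" is plain DNS, so nothing Redis-specific is involved in finding the container. The stdlib sketch below connects to the hostname taken from a REDIS_URL and sends a raw PING using the Redis wire protocol (RESP). It assumes the redis://redis:6379 URL used by the compose files in this guide. A real app would normally use a client library such as redis-py instead.

```python
import socket
from urllib.parse import urlparse


def redis_address(url: str) -> tuple:
    """Split a redis://host:port URL into (host, port); 6379 is Redis's default port."""
    parsed = urlparse(url)
    return parsed.hostname or "localhost", parsed.port or 6379


def redis_ping(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Send a RESP-encoded PING and return True if the server answers +PONG."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"*1\r\n$4\r\nPING\r\n")  # array of one bulk string: "PING"
        return sock.recv(64).startswith(b"+PONG")


# Inside the myapp container on mynet, "redis" resolves to the Redis container:
# host, port = redis_address(os.environ.get("REDIS_URL", "redis://redis:6379"))
# print(redis_ping(host, port))
```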
E. Docker Compose (Multi-Container Orchestration)
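A tool that defines multiple containers in one YAML file and runs them together.

Instead of running 5+ docker commands manually, define everything in one file and start it with docker compose up (Compose v2; older installs use the standalone docker-compose command shown in this guide).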

Core concepts

  • Services — Each container definition
  • Volumes — Persistent storage definitions
  • Networks — Communication channels (auto-created)
  • Env files — Environment variables
04 Dockerfile Best Practices

Most Important Instructions

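Instruction   Purpose
FROM          Base image
WORKDIR       Set working directory
COPY          Copy files from the build context into the image
RUN           Execute commands at build time (each creates a layer)
CMD           Default command when the container starts (can be overridden)
ENTRYPOINT    Fixed executable; CMD supplies its default arguments
ENV           Environment variables
EXPOSE        Document the ports the app listens on (does not publish them; use -p)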

Pattern: Python FastAPI app

Dockerfile (optimized for caching)
FROM python:3.11-slim

WORKDIR /app

# Copy only requirements first so the pip install layer stays cached when only code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PORT=8000

EXPOSE 8000

# Health check (assumes main.py exposes a GET /health route)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"

# Run as an unprivileged user instead of root
RUN useradd --create-home appuser
USER appuser

# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
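The inline `python -c` probe works because an uncaught exception makes Python exit with status 1, which Docker reads as unhealthy (0 means healthy). The sketch below is a slightly more explicit probe, written as a hypothetical healthcheck.py. It sets its own timeout and maps every failure (connection refused, timeout, HTTP 4xx/5xx) to exit code 1. Like the HEALTHCHECK above, it assumes the app serves GET /health on port 8000. HEALTH_URL is an illustrative override, not something the Dockerfile sets.

```python
import http.client
import os
import urllib.request


def probe(url: str = "", timeout: float = 3.0) -> int:
    """Return 0 if url answers with a 2xx status, else 1 (HEALTHCHECK exit codes)."""
    url = url or os.environ.get("HEALTH_URL", "http://localhost:8000/health")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # HTTPException covers malformed responses.
        return 1
```

In the real script, end with `raise SystemExit(probe())` and point the Dockerfile at it with `HEALTHCHECK --interval=30s --timeout=10s --retries=3 CMD ["python", "healthcheck.py"]`.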

Pattern: Node.js app with multi-stage build

Dockerfile (multi-stage reduces final size)
# Stage 1: Build
FROM node:18-alpine AS builder

WORKDIR /app
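COPY package.json package-lock.json ./
RUN npm ci

COPY . .
RUN npm run build
# Remove devDependencies so only runtime packages reach the final stage
RUN npm prune --omit=dev

# Stage 2: Production
FROM node:18-alpine

WORKDIR /app
COPY --from=builder /app/package.json ./
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules

# Official node images include an unprivileged "node" user
USER node

EXPOSE 3000

CMD ["node", "dist/index.js"]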

Critical rules

1. Order matters: Dependencies first, code last. Code changes often, dependencies don't.

2. Use slim images: python:3.11-slim vs python:3.11 saves roughly 800 MB of uncompressed size (exact numbers vary by release)

3. Don't run as root: Create a user for security

4. Multi-stage for smaller images: Build stage + runtime stage = smaller final image

5. Add health checks: Production containers need to report their status
05 Step-by-Step Implementation
1. Create a Dockerfile
   Define the image recipe
2. Build the image
   docker build -t myapp:1.0 .
3. Run a container locally
   docker run -p 8000:8000 myapp:1.0
4. Push to a registry (optional for local dev); tag the image with the registry name first
   docker tag myapp:1.0 myregistry/myapp:1.0
   docker push myregistry/myapp:1.0
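Steps 2 to 4 are easy to script. `docker push` decides where to upload from the repository name. An image built as myapp:1.0 therefore needs a registry- or namespace-prefixed tag before it can be pushed, so the sketch adds a `docker tag` call first. It only assembles the docker CLI argument lists and runs them with subprocess. "myregistry" is the placeholder from the steps, and the function names are hypothetical.

```python
import subprocess


def release_commands(name: str, tag: str, registry: str = "") -> list:
    """Return the docker CLI calls for build -> run -> (tag -> push)."""
    local = f"{name}:{tag}"
    cmds = [
        ["docker", "build", "-t", local, "."],
        ["docker", "run", "-d", "-p", "8000:8000", local],
    ]
    if registry:
        remote = f"{registry}/{local}"
        cmds.append(["docker", "tag", local, remote])  # push needs the registry-prefixed name
        cmds.append(["docker", "push", remote])
    return cmds


def execute(cmds: list) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


# execute(release_commands("myapp", "1.0", registry="myregistry"))
```

Keeping command construction separate from execution lets you print or test the plan before touching a real registry.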

Full example: RAG system with FastAPI

Project structure
my-rag-app/
├── Dockerfile
├── docker-compose.yml
├── main.py               (FastAPI app)
├── requirements.txt
└── .dockerignore
requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
langchain==0.0.340
pgvector==0.2.4
psycopg2-binary==2.9.9
pydantic==2.5.0
Dockerfile
FROM python:3.11-slim

WORKDIR /app

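# Install the PostgreSQL client tools (psql, pg_isready) for debugging;
# psycopg2-binary already bundles libpq, and the pgvector extension runs inside the database
RUN apt-get update && apt-get install -y --no-install-recommends \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*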

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

ENV PYTHONUNBUFFERED=1
# POSTGRES_URL contains credentials, so pass it at runtime (docker run -e or the
# compose "environment:" block) instead of baking it into the image
ENV REDIS_URL=redis://redis:6379

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
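POSTGRES_URL and REDIS_URL reach the app only as environment variables, whether they come from ENV in the Dockerfile, `docker run -e`, or a compose `environment:` block. The app still has to read them. Below is a sketch of how a main.py might load both URLs and fail fast when one is missing. It also builds a log-safe target string that leaves out the password. The Settings class and its method are illustrative, not part of FastAPI.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Settings:
    postgres_url: str
    redis_url: str

    def safe_postgres_target(self) -> str:
        """host:port/db for logging, without the username or password."""
        p = urlparse(self.postgres_url)
        return f"{p.hostname}:{p.port or 5432}{p.path}"


def load_settings(env=None) -> Settings:
    """Read connection URLs from the environment, failing fast if one is missing."""
    env = os.environ if env is None else env
    missing = [k for k in ("POSTGRES_URL", "REDIS_URL") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(env["POSTGRES_URL"], env["REDIS_URL"])
```

Failing at startup turns a forgotten `-e` flag into an obvious container exit, instead of a confusing connection error on the first request.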
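docker-compose.yml (recommended for local dev + staging; this example comes from a separate backend + frontend project, so its paths and service names differ from my-rag-app above)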
version: "3.9"

services:
  backend:
    build:
      context: ..
      dockerfile: Docker/Dockerfile.backend
    container_name: partpricingtool-backend-test
    ports:
      - "8001:8001"
    env_file:
      - .env
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      SQL_TEST_USERNAME: ${SQL_TEST_USERNAME}
      SQL_TEST_PASSWORD: ${SQL_TEST_PASSWORD}
      SQL_TEST_HOST: ${SQL_TEST_HOST}
      SQL_TEST_DATABASE: ${SQL_TEST_DATABASE}
    volumes:
      - ../:/app          # Allows live reload of backend code
    command: uvicorn api.main:app --host 0.0.0.0 --port 8001 --reload

  frontend:
    build:
      context: ..
      dockerfile: Docker/Dockerfile.frontend
      args:
        REACT_APP_API_URL: "http://192.168.1.23:8001"
    container_name: partpricingtool-frontend-test
    ports:
      - "3001:80"               # Nginx serves React on port 80 -> exposed to 3001
    depends_on:
      - backend

networks:
  default:
    driver: bridge

Example: Complete RAG Stack docker-compose.yml file

docker-compose.yml (API + Postgres + Redis)
version: "3.9"

services:
  api:
    build: .
    container_name: rag_api
    ports:
      - "8000:8000"
    environment:
      POSTGRES_URL: postgresql://postgres:password@postgres:5432/rag_db
      REDIS_URL: redis://redis:6379
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
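      - ./:/app  # Mount the source so code edits are visible in the container
    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload  # --reload restarts uvicorn on code changes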
    networks:
      - rag_network

  postgres:
    image: pgvector/pgvector:pg15
    container_name: rag_postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: rag_db
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - rag_network

  redis:
    image: redis:7-alpine
    container_name: rag_redis
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - rag_network

volumes:
  postgres_data:
  redis_data:

networks:
  rag_network:
    driver: bridge
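Compose derives the start order from depends_on. postgres and redis depend on nothing, so they start first. api starts only after both, and with `condition: service_healthy` it also waits for their health checks to pass. The ordering part is a topological sort. The sketch below models it with graphlib (Python 3.9+); it illustrates the behavior and is not Compose's actual implementation.

```python
from graphlib import TopologicalSorter

# service -> services it depends_on (mirrors the RAG compose file)
DEPENDS_ON = {
    "api": {"postgres", "redis"},
    "postgres": set(),
    "redis": set(),
}


def start_batches(depends_on: dict) -> list:
    """Group services into batches; each batch only needs earlier batches running."""
    ts = TopologicalSorter(depends_on)
    ts.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)
    return batches


print(start_batches(DEPENDS_ON))  # → [['postgres', 'redis'], ['api']]
```

Services in the same batch have no ordering between them, which is why Compose can start postgres and redis in parallel.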

Run everything

Start all services $ docker-compose up -d
View logs $ docker-compose logs -f api
Stop all $ docker-compose down
Stop and remove volumes $ docker-compose down -v
06 Docker for AI/ML Engineers

Your use cases → Docker patterns

Use case                       Docker role                         Key pattern
LLM API (FastAPI)              Package API + dependencies          FROM python:3.11 + FastAPI app
RAG pipeline                   Isolate embedding service + DB      Multi-service compose: app + pgvector + redis
Batch jobs (e.g., embedding)   Scheduled containers (cron)         docker run or Kubernetes CronJob
GPU models (LLaMA, etc.)       CUDA base image + GPU access        FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
Reproducibility                Lock environment versions           Pinned requirements.txt + Dockerfile
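A pinned requirements.txt only locks the environment if every line pins an exact version with `==`. The small checker below could run in CI before `docker build`. It is a simplified sketch: it skips option lines such as `-r` and treats ranges, wildcards, and bare names as unpinned. It does not implement the full pip requirement syntax.

```python
def unpinned(requirements_text: str) -> list:
    """Return requirement lines that are not pinned to an exact version with '=='."""
    bad = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r / --index-url
            continue
        spec = line.split(";", 1)[0]  # ignore environment markers after ';'
        exact = "==" in spec and not any(ch in spec for ch in "<>~!,*")
        if not exact:
            bad.append(line)
    return bad
```

For example, `unpinned(open("requirements.txt").read())` returns an empty list for the RAG requirements file in this guide, since every line there uses `==`.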

GPU example (LLM inference)

Dockerfile (GPU-enabled)
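FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

WORKDIR /app

# Ubuntu 22.04's default python3 is 3.10; python3-pip installs pip for that interpreter
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN python3 -m pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python3", "inference_server.py"]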
Run with GPU access (requires the NVIDIA driver and NVIDIA Container Toolkit on the host) $ docker run --gpus all -p 8000:8000 llm-server
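If the toolkit is missing or `--gpus` is forgotten, the container still starts; it just sees no GPU. A startup check in inference_server.py can make that explicit. With `--gpus all`, the NVIDIA runtime typically makes `nvidia-smi` available inside nvidia/cuda-based containers. `nvidia-smi -L` prints one line per GPU in the form "GPU 0: <model> (UUID: GPU-...)". The sketch below parses that output and returns an empty list when the tool is absent or fails. The function names are hypothetical.

```python
import re
import shutil
import subprocess

# Matches lines like "GPU 0: <model name> (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)")


def parse_gpu_list(output: str) -> list:
    """Parse `nvidia-smi -L` output into (index, name, uuid) tuples."""
    gpus = []
    for line in output.splitlines():
        m = GPU_LINE.match(line.strip())
        if m:
            gpus.append((int(m.group(1)), m.group(2), m.group(3)))
    return gpus


def visible_gpus() -> list:
    """List GPUs visible to this container, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10)
    return parse_gpu_list(result.stdout) if result.returncode == 0 else []
```

A server might log the result at startup and either exit or fall back to CPU when the list is empty.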

RAG system architecture (with Docker)

Three-tier RAG with Docker Compose
┌─────────────────────────────┐
│  FastAPI (LLM + retrieval)  │ ← docker-compose service
├─────────────────────────────┤
│  pgvector + PostgreSQL      │ ← docker-compose service
├─────────────────────────────┤
│  Redis (caching)            │ ← docker-compose service
└─────────────────────────────┘

All networked. All defined in one docker-compose.yml
• Package once, run anywhere: Your LLM pipeline runs the same image in CI/CD and production

• Version your environment: Lock specific versions of transformers, torch, etc.

• Separate concerns: API service ≠ vector DB ≠ cache

• GPU support: Containers can use NVIDIA GPUs once the host has the NVIDIA driver and the NVIDIA Container Toolkit installed

• Batch jobs: Run embedding jobs as containers on a schedule
07 Best Practices & Common Mistakes

Do This

Do: Order dependencies before code
Install dependencies → copy code. Code changes often, dependencies don't. Exploit caching.
Do: Use volumes for databases
Never store critical data inside a container without a volume. It disappears when the container is removed or recreated.
Do: Add health checks
Docker, Compose (depends_on with condition: service_healthy), and Swarm use HEALTHCHECK to know whether a container is alive. Kubernetes ignores it and uses its own liveness/readiness probes.
Do: Use .dockerignore
Prevent copying __pycache__, node_modules, .git into the image.
Do: Pin base image versions
Use python:3.11, not python:latest. Prevents surprise breakage.

Avoid This

Don't: Run multiple apps in one container
One container = one service. Use docker-compose for multi-service apps.
Don't: Hardcode secrets in a Dockerfile
Pass them at runtime instead: environment variables, env files, or Docker/Compose secrets.
Don't: Forget volumes for databases
No volume = data loss when the container is removed. Always use volumes for stateful services.
Don't: Use :latest in production
Unpredictable. Tag images with specific versions.
Don't: Run containers as root
Create a non-root user (USER instruction) for security.
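"Pass secrets at runtime" usually means one of two things. The first is an environment variable, such as OPENAI_API_KEY in the compose example. The second is a Compose or Swarm secret, which is mounted into the container as a file under /run/secrets/<name>. The sketch below checks the secret file first and falls back to the upper-cased environment variable. The helper name and lookup order are our own choices.

```python
import os
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return a secret from <secrets_dir>/<name> (Docker secrets) or the env var NAME."""
    path = Path(secrets_dir) / name
    if path.is_file():
        return path.read_text().strip()  # secret files often end with a newline
    value = os.environ.get(name.upper())
    if value:
        return value
    raise KeyError(f"secret {name!r} not found in {secrets_dir} or environment")


# api_key = read_secret("openai_api_key")  # /run/secrets/openai_api_key or $OPENAI_API_KEY
```

Either way, the value never appears in the image layers or in `docker history`.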

.dockerignore (prevent bloat)

.dockerignore
.git
.gitignore
**/__pycache__
**/.pytest_cache
node_modules
dist
build
**/*.pyc
.env
.env.local
**/.DS_Store
.vscode
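One detail trips people up. Patterns in .dockerignore are matched from the root of the build context, so `*.pyc` or `__pycache__` only match at the top level. Prefix a pattern with `**/` to match it at any depth. A rough preview of which paths a pattern list would exclude is sketched below. It approximates the documented rules (`*` stays within one directory, `**` spans directories, excluding a directory excludes its contents, a later `!pattern` re-includes). It does not handle character classes or escapes, and it is not Docker's own matcher.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one .dockerignore-style pattern into a regex (approximation)."""
    pattern = pattern.strip().lstrip("/")
    out, i = "", 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out += "(?:.*/)?"  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out += ".*"
            i += 2
        elif pattern[i] == "*":
            out += "[^/]*"  # a single '*' never crosses a '/'
            i += 1
        elif pattern[i] == "?":
            out += "[^/]"
            i += 1
        else:
            out += re.escape(pattern[i])
            i += 1
    # Matching a directory also excludes everything beneath it.
    return re.compile(out + "(?:/.*)?$")


def is_ignored(path: str, patterns: list) -> bool:
    """Last matching pattern wins; a leading '!' re-includes the path."""
    ignored = False
    for raw in patterns:
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue
        negate = raw.startswith("!")
        if pattern_to_regex(raw[1:] if negate else raw).match(path):
            ignored = not negate
    return ignored
```

For example, with the patterns `*.pyc` and `**/__pycache__`, the path pkg/mod.pyc is still sent to the daemon, while pkg/__pycache__/mod.cpython-311.pyc is not.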
08 Complete End-to-End Flow
From code to running container
Step 1: Write code
        ↓
Step 2: Create Dockerfile
        ↓
Step 3: docker build -t myapp:1.0 .
        ↓
Step 4: Image created (layers cached)
        ↓
Step 5: docker run -p 8000:8000 myapp:1.0
        ↓
Step 6: Container running (isolated process)
        ↓
Step 7: Push to registry (optional)
        ↓
Step 8: Deploy to cloud (Railway, Azure, K8s)

For multi-service apps (recommended)

Docker Compose flow
docker-compose.yml (defines all services)
        ↓
docker-compose up -d (start everything)
        ↓
Services:
  ├── API (FastAPI)
  ├── Database (PostgreSQL + pgvector)
  ├── Cache (Redis)
  └── All networked automatically
        ↓
docker-compose down    (stop all, keep volumes)
docker-compose down -v (stop all, delete volumes)

Quick reference commands

Command                                  Purpose
docker build -t name:tag .               Build image from Dockerfile
docker run -d -p 8000:8000 name:tag      Start container (background)
docker ps                                List running containers (add -a to include stopped ones)
docker logs <id>                         View container logs
docker exec -it <id> /bin/bash           Shell into running container (/bin/sh on Alpine images)
docker stop <id>                         Stop container gracefully (SIGTERM, then SIGKILL after 10 s by default)
docker rm <id>                           Delete a stopped container
docker images                            List all images
docker-compose up -d                     Start all services defined in compose file
docker-compose down                      Stop and remove all service containers and networks (keep volumes)