| 1 | +--- |
| 2 | +title: How Docker Works |
| 3 | +lang: en |
| 4 | +layout: post |
| 5 | +audio: false |
| 6 | +translated: false |
| 7 | +generated: true |
| 8 | +--- |
| 9 | + |
| 10 | +Let’s dive into how Docker works, focusing on its use of Linux’s filesystem and how it isolates applications from each other. |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +### What is Docker, at its Core? |
| 15 | +Docker is a platform that allows you to package, distribute, and run applications inside lightweight, isolated environments called **containers**. Unlike virtual machines (VMs), which include an entire operating system, Docker containers share the host OS kernel and use its features to provide isolation and efficiency. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +### How Docker Works |
| 20 | +Docker leverages several Linux kernel features to create and manage containers: |
| 21 | +1. **Namespaces** - For isolation. |
| 22 | +2. **Control Groups (cgroups)** - For resource management. |
| 23 | +3. **Union Filesystems** - For efficient storage and layering. |
| 24 | + |
| 25 | +Here’s how these pieces come together: |
| 26 | + |
| 27 | +--- |
| 28 | + |
| 29 | +#### 1. Linux Namespaces: Isolation Mechanism |
| 30 | +Namespaces create isolated "views" of system resources, ensuring that processes in one container don’t interfere with those in another. Key namespaces Docker uses include: |
| 31 | + |
| 32 | +- **PID Namespace**: Each container has its own process ID space. Process ID 1 inside a container is isolated from the host’s PID 1 (usually `init` or `systemd`). |
| 33 | +- **Network Namespace**: Containers get their own network stack (IP address, ports, routing tables). This is why two containers can listen on port 8080 without conflict. |
| 34 | +- **Mount Namespace**: Each container has its own view of the filesystem, isolated from the host and other containers. |
| 35 | +- **UTS Namespace**: Containers can have their own hostname and domain name. |
| 36 | +- **IPC Namespace**: Isolates inter-process communication (e.g., shared memory, message queues). |
| 37 | +- **User Namespace** (optional): Maps user IDs inside the container to different IDs on the host, so root in the container can be an unprivileged user on the host. |
| 38 | + |
| 39 | +**Example**: If you run `ps` inside a container, you only see processes within that container’s PID namespace, not the host’s processes. |
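
One way to see these namespaces on a Linux host is through `/proc/<pid>/ns`, where the kernel exposes one symlink per namespace a process belongs to. The sketch below is a minimal illustration, assuming a Linux system with `/proc` mounted; `list_namespaces` is a hypothetical helper name, not a Docker command.

```shell
#!/bin/sh
# list_namespaces is a hypothetical helper (not part of Docker).
# It prints "<type> <identifier>" for each namespace of a process,
# defaulting to the current shell. Assumes Linux with /proc mounted.
list_namespaces() {
  pid="${1:-self}"
  for ns in /proc/"$pid"/ns/*; do
    # Skip the unexpanded glob when /proc/<pid>/ns does not exist.
    [ -e "$ns" ] || [ -L "$ns" ] || continue
    printf '%s %s\n' "${ns##*/}" "$(readlink "$ns")"
  done
}

list_namespaces
```

Two processes that print the same identifier for a given type share that namespace. Comparing the output for a host process with the output for a process running in a container (reading another process's entries usually requires root) should show different identifiers for the namespaces Docker has separated.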
| 40 | + |
| 41 | +--- |
| 42 | + |
| 43 | +#### 2. Control Groups (cgroups): Resource Limits |
| 44 | +Cgroups limit and monitor resource usage (CPU, memory, disk I/O, etc.) for each container. This prevents one container from hogging all system resources and starving others. |
| 45 | + |
| 46 | +- **How it works**: Docker assigns a cgroup to each container. You can set limits like: |
| 47 | + ```bash |
| 48 | + docker run --memory="512m" --cpus="0.5" myapp |
| 49 | + ``` |
| 50 | + This restricts the container to 512 MiB of RAM and half a CPU core’s worth of CPU time. |
| 51 | + |
| 52 | +- **Isolation**: While namespaces isolate visibility, cgroups isolate resource consumption. |
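
To make the two flags concrete, the sketch below computes the values they correspond to. It assumes a cgroup v2 host, where the memory limit appears in `memory.max` and the CPU limit in `cpu.max` as `<quota> <period>` in microseconds, and it assumes Docker's default period of 100000 microseconds. The helper names `mem_to_bytes` and `cpus_to_cpu_max` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: they only do the arithmetic, they do not touch cgroups.

# Convert a Docker-style size such as 512m into bytes (binary units).
mem_to_bytes() {
  case "$1" in
    *k) echo $(( ${1%k} * 1024 )) ;;
    *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
    *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}

# Convert a --cpus value into "<quota> <period>", assuming a 100000 us period.
cpus_to_cpu_max() {
  awk -v cpus="$1" 'BEGIN { printf "%d %d\n", cpus * 100000, 100000 }'
}

mem_to_bytes 512m      # prints 536870912
cpus_to_cpu_max 0.5    # prints 50000 100000
```

A quota of 50000 out of every 100000 microseconds is what "half a CPU core" means in practice: the container's processes may run for at most half of each scheduling period, on whichever cores are available.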
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +#### 3. Union Filesystems: Layered Storage |
| 57 | +Docker uses a **union filesystem** (e.g., OverlayFS through the `overlay2` storage driver, or AUFS on older installations) to manage container images and their filesystems efficiently. This is how it ties into the Linux filesystem: |
| 58 | + |
| 59 | +- **Image Layers**: A Docker image is built from stacked, read-only layers. Each layer represents a set of changes (e.g., installing a package, copying files) defined in your `Dockerfile`. |
| 60 | + - Example: `FROM openjdk:17` brings in the base image’s layers, and `COPY app.jar app.jar` adds another on top. |
| 61 | + - Layers are cached and reused, saving disk space and speeding up builds. |
| 62 | + |
| 63 | +- **Container Filesystem**: When you run a container, Docker adds a thin, writable layer on top of the read-only image layers. This is called a **copy-on-write (CoW)** mechanism: |
| 64 | + - Reads of unmodified files come from the image layers. |
| 65 | + - Writes (e.g., log files, temp data) go to the writable layer. |
| 66 | + - If a file in a lower layer is modified, it’s copied to the writable layer first (hence "copy-on-write"). |
| 67 | + |
| 68 | +- **Isolation**: Each container gets its own writable layer, so changes in one container don’t affect others, even if they share the same base image. |
| 69 | + |
| 70 | +- **On Disk**: On the host, these layers are stored under `/var/lib/docker` by default (e.g., `/var/lib/docker/overlay2` for the `overlay2` driver). You don’t interact with this directly; Docker manages it. |
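
The copy-on-write behaviour can be imitated with plain directories. The sketch below is a toy model, not how OverlayFS is implemented: one directory stands in for the read-only image layers and one directory per container stands in for its writable layer. The function names, directory names, and `app.conf` are invented for the illustration.

```shell
#!/bin/sh
# Toy model of copy-on-write using plain directories (not real OverlayFS).
root=$(mktemp -d)
mkdir -p "$root/image" "$root/container_a" "$root/container_b"
echo "setting=default" > "$root/image/app.conf"   # file shipped in the image

# Read: prefer the container's writable layer, fall back to the image layer.
cow_read() {   # usage: cow_read <container> <file>
  if [ -e "$root/$1/$2" ]; then
    cat "$root/$1/$2"
  else
    cat "$root/image/$2"
  fi
}

# Append: copy the file up from the image layer first
# if the container has no copy of its own yet.
cow_append() { # usage: cow_append <container> <file> <line>
  if [ ! -e "$root/$1/$2" ] && [ -e "$root/image/$2" ]; then
    cp "$root/image/$2" "$root/$1/$2"
  fi
  echo "$3" >> "$root/$1/$2"
}

cow_append container_a app.conf "setting=changed"
cow_read container_a app.conf   # two lines: the copied-up original plus the new one
cow_read container_b app.conf   # still only the image's original line
cat "$root/image/app.conf"      # the image layer itself is untouched
```

The temporary directory in `$root` can be removed with `rm -rf "$root"` once you are done experimenting.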
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +### How Apps Are Isolated from Each Other |
| 75 | +Here’s how the above components work together to isolate applications: |
| 76 | + |
| 77 | +1. **Process Isolation (PID Namespace)**: |
| 78 | + - Each container runs its app as an independent process tree, unaware of other containers or the host. |
| 79 | + |
| 80 | +2. **Network Isolation (Network Namespace)**: |
| 81 | + - Containers have separate network interfaces. Docker’s default "bridge" network assigns each container a unique IP, and NAT handles external communication. |
| 82 | + - Example: Two Spring Boot apps can both bind to port 8080 inside their containers without conflict. |
| 83 | + |
| 84 | +3. **Filesystem Isolation (Mount Namespace + UnionFS)**: |
| 85 | + - Each container sees only its own filesystem, built from the image layers plus its writable layer. |
| 86 | + - If Container A writes to `/tmp`, Container B doesn’t see it. |
| 87 | + |
| 88 | +4. **Resource Isolation (cgroups)**: |
| 89 | + - With limits set, one app can’t exhaust the host’s CPU or memory and starve or crash another. By default, a container has no resource limits. |
| 90 | + |
| 91 | +5. **Shared Kernel**: |
| 92 | + - Containers share the host’s Linux kernel, but namespaces ensure they don’t step on each other’s toes. Docker also applies a default seccomp profile that restricts which syscalls a container may make. |
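
As an illustration of the network point in item 2, the sketch below starts two containers whose apps both listen on port 8080 internally and publishes them on different host ports. It is a dry run by default and only prints the commands. The container names `app-a` and `app-b` and the host ports 8081 and 8082 are arbitrary choices, and `myapp` stands for any image whose app listens on 8080.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DOCKER=docker in the environment to run against a real daemon.
DOCKER="${DOCKER:-echo docker}"

start_two_apps() {
  # Same container port (8080) twice; only the host ports must differ.
  $DOCKER run -d --name app-a -p 8081:8080 myapp
  $DOCKER run -d --name app-b -p 8082:8080 myapp
}

start_two_apps
```

Each container binds 8080 in its own network namespace, so the only place a conflict could arise is on the host side of the `-p <host>:<container>` mapping.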
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +### Simplified Workflow |
| 97 | +1. **Build**: You create an image with a `Dockerfile`, defining the app and its dependencies. Docker stacks these into layers using a union filesystem. |
| 98 | +2. **Run**: Docker creates a container by: |
| 99 | + - Setting up namespaces for isolation. |
| 100 | + - Allocating cgroups for resource limits. |
| 101 | + - Mounting the image layers plus a writable layer. |
| 102 | + - Starting your app as the entrypoint (e.g., `java -jar app.jar`). |
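| 103 | +3. **Cleanup**: Stopping a container keeps its writable layer. The layer is discarded when the container is removed with `docker rm` (or automatically with `docker run --rm`), unless you first save it as a new image with `docker commit`. |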
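
The three steps map onto a handful of CLI commands. The sketch below is a dry run by default and only prints them. It assumes a `Dockerfile` in the current directory, reuses the `myapp` image name from the earlier example, and uses `myapp-1` as a made-up container name.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DOCKER=docker in the environment to run against a real daemon.
DOCKER="${DOCKER:-echo docker}"

container_lifecycle() {
  $DOCKER build -t myapp .              # Build: turn the Dockerfile into image layers
  $DOCKER run -d --name myapp-1 myapp   # Run: start a container from the image
  $DOCKER stop myapp-1                  # Stop the app
  $DOCKER diff myapp-1                  # List files changed in the writable layer
  $DOCKER rm myapp-1                    # Remove the container and its writable layer
}

container_lifecycle
```

Running it with `DOCKER=docker` executes the same sequence for real; `docker diff` is a convenient way to see what the container wrote on top of its image.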
| 104 | + |
| 105 | +--- |
| 106 | + |
| 107 | +### Why Linux? |
| 108 | +Docker relies on Linux kernel features (namespaces, cgroups, union filesystems) that are deeply integrated into how it operates. On macOS or Windows, Docker Desktop runs a lightweight Linux VM (e.g., via HyperKit or Apple’s Virtualization framework on macOS, and WSL 2 or Hyper-V on Windows) and runs the containers inside that VM. |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +### Visual Analogy |
| 113 | +Think of Docker containers as apartments in a building: |
| 114 | +- **Namespaces**: Walls and doors keep tenants separate. |
| 115 | +- **Cgroups**: The landlord limits water or electricity per unit. |
| 116 | +- **UnionFS**: All apartments share a common blueprint (image layers), but each tenant can decorate their space (writable layer). |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +Does this clarify how Docker works and isolates apps? Let me know if you’d like a deeper dive into any part! |