chronograf can crash when using Docker bind mounts #781

Open
wants to merge 1 commit into
base: master

@Paraphraser Paraphraser commented Jan 21, 2025

Assume a Docker "bind mount" is used to map Chronograf's persistent store. Examples:

  • a docker run command:

     $ docker run -v ./chronograf:/var/lib/chronograf chronograf
    
  • these lines in a docker compose service definition:

     volumes:
       - ./chronograf:/var/lib/chronograf
    

Prior to starting the container, Docker tries to ensure that the external path to the persistent store exists via the equivalent of:

    $ sudo mkdir -p ./chronograf

The practical result is that any path component that didn't exist beforehand is created and owned by root.
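
To confirm who owns the external path after Docker has created it, something like the following can be run on the host. This is only a sketch: `owner_of` is a hypothetical helper name, and it parses `ls -ldn` output rather than relying on `stat`, whose options differ between platforms.

```shell
# Print the numeric "uid:gid" that owns a path (hypothetical helper).
owner_of() {
  # -d: describe the directory itself; -n: numeric IDs instead of names.
  ls -ldn -- "$1" | awk '{ print $3 ":" $4 }'
}

# Example: a result of 0:0 means the folder is owned by root.
# owner_of ./chronograf
```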

Make two assumptions (typical "first launch" conditions):

  1. That ./chronograf did not exist so Docker has just created the chronograf folder with root ownership; and
  2. That Docker launches the container as root (the default).

If CHRONOGRAF_AS_ROOT is not passed, the first-time user then faces the following situation:

  1. the persistent store is owned by root;

  2. the container launches as root but downgrades its privileges to user chronograf (userID 999);

  3. the executable is then unable to write into its persistent store. It crashes with the error message:

    time="«timestamp»" level=error msg="Unable to create bolt clientUnable to open boltdb; is there a chronograf already running?  open /var/lib/chronograf/chronograf-v1.db: permission denied"
    
  4. Depending on how the container was launched, it then either halts or goes into a restart loop (eg if restart: unless-stopped is set).

Currently, there are two solutions to this:

  1. The user passes the CHRONOGRAF_AS_ROOT environment variable with the value true; or

  2. The user manually adjusts ownership on the persistent store:

    $ sudo chown -R 999:999 ./chronograf
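
The second solution can be scripted so that it is applied before the first launch, which means Docker never gets the chance to create the folder as root. A minimal sketch, assuming the image where user chronograf is userID 999; `STORE`, `OWNER` and `prepare_store` are illustrative names, not part of Chronograf or Docker:

```shell
# Create the bind-mount source ahead of time and hand it to the container's user.
STORE="${STORE:-./chronograf}"   # host side of the bind mount
OWNER="${OWNER:-999:999}"        # assumption: uid:gid of user chronograf in the image

prepare_store() {
  mkdir -p "$STORE" || return 1
  # Read the current numeric owner from the directory entry itself.
  current="$(ls -ldn "$STORE" | awk '{ print $3 ":" $4 }')"
  if [ "$current" = "$OWNER" ]; then
    echo "ownership already correct: $STORE ($current)"
  else
    # Giving a folder to another user normally needs root, hence sudo.
    echo "changing ownership of $STORE from $current to $OWNER"
    sudo chown -R "$OWNER" "$STORE"
  fi
}
```

Calling `prepare_store` before `docker run` or `docker compose up` leaves the folder in the state the container expects.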
    

Option 1 defeats the purpose of running with reduced privileges. Option 2 isn't documented so it is an example of "hidden knowledge". The user has to:

  • recognise that the service is not running (which is not always immediately obvious to inexperienced users);
  • know to consult docker logs -f chronograf (the -f being particularly important if the container is in a restart loop);
  • be able to interpret the error message correctly (ie that "permission denied" is the critical element);
  • realise that changing ownership on the persistent store is the correct response; and
  • know to use userID 999 in the chown (or 100:101 for the alpine container).
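
The diagnostic steps in that list can be condensed into a small check. The sketch below is an illustration only: `has_permission_error` is a hypothetical helper that scans log text on standard input for the boltdb "permission denied" line quoted earlier, and the container name `chronograf` in the usage comment is an assumption.

```shell
# Succeed (exit 0) if the log text on stdin contains the boltdb permission error.
has_permission_error() {
  grep -q 'chronograf-v1\.db: permission denied'
}

# Example (2>&1 because the container's stderr arrives on docker's stderr):
# docker logs chronograf 2>&1 | has_permission_error && echo "fix ownership of the persistent store"
```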

It would be preferable if the container handled these situations correctly for itself, which is the main goal of this Pull Request.
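
One way such handling could be structured is sketched below. This is not the code in this Pull Request; `choose_action` and the four action labels are invented for the illustration, and the sketch only decides what to do without doing it.

```shell
# Decide what an entrypoint could do, given:
#   $1 = effective userID
#   $2 = value of CHRONOGRAF_AS_ROOT (may be empty)
#   $3 = "yes" if the persistent store is writable by that user
# Prints one of: run-as-root, fix-and-drop, run, hint.
choose_action() {
  uid="$1"; as_root="$2"; writable="$3"
  if [ "$uid" -eq 0 ]; then
    if [ "$as_root" = "true" ]; then
      echo "run-as-root"     # the user explicitly asked to stay root
    else
      echo "fix-and-drop"    # chown the store to chronograf, then drop privileges
    fi
  elif [ "$writable" = "yes" ]; then
    echo "run"               # non-root user that can already write
  else
    echo "hint"              # cannot fix ownership without root; tell the user how
  fi
}
```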

This problem does not occur if a named volume mount is used rather than a bind mount. When a new, empty named volume is first mounted, Docker performs a "copy" step: it recursively copies the internal path, including its ownership, to the external path before the Unix-bind-mount association is formed. The last path component of the volume mount (ie the _data folder) is then owned by userID 999. If CHRONOGRAF_AS_ROOT is true the container also works, because root can still write into that folder.
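
Which kind of mount a given volume specification produces can be told from its source part. The helper below is a rough sketch; `mount_kind` is a hypothetical name, and the rule that a source beginning with `/`, `.` or `~` is a host path is a simplifying assumption that ignores Windows paths.

```shell
# Classify the argument of -v (or a compose "volumes:" entry).
mount_kind() {
  case "$1" in
    *:*) src="${1%%:*}" ;;             # text before the first colon is the source
    *)   echo "anonymous"; return ;;   # container path only, no source
  esac
  case "$src" in
    /*|.*|"~"*) echo "bind" ;;         # looks like a host path
    *)          echo "named" ;;        # anything else is taken as a volume name
  esac
}

# mount_kind ./chronograf:/var/lib/chronograf       prints "bind"
# mount_kind chronograf-data:/var/lib/chronograf    prints "named"
```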

If the container is launched without an explicit volume mapping, a new anonymous volume mount is created each time the container is recreated. In all other respects it behaves the same as a named volume mount. This is a side-effect of the Dockerfile declaration:

    VOLUME /var/lib/chronograf

Removing the VOLUME statement would avoid this side-effect. In that case, /var/lib/chronograf would only exist inside the container while it was running and would not persist. Neither would there be a steady accumulation of unused anonymous volume mounts.
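
Accumulated anonymous volumes can usually be spotted by name. The sketch below assumes that Docker names anonymous volumes with a 64-character hexadecimal identifier; `anonymous_volumes` is a hypothetical filter over a list of volume names on standard input.

```shell
# Keep only names that look like anonymous volumes (64 lowercase hex characters).
anonymous_volumes() {
  grep -E '^[0-9a-f]{64}$'
}

# Example: list anonymous volumes that no container references.
# docker volume ls -q --filter dangling=true | anonymous_volumes
```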

Although Docker launches the container as root by default, either the -u option (docker run) or the user: clause (docker compose) can be used to have Docker launch the container as some other user. In this situation, unless that user is userID 999, the container will lack the privileges to write to /var/lib/chronograf, so it will abort with the permission error mentioned above. The user will also have to know which userID to employ to set up the persistent store.

This Pull Request tries to deal with that possibility by writing a hint into the log. For example, if the container is launched as userID 1000 but doesn't have write permission for /var/lib/chronograf, the user would see:

    You need to change ownership on chronograf's persistent store. Run:
      sudo chown -R 1000:1000 /path/to/persistent/store
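
A hint of that shape could be produced along the following lines. This is a sketch rather than the implementation in this Pull Request; `ownership_hint` is a hypothetical function that defaults to the IDs of the user it runs as.

```shell
# Print the ownership hint for a given uid and gid (defaults: the current user).
ownership_hint() {
  uid="${1:-$(id -u)}"
  gid="${2:-$(id -g)}"
  printf '%s\n' "You need to change ownership on chronograf's persistent store. Run:"
  printf '  sudo chown -R %s:%s /path/to/persistent/store\n' "$uid" "$gid"
}

# Emit the hint only when the store is not writable by the current user.
# [ -w /var/lib/chronograf ] || ownership_hint
```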

Signed-off-by: Phill Kelley <[email protected]>