chronograf can crash when using Docker bind mounts #781

Paraphraser · 2025-01-21T01:21:50Z

Assume a Docker "bind mount" is used to map Chronograf's persistent store. Examples:

a docker run command:

 $ docker run -v ./chronograf:/var/lib/chronograf chronograf

these lines in a docker compose service definition:

 volumes:
   - ./chronograf:/var/lib/chronograf

Prior to starting the container, Docker tries to ensure that the external path to the persistent store exists via the equivalent of:

$ sudo mkdir -p ./chronograf

The practical result is that any path component that didn't exist beforehand is created and owned by root.

Make two assumptions (typical "first launch" conditions):

1. That ./chronograf did not exist so Docker has just created the chronograf folder with root ownership; and
2. That Docker launches the container as root (the default).

In the absence of passing CHRONOGRAF_AS_ROOT, the first-time user is then in the situation where:

1. the persistent store is owned by root;
2. the container launches as root but downgrades its privileges to user chronograf (userID 999);
3. the executable is then unable to write into its persistent store. It crashes with the error message:

   time="«timestamp»" level=error msg="Unable to create bolt clientUnable to open boltdb; is there a chronograf already running?  open /var/lib/chronograf/chronograf-v1.db: permission denied"

4. Depending on how the container was launched, it then either halts or goes into a restart loop (eg if restart: unless-stopped).

Currently, there are two solutions to this:

1. The user passes the CHRONOGRAF_AS_ROOT environment variable with the value true; or
2. The user manually adjusts ownership on the persistent store:
```
$ sudo chown -R 999:999 ./chronograf
```

Option 1 defeats the purpose of running with reduced privileges. Option 2 isn't documented so it is an example of "hidden knowledge". The user has to:

* recognise that the service is not running (which is not always immediately obvious to inexperienced users);
* know to consult docker logs -f chronograf (the -f being particularly important if the container is in a restart loop);
* be able to interpret the error message correctly (ie that "permission denied" is the critical element);
* realise that changing ownership on the persistent store is the correct response; and
* know to use userID 999 in the chown (or 100:101 for the alpine container).
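
The log-reading step in the list above can be partly automated. What follows is a minimal sketch and not part of this Pull Request: the function name has_permission_error is hypothetical, and the only thing it relies on is the "permission denied" text in the error message quoted earlier.

```shell
#!/bin/sh
# Hypothetical helper (not from the Pull Request).
# Reads log text on stdin and succeeds (exit status 0) if any line
# contains "permission denied", the critical element of the error.
has_permission_error() {
    grep -q 'permission denied'
}

# Possible usage, assuming the container is named "chronograf":
#   docker logs chronograf 2>&1 | has_permission_error \
#       && echo "check ownership of the persistent store"
```

The 2>&1 in the usage comment merges the container's stderr into the pipe, so the error line is seen whichever stream it was written to.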

It would be preferable if the container handled these situations correctly for itself, which is the main goal of this Pull Request.
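
One way a container could handle this for itself is to correct ownership while it is still root, before it downgrades its privileges. The sketch below illustrates that idea only; it is not the script from this Pull Request. The names STORE, RUN_UID, RUN_GID, needs_chown and fix_store_ownership are all hypothetical, and it assumes a stat that accepts -c in the way GNU coreutils does.

```shell
#!/bin/sh
# Sketch only: NOT the entry point script from the Pull Request.
# STORE, RUN_UID and RUN_GID are illustrative names. 999:999 is the
# chronograf user mentioned above (100:101 for the alpine container).
STORE="${STORE:-/var/lib/chronograf}"
RUN_UID="${RUN_UID:-999}"
RUN_GID="${RUN_GID:-999}"

# Succeeds if directory $1 is NOT already owned by user ID $2.
# Assumption: stat supports -c '%u' (as GNU coreutils stat does).
needs_chown() {
    [ "$(stat -c '%u' "$1")" != "$2" ]
}

# Change ownership only when running as root and only when needed.
# When running as any other user this does nothing.
fix_store_ownership() {
    if [ "$(id -u)" -eq 0 ] && needs_chown "$STORE" "$RUN_UID"; then
        chown -R "$RUN_UID:$RUN_GID" "$STORE"
    fi
}

# An entry point would call fix_store_ownership first, then drop
# privileges with whatever mechanism the image already uses.
```

Checking before calling chown avoids a recursive walk of the store on every start once ownership is already correct.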

This problem does not occur if a named volume mount is used rather than a bind mount. That is because of the "copy" step whereby Docker recursively copies the internal path to the external path before the Unix-bind-mount association is formed. The last path component of the volume mount (ie the _data folder) is then owned by userID 999. Even if CHRONOGRAF_AS_ROOT is true, root can still write into that folder.

If the container is launched without an explicit volume mapping, a new anonymous volume mount is created each time the container is recreated, but otherwise behaves the same as a named volume mount. This is a side-effect of the Dockerfile declaration:

VOLUME /var/lib/chronograf

Removing the VOLUME statement would avoid this side-effect. In that case, /var/lib/chronograf would only exist inside the container while it was running and would not persist. Neither would there be a steady accumulation of unused anonymous volume mounts.

Although the default for Docker is to launch the container as root, it is also possible to use either the -u option (docker run) or user: clause (docker compose) to have Docker launch the container as some other user. In this situation, with the exception of userID 999, the container will lack the privileges to write to /var/lib/chronograf so it will abort with the permission error mentioned above, and the user will also have to know which userID to employ to set up the persistent store.

This Pull Request tries to deal with that possibility by writing a hint into the log. For example, if the container is launched as userID 1000 but doesn't have write permission for /var/lib/chronograf, the user would see:

You need to change ownership on chronograf's persistent store. Run:
  sudo chown -R 1000:1000 /path/to/persistent/store
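
A hint like this could be produced by a check along the following lines. This is an assumption about the general approach, not the code from this Pull Request, and the function names print_chown_hint and check_store are hypothetical. The placeholder path is kept exactly as it appears in the hint above.

```shell
#!/bin/sh
# Sketch only: an assumption about how such a hint could be produced.
# print_chown_hint and check_store are hypothetical names.

# Print the hint for user ID $1 and group ID $2.
print_chown_hint() {
    echo "You need to change ownership on chronograf's persistent store. Run:"
    echo "  sudo chown -R $1:$2 /path/to/persistent/store"
}

# If the current user cannot write to directory $1, write the hint to
# stderr and fail. Otherwise succeed silently.
check_store() {
    if [ ! -w "$1" ]; then
        print_chown_hint "$(id -u)" "$(id -g)" >&2
        return 1
    fi
}
```

Using id -u and id -g means the hint names whichever user the container was actually launched as, so the reader does not need to work out the correct userID.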

Signed-off-by: Phill Kelley <[email protected]>

[PR 781](influxdata/influxdata-docker#781) was submitted on 2025-01-21 but it has now been over 40 days without any response. It isn't clear whether it is simply taking the time it needs to take, or if this is a signal that it will never be processed.

The basic problem occurs with Docker "bind mounts", which are the convention for IOTstack containers. If Chronograf launches from a clean slate, Docker will create `./volumes/chronograf` with root ownership. Although the container *launches* as root, it does not take the opportunity to enforce its ownership conventions prior to downgrading its privileges to those of (internal) user `chronograf` (ID=999). The result is that the container can't write to its persistent store, crashes and goes into a restart loop.

PR 781 provides an augmented entry point script which sets ownership correctly prior to launching the `chronograf` process. This PR applies the patch for IOTstack users via a local Dockerfile. It can be unwound if/when PR 781 is processed.

Signed-off-by: Phill Kelley <[email protected]>

Paraphraser mentioned this pull request Mar 5, 2025

2025-03-05 Chronograf - master branch - PR 1 of 2 SensorsIot/IOTstack#787

Merged

Paraphraser mentioned this pull request Mar 5, 2025

2025-03-05 Chronograf - old-menu branch - PR 2 of 2 SensorsIot/IOTstack#788

Merged
