Skip to content

Conversation

mwilck
Copy link
Contributor

@mwilck mwilck commented Sep 25, 2025

PR on top of #189. Relaunch of #190 after @XiaoNi87 indicated to me that my changes would be acceptable to him.

This PR changes the logic of the "mdcheck" tool and the related systemd services mdcheck_start.service and mdcheck_continue.service.

The current behavior is like this:

  • mdcheck without arguments starts a RAID check on all arrays on the system, starting at position 0. This is started from mdcheck_start.service, started by a systemd timer once a month.
  • mdcheck --continue looks for files /var/lib/mdcheck/MD_UUID_$UUID, reads the start position from them, and starts a check from that position on the array with the respective UUID. This is started from a systemd timer every night.

In either case, mdcheck won't do anything if the kernel is already running a sync_action on a given array. The check runs for a given period of time (default 6h) and saves the last position in the MD_UUID file, to be taken up when mdcheck --continue is called next time. When the entire array has been checked, the MD_UUID_ file is deleted. When all checks are finished, mdcheck_continue.timer is stopped, to be restarted when mdcheck_start.timer expires next time.

Before the recent commit 8aa4ea9 ("systemd: start mdcheck_continue.timer before mdcheck_start.timer"), this could lead to a race condition when the check for a given array didn't complete throughout the month. mdcheck_start.service would start and reset the check position to 0 before mdcheck_continue.service could pick up at the last saved position. 8aa4ea9 works around this by starting mdcheck_continue.service a few minutes before mdcheck_start.timer.

Yet the general problem still exists: both services trigger checks on the kernel's part which they can only passively monitor. So if a user plays with the timer settings (which he is in his rights to do), another similar race might happen.

This patch set changes the behavior as follows:

Only mdcheck_continue.service actually starts and stops kernel-based sync actions. This service will continue at the saved start position if an MD_UUID* file exists, or start a new check at position 0 otherwise. Starting at 0 can be inhibited by creating a file /var/lib/mdcheck/Checked_$UUID. These files will be created by mdcheck when it finishes checking a given array. Thus future invocations of mdcheck_continue.service will not restart the check on this array.

mdcheck_start.service runs mdcheck --restart, which simply removes all Checked_* markers from /var/lib/mdcheck, so that the next invocation of mdcheck_continue.service will start new checks on all arrays which don't have already running checks.

The general behavior of the systemd timers and services is like before, but the mentioned race condition is avoided, even if the user modifies the timer settings arbitrarily.

Unlike #190, this PR preserves the behavior of the mdcheck script when called without arguments. mdcheck historically had just two modes of operation: default (no arguments) and --continue. This set introduces new modes --restart (was in #190 already), --maybe-start (the behavior of mdcheck without args in #190), and --force-start (a complete restart of checks on all arrays from 0, like traditional mdcheck). For backward compatibility reasons, --force-start becomes the default behavior.

More details in the commit descriptions.

Differences to #190:

  • changed ordering; the large and possibly controversial patches are on top
  • small fixes to the logging patch (5168022)
  • improved commit message of "simplify start / continue logic" patch
  • added patch to restore backward-compatible behavior of mdcheck

The way mdcheck is currently written, it loops over /dev/md?*, which
will contain RAID partitions and arrays that don't support sync
operations, such as RAID0. This is inefficient and makes the script
difficult to trace.

Instead, loop over the sync_action files which actually matter for
checking.

Signed-off-by: Martin Wilck <[email protected]>
This syntax used to be marked as deprecated [1]. In current bash
man pages, it isn't even mentioned any more. Use the POSIX compatible
syntax "$((X+=1))" instead [2, 3].

[1] https://stackoverflow.com/questions/41081417/difference-between-a-b-and-a-b-in-bash
[2] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04
[3] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_01_02_01

Signed-off-by: Martin Wilck <[email protected]>
If the script is run from systemd, simply writing to stderr will
create more concise output. Otherwise, use logger like before.

Use the same logic for other messages printed by the script during
runtime, except for error messages related to the invocation.

Signed-off-by: Martin Wilck <[email protected]>
The current logic of "mdcheck" is susceptible to races when multiple
mdcheck instances run simultaneously, as checks can be initiated from both
"mdcheck_start.service" and "mdcheck_continue.service".

The previous commit 8aa4ea9 ("systemd: start mdcheck_continue.timer before
mdcheck_start.timer") fixed this for the default configuration by inverting
the ordering of timers. But users can customize the timer settings, which
can cause the race described in the commit message of 8aa4ea9 to reappear.

This patch avoids this kind of race entirely, by changing the logic as follows:

* When `mdcheck` has finished checking a RAID array, it will create a
  marker `/var/lib/mdcheckChecked_$UUID`.
* A new option `--restart` is introduced. `mdcheck --restart` removes all
  `/var/lib/mdcheck/Checked_*` markers.
  This is called from `mdcheck_start.service`, which is typically started
  by a timer in large time intervals (default once per month).
* `mdcheck --continue` works as it used to. It continues previously started
  checks (where the `/var/lib/mdcheck/MD_UUID_$UUID` file is present and
  contains a start position) for the check.
  This usage is *not recommended any more*.
* `mdcheck` with no arguments is like `--continue`, but it also starts new
  checks for all arrays for which no check has previously been
  started, *except* for arrays for which a marker
  `/var/lib/mdcheck/Checked_$UUID` exists.
  `mdcheck_continue.service` calls `mdcheck` this way. It is called in
  short time intervals, by default once per day.
* Combining `--restart` and `--continue` is an error.

This way, the only systemd service that actually triggers a kernel-level
RAID check is `mdcheck_continue.service`, which avoids races.

When all checks have finished, `mdcheck_continue.service` is no-op.
When `mdcheck_start.service` runs, the checks re re-enabled and will be
started from 0 by the next `mdcheck_continue.service` invocation.

Signed-off-by: Martin Wilck <[email protected]>
The previous patch changed the behavior of mdcheck when invoked
without --continue or --restart. Previously it would have started
a new check from zero on all arrays. After the previous patch,
it acted like --continue on arrays where a check had been started already,
and started a new check only for arrays where no Checked_* marker
was present.

Introduce a new run mode --maybe-start for this behavior, and another
mode --force-start for the original behavior. --maybe-start is the
mode which will be called by mdcheck_continue.service now.

For backward compatibilty reasons, --force-start is the default when no
mode argument (--continue, --maybe-start, --restart, or --force-start) is
specified. Because this is may interrupt checks that have already made
progress, the tool will print a warning and pause for a short time
when it's invoked in this manner.

Signed-off-by: Martin Wilck <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant