Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Plan, phase 1 #1341

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

The Plan, phase 1 #1341

wants to merge 16 commits into from

Conversation

emilazy
Copy link
Collaborator

@emilazy emilazy commented Feb 16, 2025

It’s finally time.

This PR eliminates user activation, addressing #96 by following the first phase of the plan from #1016 (comment), fulfilling a goal I started drafting patches for in 2023 before I was even a nix-darwin maintainer.

I strongly recommend reviewing this one commit‐by‐commit. See the commit messages for much more detail and for things I’m still uncertain about. I’ve been running this on my own system for a while now, but it could definitely use a staged roll‐out before we actually hit the merge button. We should also probably also reach out to authors of third‐party modules that will be affected once it’s mostly baked, though at least anything that breaks should mostly break loudly. (Adding text to our existing activationScripts.{checks,userDefaults,userLaunchd,homebrew}.text definitions is an exception – those will run as root now. But I can’t think of a particularly graceful way to handle that case, and anyway it’s not exactly something we fully support in the first place.)

Closes: #96


{
options = {
system.primaryUser = lib.mkOption {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an option to disable primary user, so that dogfooding finds problems with not having a primary user sooner?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized the default is null, so never mind.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, leaving it at the default will ensure your system doesn’t depend on any primary user and give a hopefully‐helpful error message if you set anything that requires it.

Copy link

@spikespaz spikespaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in practice, it may be useful for us to
nest this into users.users.*.homebrew instead, at the expense of
being an unsupported setup if used to its full potential. Since
that would be a breaking change to the inteface anyway, I think
adding homebrew.user for now is acceptable.

I understand the motive here, but are you sure it's not better to make more breaking changes in one go? I'm worried that going through another great migration would agitate some.

@emilazy
Copy link
Collaborator Author

emilazy commented Feb 18, 2025

I previously thought that too – that we should figure out the correct interfaces and write good migration documentation and tools and get all the breaking changes over with in one go – but the end result was that this work stalled out for years because there’s a lot of API design and implementation work to get things to a good state: we need to decide which things should be global or per‐user; we need to decide what should stay in nix-darwin and what should be deduplicated with Home Manager; for things like our system.defaults options, they’re currently not fully declarative in the way that Nix‐managed options should be, so if we’re breaking everyone’s configuration then we ought to make that important breaking change at the same time; and so on.

If we tried to do it all at once, this work would have spent even longer in development hell and be unreviewably big. For the longest time i wasn’t sure how to resolve that, but after discussing on Matrix I realized that introducing a system.primaryUser option, though strange from an interface design perspective, would allow us to minimize the breaking changes in the first phase: the only required change for the vast majority of users should be having to set system.primaryUser = "username"; in their configuration, which I think is an acceptable level of fuss for the improvement in internal architecture and coherent model it gives us, and allows people who want a truly user‐agnostic system manager to opt for that world now at the expense of reduced functionality. That takes the pressure off any major interface renovations and lets us work on them independently and batch them up as we see fit.

In the case of Homebrew, homebrew.user defaults to the primary user, so it’s purely a bonus for people who want to use Homebrew without setting system.primaryUser (or to use a different user entirely for Homebrew, I suppose).

@hraban
Copy link
Contributor

hraban commented Mar 3, 2025

Most of the system.defaults stuff should really move to home-manager anyway, as you said. It's a natural fit and it works well.

@mirkolenz
Copy link

I've been running this branch on my machine for the past week and had zero issues with it, thanks for these great improvements!

@khaneliman
Copy link
Contributor

Been running this for a few days since I got back from vacation. Noticing issues with nix store on reboots, haven't pinned down what's going on atm. Not sure if I need to do something different, I just tested copying the branch and rebasing it on master. Running my normal rebuild mentions the sudo requirement, doing a sudo rebuild mentions I need a primary user, adding a primary user and then it rebuilds.

Just noting symptoms:

  • Dock icons missing that point to nix store
  • Firefox error about profile missing on boot (assuming same issue with dock about files being available on login)
  • launchd services in a retry backoff until i unload them and load them manually
2025-03-08 14:57:39.712350 <Warning> Could not find and/or execute program specified by service: 2: No such file or directory: /nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai
2025-03-08 14:57:39.712351 <Error> Service could not initialize: access(/nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai, X_OK) failed with errno 2 - No such file or directory, error 0x6f - Invalid or missing Program/ProgramArguments
2025-03-08 14:57:39.712352 <Error> initialization failure: 24D70: xpcproxy + 36768 [1088][E4C3A3C3-0D57-3E4D-8388-880BC7F5F19E]: 0x6f
2025-03-08 14:57:39.712353 <Notice> Service setup event to handle failure and will not launch until it fires.
2025-03-08 14:57:39.712354 <Error> Missing executable detected. Job: 'org.nixos.yabai' Executable: '/nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai'

@emilazy
Copy link
Collaborator Author

emilazy commented Mar 10, 2025

That’s very confusing. I’m not sure how that could result from these changes. Are you sure you get the same result if you run the base commit that you rebased on top of? Were there any conflicts during the rebase that you had to resolve?

@khaneliman
Copy link
Contributor

khaneliman commented Mar 10, 2025

That’s very confusing. I’m not sure how that could result from these changes. Are you sure you get the same result if you run the base commit that you rebased on top of? Were there any conflicts during the rebase that you had to resolve?

Yeah, sorry... not a lot of information, I know... Probably can just disregard it for now... I'll try bisecting and see where it might have come from. Even switching back to master I get the behavior still.

@WeetHet
Copy link

WeetHet commented Mar 11, 2025

2025-03-08 14:57:39.712350 <Warning> Could not find and/or execute program specified by service: 2: No such file or directory: /nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai
2025-03-08 14:57:39.712351 <Error> Service could not initialize: access(/nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai, X_OK) failed with errno 2 - No such file or directory, error 0x6f - Invalid or missing Program/ProgramArguments
2025-03-08 14:57:39.712352 <Error> initialization failure: 24D70: xpcproxy + 36768 [1088][E4C3A3C3-0D57-3E4D-8388-880BC7F5F19E]: 0x6f
2025-03-08 14:57:39.712353 <Notice> Service setup event to handle failure and will not launch until it fires.
2025-03-08 14:57:39.712354 <Error> Missing executable detected. Job: 'org.nixos.yabai' Executable: '/nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai'

This is the exact reason nix-darwin uses /bin/wait4path. You can replace the binary inside of the service with a /bin/sh that calls wait4path with the yabai executable or try copying it outside of the store

@khaneliman
Copy link
Contributor

2025-03-08 14:57:39.712350 <Warning> Could not find and/or execute program specified by service: 2: No such file or directory: /nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai
2025-03-08 14:57:39.712351 <Error> Service could not initialize: access(/nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai, X_OK) failed with errno 2 - No such file or directory, error 0x6f - Invalid or missing Program/ProgramArguments
2025-03-08 14:57:39.712352 <Error> initialization failure: 24D70: xpcproxy + 36768 [1088][E4C3A3C3-0D57-3E4D-8388-880BC7F5F19E]: 0x6f
2025-03-08 14:57:39.712353 <Notice> Service setup event to handle failure and will not launch until it fires.
2025-03-08 14:57:39.712354 <Error> Missing executable detected. Job: 'org.nixos.yabai' Executable: '/nix/store/fa3jr2wxgbiyk37h0imzhhj13c3p6gqv-yabai-7.1.10/bin/yabai'

This is the exact reason nix-darwin uses /bin/wait4path. You can replace the binary inside of the service with a /bin/sh that calls wait4path with the yabai executable or try copying it outside of the store

This is using the nix-darwin service. I do see that it doesn't contain wait4path like it used to. Thanks for reminding me about that. I'll see why it's missing now.

Copy link
Contributor

@Samasaur1 Samasaur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's finally happening! This looks really good in general — left a couple notes and questions but I think many of these changes may just have to be breaking.

@@ -15,6 +15,8 @@ in
{
homebrew.enable = true;

homebrew.user = "test-homebrew-user";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding how the tests work, but doesn't this need the user test-homebrew-user to exist?

Although I suspect that a later commit will address this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test systems don’t actually get activated; we just build the system and check the resulting package. (Sadly.) So this is fine.

@@ -1,6 +1,8 @@
{ config, pkgs, lib, ... }:

{
system.primaryUser = "test-defaults-user";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same concern about the user needing to exist

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same answer.)

# Set up user launchd services in ~/Library/LaunchAgents
echo "setting up user launchd services..."

${concatStringsSep "\n" (launchdVariables config.launchd.user.envVariables)}
${concatStringsSep "\n" (launchdVariables "sudo -u ${user} --" config.launchd.user.envVariables)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using sudo -u ${user} does feel like one of those weird corner-cases where launchd will just fail to do what you expect it to — I like the idea of using launchctl asuser, especially considering it hasn't been replaced with anything else (other than using a domain target for the given user), but I do notice the man page says

This executes the given command in as similar an execution context as possible to that of the target user's bootstrap. Adopted attributes include the Mach bootstrap namespace, exception server and security audit session. It does not modify the process' credentials (UID, GID, etc.) or adopt any user-specific environment variables. It affects only the Mach bootstrap context and directly- related attributes.

(emphasis mine). This might mean that the way to do it is

sudo -u ${user} -- launchctl asuser $(id -u ${user})

which is ugly but seems like it carries over more of the appropriate context.

That said, if this is working as-is in multi-user setups I'm also inclined to leave it as is until we swap to the new launchctl commands. My concern is mostly around activation for users who aren't graphically logged-in. This is already sometimes a pain point for me when I darwin-rebuild over SSH, and so I imagine that we may run into similar issues here if user A (logged in graphically) runs darwin-rebuild and the launchd activation doesn't work properly for user B (maybe not logged in, maybe logged in in a separate graphical session, etc.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I haven’t tested this extensively; if someone can confirm that it seems to work as‐is then I’d be inclined to just leave it until we start using the recommended commands rather than adding more deprecated ones. I’m certainly curious how this goes with actiavtion over SSH.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mirkolenz @khaneliman and anyone else who's running this branch: any chance you manage any multi-user nix-darwin systems? Have you noticed any strangeness with launchd activation, especially if you've happened to run activations as one user while a different user is actively signed in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or activation over SSH

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also occurring to me that even if this doesn't work perfectly, it should continue to work at least as well as the situation before this PR. I'd say that if anyone reports an issue with this, we can try and track it down, but otherwise we can just leave it as you have it here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All systems I tested this on are single-user only, sorry...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, because people will usually be doing sudo darwin-rebuild … as their primary user, and so the sudo sudo --user=primary-user will cancel out the UID stuff and the Mach bootstrap context will be inherited from their graphical session? I suppose so. But ideally this should work even if you set a different primary user, too, so it’d be good to know one way or another. I just haven’t had the time to set up a VM to test a separate admin user, or SSH activation, or whatever. (I tested all this on my development machine because I love danger.)

@@ -9,6 +9,10 @@ let
"defaults write ${domain} '${key}' $'${strings.escape [ "'" ] (generators.toPlist { } value)}'";

defaultsToList = domain: attrs: mapAttrsToList (writeDefault domain) (filterAttrs (n: v: v != null) attrs);
userDefaultsToList = domain: attrs: map
(cmd: "sudo --user=${escapeShellArg config.system.primaryUser} -- ${cmd}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you use sudo -u ${user} for launchctl but sudo --user=${user} here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason at all. I’ve moved to --user= across the board.

config.nix.enable
&& !config.nixpkgs.flake.setNixPath
&& config.system.stateVersion < 6
&& options.environment.darwinConfig.highestPrio == (mkOptionDefault {}).priority
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate that I can't think of a better way to check that this option wasn't changed from the default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only way I've done it, too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it’s awful and I hate it. But I hate not giving good error messages when doing complicated migrations even more.

Comment on lines +25 to +27
default =
config.users.users.${config.system.primaryUser}.home or "/Users/${config.system.primaryUser}";
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this break anything if config.system.primaryUser is set to null (the default)? If I'm reading it correctly, it's only an issue if primaryUserHome is read by something, in which case primaryUser needs to be set anyway, but I wonder what the error message would look like. Maybe it's caught by nix.nixPath requiring primaryUser to be set? Not sure that works for environment.darwinConfig though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’ll be an evaluation error if anything depends on it. But, yes, that stuff should set system.requiresPrimaryUser (as environment.darwinConfig does) and therefore hopefully give a helpful error message before anything that uses it would get evaluated. "${null}" is also an error so I’m not sure allowing it to be null would improve anything. Maybe this could be a more descriptive throw in that case? But I haven’t triggered it so far in testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

lib.warnIf (config.system.primaryUser == null) "`system.primaryUser` is unset but `system.primaryUserHome` is being read" config.users.users.${config.system.primaryUser}.home or "/Users/${config.system.primaryUser}";

?

On the other hand, it seems like this problem will only be hit due to a future mistake on our part (reading primaryUserHome in an option without adding that option to requiresPrimaryUser), so we could just leave this as is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it a throwIf might be a good idea, and seems pretty harmless. I’ll think about it. The main thing is that I wish we could make incorrect uses of config.system.primaryUser fail loudly in the same way, but that’s harder.

system.activationScripts.postActivation.text = mkAfter ''
nix-channel --remove darwin || true

${optionalString (config.system.primaryUser != null) ''
sudo \
--user=${config.system.primaryUser} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not shell escaping the primary user

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly anyone who tries to set a username that requires shell escaping has it coming. But, yeah, fixed.

@@ -65,7 +65,7 @@ in
ln -sfn /run/current-system /nix/var/nix/gcroots/current-system
fi

${config.system.activationScripts.etcChecks.text}
${config.system.activationScripts.checks.text}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing I am concerned about here is if checks has stuff that needs to be run in a graphical context. Based on a quick search through our in-tree checks it seems like this change would cause errors for the activation daemon from modules/users/default.nix in some cases, it might cause errors for our other checks, and might for out-of-tree checks if people were assuming that checks would only run in a graphical session.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The activation daemon only activates system configurations that were previously current. So in theory it should never be trying to delete users and so on. What other checks do you think might be problematic here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. I don't have any problematic checks I can think of, and they should be read-only anyway, so I'm good with this.

${cfg.activationScripts.checks.text}
${cfg.activationScripts.createRun.text}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a potential point where out-of-tree checks could break, if people assume that /run will already exist. I think this is still a worthwhile change to make, but worth pointing out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put it before checks if this is considered a big problem; I just figured it’s best to change the system as little as possible if any checks are failing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is definitely an improvement that we should make, just noting the potential issue here in case we see it in the future.

emilazy added 8 commits March 12, 2025 00:43
This adds an optional explicit `homebrew.user` option that allows users
to avoid setting `system.primaryUser`, partly as a proof of concept
of what the interfaces should look like in the future. Homebrew only
officially support one global installation, so a singleton matches
upstream’s expectations; in practice, it may be useful for us to
nest this into `users.users.*.homebrew` instead, at the expense of
being an unsupported setup if used to its full potential. Since
that would be a breaking change to the inteface anyway, I think
adding `homebrew.user` for now is acceptable. (I think one native
Apple Silicon and one Rosetta 2 Homebrew installation – under
`/opt/homebrew` and `/usr/local` respectively – may be exceptions
to this lack of upstream support, but that would be complicated to
support even with `users.users.*.homebrew`.)

I’m not entirely sure where in system activation this should
go. Probably after the user defaults and launch agents stuff, to match
the existing logic in user activation, and I lean towards doing it
as late as possible; too early and we might not have the users and
groups required to bootstrap a Homebrew installation set up, but
as Homebrew installations could be fiddly and fail, doing it in the
middle could leave a partially‐activated system.

Probably it should be done in a launch agent or something instead, but
this is my best guess as to the appropriate place for now. The downside
is that activation scripts generally won’t be able to assume that the
Homebrew prefix is populated according to the current configuration,
but they probably shouldn’t be depending on that anyway?
I’m not *completely* certain that this handles user agents
correctly. There is a deprecated command, `launchctl asuser`, that
executes a command in the Mach bootstrap context of another user`.
<https://scriptingosx.com/2020/08/running-a-command-as-another-user/>
claims that this is required when loading and unloading user agents,
but I haven’t tested this. Our current launchd agent logic is pretty
weird and broken already anyway, so unless this actively regresses
things I’d lean towards keeping it like this until we can move
over entirely to `launchctl bootstrap`/`launchctl kickstart`, which
aren’t deprecated and can address individual users directly. Someone
should definitely test it more extensively than I have, though.
System activation scripts shouldn’t (and soon won’t be able to)
rely on `$HOME` being the primary user’s.
These can’t be relied upon in a post‐user‐activation
world. Technically a breaking change, if anyone has their home
directory outside of `/Users` or is using `root` for this, but, well,
I did my best and these are legacy defaults anyway.
emilazy added 8 commits March 12, 2025 00:46
(With some tweaks to handle `nix.enable` and order it at a more
sensible position in the `$PATH`.)

The installers actually install Nix into `root`’s profile for some
reason, which means that the path’s prioritization backfires when
the script runs as root and we’re managing the Nix installation. When
running `darwin-rebuild` as a normal user, this wasn’t a problem.

Maybe we should just have a check to make sure there’s no conflicting
Nix in `root`’s profile – it seems pretty bad for `root` to
get the wrong Nix – but it would trigger for almost everyone,
which seems kind of annoying. I guess we could automatically
remove it from `root`’s profile if it matches what’s in
`/nix/var/nix/profiles/default`…

This reverts commit 02232f7.
The `activate-system` daemon will now run all the checks, which seems
like probably a good idea anyway?
The checks should no longer depend on `/run`, so this avoids modifying
the system before they run.
@emilazy emilazy force-pushed the push-vuqkuqtwsoox branch from a387b4b to 0d9daef Compare March 12, 2025 00:48
Copy link
Contributor

@Samasaur1 Samasaur1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that covers everything I saw! Another pair of eyes certainly wouldn't hurt but this LGTM

@retrry
Copy link

retrry commented Mar 17, 2025

Hey, I've just not long ago moved to flakes and don't have a lot of experience here. Could someone give me rough instructions how I would test it with flake setup? I would like to help 🙂 based on what I am seeing in the PR already started migrating away from services and config options which require primaryUser.

@mirkolenz
Copy link

You can test it by adding the following snippet to your flake:

inputs.nix-darwin = {
  url = "github:lnl7/nix-darwin/pull/1341/merge";
  inputs.nixpkgs.follows = "nixpkgs"; # if you want to lock it to the same rev as your main nixpkgs instance
};

@retrry
Copy link

retrry commented Mar 17, 2025

So it is a bit strange for me with launch agents 🙂 I've had couple launch agents defined via launchd.user.agents. Those options require primaryUser set (these are only two options, which I've had left, which require primaryUser). I've tried two things:

  1. Just remove those options and not set primaryUser. The problem is that those launchAgents do not get removed.
  2. Set primaryUser, rebuild, remove launchd options, rebuild - those launch agents are still not removed.
  3. Reverted to master config, rebuild, remove those options, rebuild - now they get removed.

@Samasaur1
Copy link
Contributor

3 sounds like expected behavior. 1, unfortunately, also sounds like expected behavior, since the code to handle removing those LaunchAgents has itself been removed. 2 is surprising to me though, unless the LaunchAgent plists are put at a different path?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking issue for switching to system-wide activation
8 participants