
Conversation

@smklein (Collaborator) commented on Aug 28, 2025:

  • Actually update nexus generation within the top-level blueprint and Nexus zones
  • Deploy new and old nexus zones concurrently

Blueprint Preparation & System Description

  • Queries and returns the set of "active" and "not yet" Nexuses. This information is consumed by the planner.
  • Tracks "active" and "not yet" zones in SystemDescription for tests

Blueprint Planner

  • Automatically determine the Nexus generation when provisioning new Nexus zones, based on the zones already deployed (see the sketch after this list)
  • Update the logic for provisioning Nexus zones to deploy old and new Nexus images side by side
  • Update the logic for expunging Nexus zones, so that it only happens when running from a "newer" Nexus
  • Add a planning stage that bumps the top-level "nexus generation", if appropriate, which triggers the old Nexuses to quiesce
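
To make the generation-selection idea concrete, here is a minimal, self-contained sketch. The types and the exact rule are assumptions for illustration only; the real planner works on omicron's blueprint types and may choose generations differently.

// Illustrative stand-ins only; not the real blueprint types or the planner
// code from this PR.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Generation(u64);

#[derive(Clone, PartialEq, Eq)]
struct ImageSource(String);

struct DeployedNexus {
    image: ImageSource,
    generation: Generation,
}

// Assumed rule: a new zone running an already-deployed image joins that
// image's generation; a new image starts the next generation. Returns None
// if no Nexus zones are deployed at all.
fn generation_for_new_nexus(
    deployed: &[DeployedNexus],
    new_image: &ImageSource,
) -> Option<Generation> {
    if let Some(n) = deployed.iter().find(|n| &n.image == new_image) {
        return Some(n.generation);
    }
    deployed
        .iter()
        .map(|n| n.generation)
        .max()
        .map(|Generation(g)| Generation(g + 1))
}

fn main() {
    let deployed = vec![DeployedNexus {
        image: ImageSource("nexus-v1".to_string()),
        generation: Generation(1),
    }];
    let next =
        generation_for_new_nexus(&deployed, &ImageSource("nexus-v2".to_string()));
    assert_eq!(next, Some(Generation(2)));
}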

Blueprint Execution

  • Update the stage of blueprint execution that creates db_metadata_nexus records. Previously, this created active records for all in-service Nexuses. Now it creates active and not_yet records, depending on how the nexus_generation set in the zone record compares to the top-level nexus_generation (sketched below).
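
A minimal sketch of that classification, with simplified stand-in types; whether equality with the top-level generation counts as active is an assumption here, not something taken from the PR:

// Simplified stand-ins for the executor's classification; the real code
// operates on blueprint zone types, and the exact comparison may differ.
#[derive(Debug, PartialEq)]
enum DbMetadataNexusState {
    Active,
    NotYet,
}

fn classify_nexus(
    zone_nexus_generation: u64,
    top_level_nexus_generation: u64,
) -> DbMetadataNexusState {
    // Assumption: zones at or below the blueprint's current nexus_generation
    // are active; zones at a later generation have not yet taken over.
    if zone_nexus_generation <= top_level_nexus_generation {
        DbMetadataNexusState::Active
    } else {
        DbMetadataNexusState::NotYet
    }
}

fn main() {
    assert_eq!(classify_nexus(1, 1), DbMetadataNexusState::Active);
    assert_eq!(classify_nexus(2, 1), DbMetadataNexusState::NotYet);
}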

Blippy

  • Adds a check verifying that Nexus zones with the same generation all use the same image (see the sketch below)
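
A rough sketch of the shape of such a check, using hypothetical simplified types in place of the real blueprint and blippy types:

use std::collections::HashMap;

// Hypothetical, simplified zone description; the real check consumes
// blueprint zone types and reports through blippy's notes.
struct NexusZoneInfo {
    zone_id: &'static str,
    generation: u64,
    image: &'static str,
}

// Flag every Nexus zone whose image disagrees with the first image seen for
// its generation.
fn check_image_per_generation(zones: &[NexusZoneInfo]) -> Vec<String> {
    let mut expected: HashMap<u64, &str> = HashMap::new();
    let mut notes = Vec::new();
    for zone in zones {
        let image = *expected.entry(zone.generation).or_insert(zone.image);
        if image != zone.image {
            notes.push(format!(
                "nexus zone {} (generation {}) uses image {}, expected {}",
                zone.zone_id, zone.generation, zone.image, image
            ));
        }
    }
    notes
}

fn main() {
    let zones = [
        NexusZoneInfo { zone_id: "a", generation: 2, image: "nexus-v2" },
        NexusZoneInfo { zone_id: "b", generation: 2, image: "nexus-v3" },
    ];
    assert_eq!(check_image_per_generation(&zones).len(), 1);
}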

Fixes #8843, #8854

* only placed 0/3 desired internal_dns zones
* only placed 0/3 desired nexus zones

error: generating blueprint: could not find active nexus zone in parent blueprint
smklein (Collaborator, Author):

TODO(me): insert this into the input, so we don't have as much churn

smklein (Collaborator, Author):

Okay, found the problem here. The reconfigurator-cli explicitly creates an "ExampleSystem" with "no_zones" or "no_disks" enabled, then tries to create a blueprint.

In a model where we need an old Nexus zone to make a new Nexus zone, this breaks.

I'm updating the planner logic to allow this case. With that, the reconfigurator-cli diff is significantly smaller.
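
As a sketch of the special case being described (assumed shape only; the actual planner change may differ), the question becomes whether a missing active Nexus should be an error at all:

// Hypothetical sketch: should "no active Nexus in the parent blueprint" be
// treated as an error, or as a bootstrap case?
fn missing_active_nexus_is_error(parent_has_any_nexus_zones: bool) -> bool {
    // If the parent blueprint has Nexus zones but none is active, something
    // is wrong; if it has no Nexus zones at all (the "no_zones" example
    // system), treat it as a fresh start rather than an error.
    parent_has_any_nexus_zones
}

fn main() {
    assert!(!missing_active_nexus_is_error(false));
    assert!(missing_active_nexus_is_error(true));
}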

@smklein force-pushed the nexus_gen_usage branch 2 times, most recently from d1bd3fb to bf8f274 on August 28, 2025 at 22:37
@smklein force-pushed the nexus_gen_usage branch 2 times, most recently from 62a6819 to 30ecc07 on August 28, 2025 at 23:03
@smklein force-pushed the image_reporting branch 2 times, most recently from 0b2efdd to 9c09f60 on August 29, 2025 at 21:22
}
}

// Confirm that we have new nexuses at the desired generation number
smklein (Collaborator, Author):

I think we also need to confirm that the new db_metadata_nexus records have hit the DB - otherwise, the "old Nexuses" could quiesce without giving the "new Nexuses" enough context to do a handoff.
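
A sketch of that precondition, with made-up names and plain string IDs standing in for the real typed UUIDs: only treat handoff as safe once every new Nexus already has its db_metadata_nexus record.

use std::collections::HashSet;

// Hypothetical helper; the real check lives in the planner/executor and uses
// typed Nexus zone IDs.
fn handoff_records_in_place(
    new_nexus_ids: &[&str],
    recorded_ids: &HashSet<&str>,
) -> bool {
    // Only allow the old generation to quiesce once every new Nexus already
    // has a db_metadata_nexus record, so the incoming generation has the
    // context it needs to take over.
    new_nexus_ids.iter().all(|id| recorded_ids.contains(id))
}

fn main() {
    let recorded: HashSet<&str> = ["nexus-a"].into_iter().collect();
    assert!(handoff_records_in_place(&["nexus-a"], &recorded));
    assert!(!handoff_records_in_place(&["nexus-a", "nexus-b"], &recorded));
}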

smklein (Collaborator, Author):

This is done.

@smklein force-pushed the nexus_gen_usage branch 2 times, most recently from 6da9f39 to c4c748f on August 30, 2025 at 00:51
smklein added a commit that referenced this pull request on Aug 30, 2025:

Adds schema for nexus generations, leaves the value at "1". These schemas will be used more earnestly in #8936.

Fixes #8853
@smklein marked this pull request as ready for review on September 2, 2025 at 23:05
@davepacheco (Collaborator) left a comment:

Thanks for this. Sorry for all my questions. The details here seem very tricky.

let mut active = vec![];
let mut not_yet = vec![];
for (_, zone) in
blueprint.all_omicron_zones(BlueprintZoneDisposition::is_in_service)
davepacheco (Collaborator):

Nit: it feels to me like this logic belongs in the caller. Maybe the caller could pass in the list of Nexus instances that should be "active" vs. "not_yet"?
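
Roughly the shape of the interface being suggested, sketched with hypothetical stand-in names rather than the real omicron types:

// Hypothetical stand-ins; not the real omicron datastore API.
type NexusId = u128;

trait NexusAccessRecords {
    // The caller (the reconfigurator executor) decides which Nexus instances
    // are "active" and which are "not_yet"; this layer only records them.
    fn set_nexus_access(&self, active: &[NexusId], not_yet: &[NexusId]);
}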

smklein (Collaborator, Author):

Doing this moves all the complexity of parsing the blueprint into deploy_db_metadata_nexus_records, in the reconfigurator executor, so I've moved quite a few tests there too.

Done in 70d4ab9

for (_, zone) in
blueprint.all_omicron_zones(BlueprintZoneDisposition::is_in_service)
{
if let BlueprintZoneType::Nexus(ref nexus) = zone.zone_type {
davepacheco (Collaborator):

I don't think this logic for determining whether each Nexus zone is active or not yet is quite right. Suppose:

  • the planner decides it's time to hand off and bumps nexus_generation
  • "old" Nexus goes to execute this blueprint -- it can continue executing blueprints for a while in this state (if there are sagas still running)
  • but this will cause us to write "new" Nexus instances with state active instead of not_yet

Right?

smklein (Collaborator, Author):

Great catch. I fixed this in 9514deb, and added a test specifically for this "add-Nexus-after-quiesce-started" behavior.

Comment on lines 207 to 220
let active_nexus_zones = datastore
.get_active_db_metadata_nexus(opctx)
.await
.internal_context("fetching active nexuses")?
.into_iter()
.map(|z| z.nexus_id())
.collect::<Vec<_>>();
let not_yet_nexus_zones = datastore
.get_not_yet_db_metadata_nexus(opctx)
.await
.internal_context("fetching 'not yet' nexuses")?
.into_iter()
.map(|z| z.nexus_id())
.collect::<Vec<_>>();
davepacheco (Collaborator):

Feels like we could combine these into one database query? Not a big deal.
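
A simplified sketch of that shape, with stand-in types (the actual change landed in 690ea16): fetch the records once and partition them in memory.

// Hypothetical stand-ins for the db_metadata_nexus rows.
#[derive(Clone, Copy, PartialEq)]
enum NexusState {
    Active,
    NotYet,
}

struct DbMetadataNexus {
    nexus_id: u128,
    state: NexusState,
}

// One query's worth of rows, split into the two ID lists the executor needs.
fn split_by_state(records: Vec<DbMetadataNexus>) -> (Vec<u128>, Vec<u128>) {
    let (active, not_yet): (Vec<_>, Vec<_>) = records
        .into_iter()
        .partition(|r| r.state == NexusState::Active);
    (
        active.into_iter().map(|r| r.nexus_id).collect(),
        not_yet.into_iter().map(|r| r.nexus_id).collect(),
    )
}

fn main() {
    let rows = vec![
        DbMetadataNexus { nexus_id: 1, state: NexusState::Active },
        DbMetadataNexus { nexus_id: 2, state: NexusState::NotYet },
    ];
    let (active, not_yet) = split_by_state(rows);
    assert_eq!((active, not_yet), (vec![1u128], vec![2u128]));
}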

smklein (Collaborator, Author):

Done in 690ea16

Comment on lines 130 to 131
/// This is used to identify which Nexus is currently executing the planning
/// operation, which is needed for safe shutdown decisions during handoff.
davepacheco (Collaborator):

Suggested change:
- /// This is used to identify which Nexus is currently executing the planning
- /// operation, which is needed for safe shutdown decisions during handoff.
+ /// This is used to determine which Nexus instances are currently in control, which is needed for safe shutdown decisions during handoff.

smklein (Collaborator, Author):

Done in f15fa4b

}
}

pub fn add_active_nexuses(
davepacheco (Collaborator):

Suggested change:
- pub fn add_active_nexuses(
+ pub fn set_active_nexuses(

("add" to me would imply that we're appending this set, but we're not)

smklein (Collaborator, Author):

Done in f15fa4b

fn is_zone_ready_for_update(
&self,
zone_kind: ZoneKind,
mgs_updates: &PlanningMgsUpdatesStepReport,
) -> Result<bool, TufRepoContentsError> {
davepacheco (Collaborator):

I think this could just return bool now, except I'm worried you do still need some logic here to avoid deploying new Nexus zones before the rest of the system has been updated. I'm not sure that's necessary? But it seems safer and I assumed it was what we'd keep doing.

Ok(true)
}

fn lookup_current_nexus_image(&self) -> Option<BlueprintZoneImageSource> {
davepacheco (Collaborator):

In practice, when would this function ever return None?

!= new_repo.zone_image_source(kind)?
{
return Ok(false);
fn lookup_current_nexus_generation(&self) -> Option<Generation> {
davepacheco (Collaborator):

In practice, when would this function ever return None?

Comment on lines +2327 to +2338
else {
// If we don't know the current Nexus zone ID, or its
// generation, we can't perform the handoff safety check.
report.unsafe_zone(
zone,
Nexus {
zone_generation: zone_nexus_generation,
current_nexus_generation: None,
},
);
return false;
};
davepacheco (Collaborator):

I feel like this case should be impossible now. That would imply there was literally no Nexus zone in the blueprint at the current generation?

Comment on lines +2340 to +2356
// We need to prevent old Nexus zones from shutting themselves
// down. In other words: it's only safe to shut down if handoff
// has occurred.
//
// That only happens when the current generation of Nexus (the
// one running right now) is greater than the zone we're
// considering expunging.
if current_gen <= zone_nexus_generation {
report.unsafe_zone(
zone,
Nexus {
zone_generation: zone_nexus_generation,
current_nexus_generation: Some(current_gen),
},
);
return false;
}
davepacheco (Collaborator):

Is all of the logic thus far asking: is this one of the "active" Nexus zones? (Could we just check the planning input?)

Following up on this thread from the older PR: #8863 (comment)
I don't follow why we need to check this. It actually seems wrong. It means we can never shut down any Nexus zone except if it's post-handoff. But there's nothing unsafe about shutting down a single Nexus zone, right? And at some point we're going to want SP updates to use this function to check whether it's safe to shut down all the zones on the host whose SP is being bounced (#8482). Won't this check then prevent us from doing any SP updates on a host that's hosting Nexus?

The only place this is called is from do_plan_zone_updates(), but I feel like maybe that function just needs to ignore Nexus zones altogether since they're updated specially.

smklein added a commit that referenced this pull request on Sep 10, 2025:

This change helps for zones like Nexus, which may have multiple deployments using distinct images (see: #8936)

Base automatically changed from image_reporting to main on September 10, 2025 at 00:05