
Conversation

@smklein (Collaborator) commented on Aug 28, 2025:

  • Actually update nexus generation within the top-level blueprint and Nexus zones
  • Deploy new and old nexus zones concurrently

Blueprint Preparation & System Description

  • Queries and returns the set of "active" and "not yet" Nexuses. This information is consumed by the planner.
  • Tracks "active" and "not yet" zones in SystemDescription for tests

Blueprint Planner

  • Automatically determine the Nexus generation when provisioning new Nexus zones, based on the zones already deployed (see the sketch after this list)
  • Update the logic for provisioning Nexus zones to deploy old and new Nexus images side by side
  • Update the logic for expunging Nexus zones, so that it only happens when running from a "newer" Nexus
  • Add a planning stage that bumps the top-level "nexus generation", if appropriate, which triggers the old Nexuses to quiesce
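
To make the generation-selection idea concrete, here is a minimal, self-contained sketch. The types and the exact rule are assumptions for illustration only; the real planner works on omicron's blueprint types and may choose generations differently.

// Illustrative stand-ins only; not the real blueprint types or the planner
// code from this PR.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Generation(u64);

#[derive(Clone, PartialEq, Eq)]
struct ImageSource(String);

struct DeployedNexus {
    image: ImageSource,
    generation: Generation,
}

// Assumed rule: a new zone running an already-deployed image joins that
// image's generation; a new image starts the next generation. Returns None
// if no Nexus zones are deployed at all.
fn generation_for_new_nexus(
    deployed: &[DeployedNexus],
    new_image: &ImageSource,
) -> Option<Generation> {
    if let Some(n) = deployed.iter().find(|n| &n.image == new_image) {
        return Some(n.generation);
    }
    deployed
        .iter()
        .map(|n| n.generation)
        .max()
        .map(|Generation(g)| Generation(g + 1))
}

fn main() {
    let deployed = vec![DeployedNexus {
        image: ImageSource("nexus-v1".to_string()),
        generation: Generation(1),
    }];
    let next =
        generation_for_new_nexus(&deployed, &ImageSource("nexus-v2".to_string()));
    assert_eq!(next, Some(Generation(2)));
}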

Blueprint Execution

  • Update the stage of blueprint execution that creates db_metadata_nexus records. Previously, this created active records for all in-service Nexuses. Now it creates active and not_yet records, depending on how the nexus_generation set in the zone record compares to the top-level nexus_generation (sketched below).
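
A minimal sketch of that classification, with simplified stand-in types; whether equality with the top-level generation counts as active is an assumption here, not something taken from the PR:

// Simplified stand-ins for the executor's classification; the real code
// operates on blueprint zone types, and the exact comparison may differ.
#[derive(Debug, PartialEq)]
enum DbMetadataNexusState {
    Active,
    NotYet,
}

fn classify_nexus(
    zone_nexus_generation: u64,
    top_level_nexus_generation: u64,
) -> DbMetadataNexusState {
    // Assumption: zones at or below the blueprint's current nexus_generation
    // are active; zones at a later generation have not yet taken over.
    if zone_nexus_generation <= top_level_nexus_generation {
        DbMetadataNexusState::Active
    } else {
        DbMetadataNexusState::NotYet
    }
}

fn main() {
    assert_eq!(classify_nexus(1, 1), DbMetadataNexusState::Active);
    assert_eq!(classify_nexus(2, 1), DbMetadataNexusState::NotYet);
}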

Blippy

  • Adds a check verifying that Nexus zones with the same generation all use the same image (see the sketch below)
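
A rough sketch of the shape of such a check, using hypothetical simplified types in place of the real blueprint and blippy types:

use std::collections::HashMap;

// Hypothetical, simplified zone description; the real check consumes
// blueprint zone types and reports through blippy's notes.
struct NexusZoneInfo {
    zone_id: &'static str,
    generation: u64,
    image: &'static str,
}

// Flag every Nexus zone whose image disagrees with the first image seen for
// its generation.
fn check_image_per_generation(zones: &[NexusZoneInfo]) -> Vec<String> {
    let mut expected: HashMap<u64, &str> = HashMap::new();
    let mut notes = Vec::new();
    for zone in zones {
        let image = *expected.entry(zone.generation).or_insert(zone.image);
        if image != zone.image {
            notes.push(format!(
                "nexus zone {} (generation {}) uses image {}, expected {}",
                zone.zone_id, zone.generation, zone.image, image
            ));
        }
    }
    notes
}

fn main() {
    let zones = [
        NexusZoneInfo { zone_id: "a", generation: 2, image: "nexus-v2" },
        NexusZoneInfo { zone_id: "b", generation: 2, image: "nexus-v3" },
    ];
    assert_eq!(check_image_per_generation(&zones).len(), 1);
}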

Fixes #8843, #8854

* only placed 0/3 desired internal_dns zones
* only placed 0/3 desired nexus zones

error: generating blueprint: could not find active nexus zone in parent blueprint
smklein (Collaborator, Author):

TODO(me): insert this into the input, so we don't have as much churn

smklein (Collaborator, Author):

Okay, found the problem here. The reconfigurator-cli explicitly creates an "ExampleSystem" with "no_zones" or "no_disks" enabled, then tries to create a blueprint.

In a model where we need an old Nexus zone to make a new Nexus zone, this breaks.

I'm updating the planner logic to allow this case. With that, the reconfigurator-cli diff is significantly smaller.
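
As a sketch of the special case being described (assumed shape only; the actual planner change may differ), the question becomes whether a missing active Nexus should be an error at all:

// Hypothetical sketch: should "no active Nexus in the parent blueprint" be
// treated as an error, or as a bootstrap case?
fn missing_active_nexus_is_error(parent_has_any_nexus_zones: bool) -> bool {
    // If the parent blueprint has Nexus zones but none is active, something
    // is wrong; if it has no Nexus zones at all (the "no_zones" example
    // system), treat it as a fresh start rather than an error.
    parent_has_any_nexus_zones
}

fn main() {
    assert!(!missing_active_nexus_is_error(false));
    assert!(missing_active_nexus_is_error(true));
}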

@smklein force-pushed the nexus_gen_usage branch 2 times, most recently from d1bd3fb to bf8f274 on August 28, 2025 at 22:37
@smklein force-pushed the nexus_gen_usage branch 2 times, most recently from 62a6819 to 30ecc07 on August 28, 2025 at 23:03
@smklein force-pushed the image_reporting branch 2 times, most recently from 0b2efdd to 9c09f60 on August 29, 2025 at 21:22
}
}

// Confirm that we have new nexuses at the desired generation number
smklein (Collaborator, Author):

I think we also need to confirm that the new db_metadata_nexus records have hit the DB - otherwise, the "old Nexuses" could quiesce without giving the "new Nexuses" enough context to do a handoff.
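
A sketch of that precondition, with made-up names and plain string IDs standing in for the real typed UUIDs: only treat handoff as safe once every new Nexus already has its db_metadata_nexus record.

use std::collections::HashSet;

// Hypothetical helper; the real check lives in the planner/executor and uses
// typed Nexus zone IDs.
fn handoff_records_in_place(
    new_nexus_ids: &[&str],
    recorded_ids: &HashSet<&str>,
) -> bool {
    // Only allow the old generation to quiesce once every new Nexus already
    // has a db_metadata_nexus record, so the incoming generation has the
    // context it needs to take over.
    new_nexus_ids.iter().all(|id| recorded_ids.contains(id))
}

fn main() {
    let recorded: HashSet<&str> = ["nexus-a"].into_iter().collect();
    assert!(handoff_records_in_place(&["nexus-a"], &recorded));
    assert!(!handoff_records_in_place(&["nexus-a", "nexus-b"], &recorded));
}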

smklein (Collaborator, Author):

This is done.

@smklein force-pushed the nexus_gen_usage branch 2 times, most recently from 6da9f39 to c4c748f on August 30, 2025 at 00:51
smklein added a commit that referenced this pull request on Aug 30, 2025:

Adds schema for nexus generations, leaves the value at "1". These schemas will be used more earnestly in #8936.

Fixes #8853
@smklein marked this pull request as ready for review on September 2, 2025 at 23:05
@davepacheco (Collaborator) left a comment:

Thanks for this. Sorry for all my questions. The details here seem very tricky.

let mut active = vec![];
let mut not_yet = vec![];
for (_, zone) in
blueprint.all_omicron_zones(BlueprintZoneDisposition::is_in_service)
davepacheco (Collaborator):

Nit: it feels to me like this logic belongs in the caller. Maybe the caller could pass in the list of Nexus instances that should be "active" vs. "not_yet"?
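
Roughly the shape of the interface being suggested, sketched with hypothetical stand-in names rather than the real omicron types:

// Hypothetical stand-ins; not the real omicron datastore API.
type NexusId = u128;

trait NexusAccessRecords {
    // The caller (the reconfigurator executor) decides which Nexus instances
    // are "active" and which are "not_yet"; this layer only records them.
    fn set_nexus_access(&self, active: &[NexusId], not_yet: &[NexusId]);
}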

smklein (Collaborator, Author):

Doing this moves all the complexity of parsing the blueprint into deploy_db_metadata_nexus_records, in the reconfigurator executor, so I've moved quite a few tests there too.

Done in 70d4ab9

for (_, zone) in
blueprint.all_omicron_zones(BlueprintZoneDisposition::is_in_service)
{
if let BlueprintZoneType::Nexus(ref nexus) = zone.zone_type {
davepacheco (Collaborator):

I don't think this logic for determining whether each Nexus zone is active or not yet is quite right. Suppose:

  • the planner decides it's time to hand off and bumps nexus_generation
  • "old" Nexus goes to execute this blueprint -- it can continue executing blueprints for a while in this state (if there are sagas still running)
  • but this will cause us to write "new" Nexus instances with state active instead of not_yet

Right?

smklein (Collaborator, Author):

Great catch. I fixed this in 9514deb, and added a test specifically for this "add-Nexus-after-quiesce-started" behavior.

Comment on lines 207 to 220
let active_nexus_zones = datastore
.get_active_db_metadata_nexus(opctx)
.await
.internal_context("fetching active nexuses")?
.into_iter()
.map(|z| z.nexus_id())
.collect::<Vec<_>>();
let not_yet_nexus_zones = datastore
.get_not_yet_db_metadata_nexus(opctx)
.await
.internal_context("fetching 'not yet' nexuses")?
.into_iter()
.map(|z| z.nexus_id())
.collect::<Vec<_>>();
davepacheco (Collaborator):

Feels like we could combine these into one database query? Not a big deal.
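
A simplified sketch of that shape, with stand-in types (the actual change landed in 690ea16): fetch the records once and partition them in memory.

// Hypothetical stand-ins for the db_metadata_nexus rows.
#[derive(Clone, Copy, PartialEq)]
enum NexusState {
    Active,
    NotYet,
}

struct DbMetadataNexus {
    nexus_id: u128,
    state: NexusState,
}

// One query's worth of rows, split into the two ID lists the executor needs.
fn split_by_state(records: Vec<DbMetadataNexus>) -> (Vec<u128>, Vec<u128>) {
    let (active, not_yet): (Vec<_>, Vec<_>) = records
        .into_iter()
        .partition(|r| r.state == NexusState::Active);
    (
        active.into_iter().map(|r| r.nexus_id).collect(),
        not_yet.into_iter().map(|r| r.nexus_id).collect(),
    )
}

fn main() {
    let rows = vec![
        DbMetadataNexus { nexus_id: 1, state: NexusState::Active },
        DbMetadataNexus { nexus_id: 2, state: NexusState::NotYet },
    ];
    let (active, not_yet) = split_by_state(rows);
    assert_eq!((active, not_yet), (vec![1u128], vec![2u128]));
}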

smklein (Collaborator, Author):

Done in 690ea16

Comment on lines 130 to 131
/// This is used to identify which Nexus is currently executing the planning
/// operation, which is needed for safe shutdown decisions during handoff.
davepacheco (Collaborator):

Suggested change:
- /// This is used to identify which Nexus is currently executing the planning
- /// operation, which is needed for safe shutdown decisions during handoff.
+ /// This is used to determine which Nexus instances are currently in control, which is needed for safe shutdown decisions during handoff.

smklein (Collaborator, Author):

Done in f15fa4b

}
}

pub fn add_active_nexuses(
davepacheco (Collaborator):

Suggested change:
- pub fn add_active_nexuses(
+ pub fn set_active_nexuses(

("add" to me would imply that we're appending this set, but we're not)

smklein (Collaborator, Author):

Done in f15fa4b

fn is_zone_ready_for_update(
&self,
zone_kind: ZoneKind,
mgs_updates: &PlanningMgsUpdatesStepReport,
) -> Result<bool, TufRepoContentsError> {
davepacheco (Collaborator):

I think this could just return bool now, except I'm worried you do still need some logic here to avoid deploying new Nexus zones before the rest of the system has been updated. I'm not sure that's necessary? But it seems safer and I assumed it was what we'd keep doing.

Ok(true)
}

fn lookup_current_nexus_image(&self) -> Option<BlueprintZoneImageSource> {
davepacheco (Collaborator):

In practice, when would this function ever return None?

!= new_repo.zone_image_source(kind)?
{
return Ok(false);
fn lookup_current_nexus_generation(&self) -> Option<Generation> {
davepacheco (Collaborator):

In practice, when would this function ever return None?

Comment on lines +2327 to +2338
else {
// If we don't know the current Nexus zone ID, or its
// generation, we can't perform the handoff safety check.
report.unsafe_zone(
zone,
Nexus {
zone_generation: zone_nexus_generation,
current_nexus_generation: None,
},
);
return false;
};
davepacheco (Collaborator):

I feel like this case should be impossible now. That would imply there was literally no Nexus zone in the blueprint at the current generation?

Comment on lines +2340 to +2356
// We need to prevent old Nexus zones from shutting themselves
// down. In other words: it's only safe to shut down if handoff
// has occurred.
//
// That only happens when the current generation of Nexus (the
// one running right now) is greater than the zone we're
// considering expunging.
if current_gen <= zone_nexus_generation {
report.unsafe_zone(
zone,
Nexus {
zone_generation: zone_nexus_generation,
current_nexus_generation: Some(current_gen),
},
);
return false;
}
davepacheco (Collaborator):

Is all of the logic thus far asking: is this one of the "active" Nexus zones? (Could we just check the planning input?)

Following up on this thread from the older PR: #8863 (comment)
I don't follow why we need to check this. It actually seems wrong. It means we can never shut down any Nexus zone except if it's post-handoff. But there's nothing unsafe about shutting down a single Nexus zone, right? And at some point we're going to want SP updates to use this function to check whether it's safe to shut down all the zones on the host whose SP is being bounced (#8482). Won't this check then prevent us from doing any SP updates on a host that's hosting Nexus?

The only place this is called is from do_plan_zone_updates(), but I feel like maybe that function just needs to ignore Nexus zones altogether since they're updated specially.

smklein added a commit that referenced this pull request on Sep 10, 2025:

This change helps for zones like Nexus, which may have multiple deployments using distinct images (see: #8936)

Base automatically changed from image_reporting to main on September 10, 2025 at 00:05