Add nexus_generation to blueprint #8863


Open · wants to merge 2 commits into main from nexus_generation

Conversation


@smklein smklein commented Aug 19, 2025

  • Adds nexus_generation to the blueprint, both for Nexus zones and as a top-level field
    • When provisioning a new Nexus zone: if it matches any existing Nexus zone, reuse that zone's nexus_generation value
    • Otherwise: choose a generation number higher than that of all existing instances.
  • Changes deployment of Nexus zones to proactively provision new zones alongside old ones, rather than doing a replacement.
    • This PR does not implement the handoff process. However, it does permit "new" Nexus zones to expunge old Nexus zones which have an older nexus_generation, once any of the "new" Nexuses are running.
  • Adds a do_plan_nexus_generation_update method to the planner, which decides when the top-level Nexus generation number should be incremented.

Fixes #8853, #8843
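The generation-selection rule in the first bullet can be sketched roughly as follows. This is an illustrative reading of the description above, not the PR's actual code; `Generation`, `NexusZone`, and `pick_nexus_generation` are stand-in names, and "matches" is simplified to an image comparison.

```rust
// Hypothetical sketch of the rule: reuse a matching zone's generation,
// otherwise pick one greater than every existing generation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Generation(u64);

struct NexusZone {
    image: &'static str,
    nexus_generation: Generation,
}

fn pick_nexus_generation(existing: &[NexusZone], image: &str) -> Generation {
    // If a matching zone already exists, reuse its generation.
    if let Some(z) = existing.iter().find(|z| z.image == image) {
        return z.nexus_generation;
    }
    // Otherwise: one higher than all existing instances.
    let max = existing
        .iter()
        .map(|z| z.nexus_generation.0)
        .max()
        .unwrap_or(0);
    Generation(max + 1)
}
```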

@smklein smklein force-pushed the nexus_generation branch 2 times, most recently from 85446aa to 7a3d744 Compare August 20, 2025 20:54
@@ -148,7 +148,7 @@ async fn cmd_reconfigurator_export(
// See Nexus::blueprint_planning_context().
eprint!("assembling reconfigurator state ... ");
let state = nexus_reconfigurator_preparation::reconfigurator_state_load(
opctx, datastore,
opctx, datastore, None,

Here and below: There are many spots where we're plumbing through a current_nexus_zone: Option<OmicronZoneUuid> as None.

It's important for the planner to know "which Nexus are we currently running on" when making decisions about deploying new zones -- but there are contexts where "we aren't running from ANY Nexus zone".

Rather than creating a fake UUID, I left this field optional.
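A minimal sketch of the shape being described, assuming a simplified `OmicronZoneUuid` newtype and a hypothetical helper; the real planner plumbs this through many more layers:

```rust
// Stand-in newtype; the real OmicronZoneUuid is a typed UUID.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OmicronZoneUuid(u128);

// `None` means "not running from any Nexus zone" (e.g. offline tooling),
// rather than a fabricated placeholder UUID.
fn describe_planning_context(current_nexus_zone: Option<OmicronZoneUuid>) -> String {
    match current_nexus_zone {
        Some(id) => format!("planning from Nexus zone {:?}", id),
        None => "planning outside of any Nexus zone".to_string(),
    }
}
```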

@@ -81,6 +81,7 @@ pub struct Blueprint {
pub creator: String,
pub comment: String,
pub target_release_minimum_generation: Generation,
pub nexus_generation: Generation,

This is the "top-level nexus_generation" described by RFD 588, representing the active set.

(See also: nexus/types/src/deployment.rs for the more normalized-to-Rust version of this struct)

@@ -524,6 +527,7 @@ pub struct BpOmicronZone {

pub image_source: DbBpZoneImageSource,
pub image_artifact_sha256: Option<ArtifactHash>,
pub nexus_generation: Option<Generation>,

This is the "zone-specific field, only set for Nexus zone types".

///
/// If `must_have_nexus_zones` is false, then these settings
/// are permitted to use default values.
pub fn sled_add_zone_nexus_internal(

There are some tests which want to "add a Nexus zone, even though we don't have existing Nexus zones".

They previously called sled_add_zone_nexus_with_config directly, but I want them to converge on this common pathway as much as possible, to share nexus_generation calculation logic.

To mitigate:

  • This API exposes a must_have_nexus_zones argument, which toggles whether we must copy data from existing Nexus zones
  • Most callers will use sled_add_zone_nexus, which uses must_have_nexus_zones = true
  • Callers in test cases that want to spawn Nexuses from nothing can use must_have_nexus_zones = false.
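The wrapper pattern described above can be sketched like this. The signatures are heavily simplified stand-ins (the real builder methods take far more state); only the `must_have_nexus_zones` gating is the point:

```rust
#[derive(Debug, PartialEq)]
enum AddZoneError {
    NoExistingNexusZones,
}

// Shared pathway: derive the new zone's generation from existing zones
// when possible. Tests may opt out of requiring existing zones.
fn sled_add_zone_nexus_internal(
    existing_generations: &[u64],
    must_have_nexus_zones: bool,
) -> Result<u64, AddZoneError> {
    match existing_generations.iter().max() {
        Some(max) => Ok(*max),
        None if must_have_nexus_zones => Err(AddZoneError::NoExistingNexusZones),
        // Test-only path: spawn a Nexus from nothing with a default value.
        None => Ok(1),
    }
}

// Most callers use this wrapper, which insists on existing Nexus zones.
fn sled_add_zone_nexus(existing_generations: &[u64]) -> Result<u64, AddZoneError> {
    sled_add_zone_nexus_internal(existing_generations, true)
}
```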


// Set the current Nexus zone ID if we found one
if let Some(nexus_zone_id) = current_nexus_zone_id {
input_builder.set_current_nexus_zone_id(Some(nexus_zone_id));

This helps all our tests "pretend they're running from an arbitrary Nexus".

Some tests which modify the set of running Nexuses can, and do, call set_current_nexus_zone_id later to modify this value further.

Comment on lines +951 to +1036
for image_source in image_sources {
let num_zones_to_add = self.num_additional_zones_needed(
zone_kind,
&image_source,
report,
);
if num_zones_to_add == 0 {
continue;
}
// We need to add at least one zone; construct our `zone_placement`
// (or reuse the existing one if a previous loop iteration already
// created it).
let zone_placement = zone_placement.get_or_insert_with(|| {
// This constructs a picture of the sleds as we currently
// understand them, as far as which sleds have discretionary
// zones. This will remain valid as we loop through the
// `zone_kind`s in this function, as any zone additions will
// update the `zone_placement` heap in-place.
let current_discretionary_zones = self
.input
.all_sled_resources(SledFilter::Discretionary)
.filter(|(sled_id, _)| {
!report.sleds_waiting_for_ntp_zone.contains(&sled_id)
})
.map(|(sled_id, sled_resources)| {
OmicronZonePlacementSledState {
sled_id,
num_zpools: sled_resources
.all_zpools(ZpoolFilter::InService)
.count(),
discretionary_zones: self
.blueprint
.current_sled_zones(
sled_id,
BlueprintZoneDisposition::is_in_service,
)
.filter_map(|zone| {
DiscretionaryOmicronZone::from_zone_type(
&zone.zone_type,
)
})
.collect(),
}
});
OmicronZonePlacement::new(current_discretionary_zones)
});
self.add_discretionary_zones(
zone_placement,
zone_kind,
num_zones_to_add,
image_source,
report,
)?;
}

This is mostly just an indented version of what was here before -- but num_additional_zones_needed now considers the image_source, as "new" and "old" Nexuses should be treated distinctly.
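The "treated distinctly" point reduces to counting zones per (kind, image_source) pair rather than per kind alone. A rough sketch, with illustrative string keys in place of the real zone and image types:

```rust
// Count in-service zones matching both kind AND image source, and
// return how many more are needed to reach the target. Old- and
// new-image Nexuses are therefore separate populations.
fn num_additional_zones_needed(
    in_service: &[(&str, &str)], // (zone_kind, image_source)
    zone_kind: &str,
    image_source: &str,
    target_count: usize,
) -> usize {
    let current = in_service
        .iter()
        .filter(|(kind, image)| *kind == zone_kind && *image == image_source)
        .count();
    target_count.saturating_sub(current)
}
```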

@@ -5541,8 +5680,8 @@ pub(crate) mod test {
/// Ensure that dependent zones (here just Crucible Pantry) are updated
/// before Nexus.
#[test]
fn test_update_crucible_pantry() {
static TEST_NAME: &str = "update_crucible_pantry";
fn test_update_crucible_pantry_before_nexus() {

This test has already been opinionated about when / how Nexus updates -- I've changed the name to reflect that.

Comment on lines +5791 to +5900
// Nexus should deploy new zones, but keep the old ones running.
let expected_new_nexus_zones =

The old behavior was to "request more Nexus zones, and observe we don't deploy them".

Well, with these new changes, that behavior doesn't happen -- we launch the new Nexus zones pretty much immediately; it will just be a while before quiesce/handoff occurs.

As such: I don't bother increasing the "target_zone_count". I just observe that the new and old Nexus zones coexist.

#[derive(
Clone, Debug, Deserialize, Serialize, PartialEq, Eq, Diffable, JsonSchema,
)]
pub struct DiscretionaryZonePlacement {

I tweaked this part of the reporting system because it's important to know which image we're using for Nexus. Just "kind = Nexus" isn't enough to answer "is this an old version, or a new version?"

I mostly modified this for visibility when debugging tests -- I'm open to further changes here.

Comment on lines +231 to +234
/// If a Nexus instance notices it has a nexus_generation less than
/// this value, it will start to quiesce (see: RFD 588).

This will be implemented in #8856, and is not implemented in this PR.
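Per RFD 588 as summarized in the doc comment, the eventual check amounts to a generation comparison. A hedged sketch of the trigger only (the actual quiesce mechanics land in #8856; names here are illustrative):

```rust
// A Nexus whose zone-level generation is below the blueprint's
// top-level nexus_generation is no longer in the active set, and
// should begin quiescing.
fn should_quiesce(my_generation: u64, top_level_generation: u64) -> bool {
    my_generation < top_level_generation
}
```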

@smklein smklein force-pushed the nexus_generation branch 4 times, most recently from e688fc6 to 8235551 Compare August 21, 2025 19:50
//
// Doing so will be a signal for all running Nexus instances at
// lower versions to start quiescing, and to perform handoff.
fn do_plan_nexus_generation_update(

This is definitely worth scrutiny -- it has pretty comprehensive tests in test_nexus_generation_update below.
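The decision this planner step makes can be sketched as follows, under the assumption (from the comments above) that the bump waits until the full complement of next-generation Nexus zones exists. The inputs are simplified stand-ins for the real planner state:

```rust
// Bump the top-level generation only once enough Nexus zones at the
// next generation are present; otherwise the planner keeps waiting
// (cf. NexusGenerationBumpWaitingOn::NewNexusBringup).
fn should_bump_nexus_generation(
    top_level_generation: u64,
    nexus_zone_generations: &[u64],
    target_nexus_count: usize,
) -> bool {
    let next = top_level_generation + 1;
    nexus_zone_generations
        .iter()
        .filter(|g| **g == next)
        .count()
        >= target_nexus_count
}
```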

@smklein smklein marked this pull request as ready for review August 21, 2025 20:32
parent = blueprint;
}

panic!("did not converge after {MAX_PLANNING_ITERATIONS} iterations");
}

struct BlueprintGenerator {

I made this struct to help make the actual contents of test_nexus_generation_update easier to write... but after doing so, I'd be kinda on-board with moving more tests over to using it explicitly.

IMO it helps make the test much more concise when blueprint generation is a one-liner.

Comment on lines +1836 to +1839
report.set_waiting_on(
NexusGenerationBumpWaitingOn::NewNexusBringup,
);
return Ok(report);

I actually do not have test coverage for this case, and would like to add it before this PR merges. I have struggled to do it through zone manipulation - because Nexus is discretionary, we'll be eager to add the new Nexus zones if we can (and why not? They should wait on boot for handoff).

To force this to happen, I'm thinking I'll need to construct a scenario where we expunge a sled so that we cannot actually place this new Nexus, and observe that the handoff does not occur while we're operating at a reduced capacity.

Successfully merging this pull request may close these issues.

Reconfigurator: Add nexus_generation to blueprint, Nexus zones