zeeshanlakhani
Contributor

@zeeshanlakhani zeeshanlakhani commented Aug 29, 2025

This change strengthens the multicast implementation with always-allocated group IDs, better API validation, and comprehensive test improvements for Omicron integration.

Multicast group IDs are no longer optional. They are always allocated during group creation, following how multicast groups are configured in the Omicron control plane (CP).

In Omicron, multicast groups are created first, without members, and then members are added as instances are configured for a multicast group.

Replication configuration is only written to tables when members are added, but IDs are always generated for the 1:1 mapping between underlay and external (overlay) associated groups.

Includes:

  • Core ID Management Changes:

    • Remove Option<MulticastGroupId> - IDs are always allocated during
      group creation
    • Establish 1:1 mapping between underlay and external (overlay) groups
    • External groups now use IDs from corresponding NAT target (Omicron
      keeps the true relational mapping)
  • API Changes and Validation:

    • Remove sources field from internal group
      APIs (MulticastGroupCreateEntry, MulticastGroupUpdateEntry)
    • New response types for external and underlay groups, MulticastExternalGroupResponse and MulticastUnderlayGroupResponse, plus a unified MulticastGroupResponse for list calls
    • Internal groups cannot have sources or NAT targets - cleaner
      separation of concerns
    • External groups retain sources for proper SSM (Source-Specific
      Multicast) validation
    • Reset now fails outright if cleanup is not used properly, which
      helps on the Omicron side
    • Rename API boundary structs for consistency
    • Integrate all the new types with the API trait that went in upstream
    • A new AdminScopedIpv6 type so that calls into dpd for internal underlay groups are properly typed
  • Rollback & Error Handling:

    • The addition of a rollback module (and trait) for a more
      functional approach to rollback on creation or updates involving
      tables, ports, etc
    • Improved error propagation in test cleanup to catch resource leaks early
    • Better validation of group ID relationships to match tables and
      allocation states
  • Test Infrastructure Improvements:

    • Enhanced cleanup_test_group() to fail explicitly on deletion errors
      (prevents test pollution), and ensures proper 1:1 deletion mapping
    • New tests for rollback, empty members upon multicast group creation/update
  • Replication Management:

    • Configure replication only when groups have members (the Omicron CP
      is expected to create groups empty at first)
    • Reconfigure replication tables when transitioning between empty/populated groups
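
As a rough illustration of the ID and replication behavior listed above, here is a minimal sketch. It is not the dpd implementation: `McastState`, `Group`, `create_group`, and `set_members` are hypothetical names, and the `u16` alias for `MulticastGroupId` is an assumption. It only shows the shape of the idea: the ID is a plain field allocated at creation, and a replication entry exists only while the group has members.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for dpd's group ID type (the width is an assumption).
type MulticastGroupId = u16;

/// Illustrative group record: the ID is a plain field, not an `Option`.
#[derive(Debug)]
struct Group {
    id: MulticastGroupId,
    members: Vec<String>,
}

/// Toy state: known groups plus a stand-in for the replication tables.
#[derive(Default)]
struct McastState {
    next_id: MulticastGroupId,
    groups: HashMap<String, Group>,
    /// Group ID -> number of members currently programmed for replication.
    replication: HashMap<MulticastGroupId, usize>,
}

impl McastState {
    /// Creating a group always allocates an ID, even with no members.
    fn create_group(&mut self, name: &str) -> MulticastGroupId {
        let id = self.next_id;
        self.next_id += 1;
        self.groups
            .insert(name.to_string(), Group { id, members: Vec::new() });
        id
    }

    /// Replication is written only when members exist, and removed again
    /// when the group transitions back to empty.
    fn set_members(&mut self, name: &str, members: Vec<String>) -> Option<()> {
        let group = self.groups.get_mut(name)?;
        group.members = members;
        if group.members.is_empty() {
            self.replication.remove(&group.id);
        } else {
            self.replication.insert(group.id, group.members.len());
        }
        Some(())
    }
}
```

In this sketch a freshly created group has an ID but no replication entry; adding a member creates the entry, and clearing the members removes it again.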

Key aspects this commit covers:

  1. ID Management to match expectations in Omicron's multicast impl
  2. Validation: Enhanced API validation, group ID relationship checks,
    SSM validation
  3. Rollback: Reset operations now fail explicitly, better error propagation
  4. Testing: Comprehensive test improvements, better error handling,
    standardized cleanup
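
To make the typed-address validation concrete, here is a hedged sketch of what a newtype like `AdminScopedIpv6` could look like. The constructor, the error type, and the accepted scope values are assumptions: this version accepts IPv6 multicast scopes 4 (admin-local), 5 (site-local), and 8 (organization-local) from RFC 7346, and the actual dpd type may accept a different set.

```rust
use std::net::Ipv6Addr;

/// Sketch of a validated newtype; not the actual dpd definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AdminScopedIpv6(Ipv6Addr);

impl AdminScopedIpv6 {
    /// Accepts only IPv6 multicast addresses whose scope nibble is one of
    /// the values assumed here to count as "admin-scoped" (4, 5, or 8).
    fn new(addr: Ipv6Addr) -> Result<Self, String> {
        let first = addr.segments()[0];
        let is_multicast = (first >> 8) == 0xff;
        let scope = first & 0x000f;
        if is_multicast && matches!(scope, 0x4 | 0x5 | 0x8) {
            Ok(Self(addr))
        } else {
            Err(format!(
                "{addr} is not an admin-scoped IPv6 multicast address"
            ))
        }
    }

    /// Returns the wrapped address.
    fn addr(&self) -> Ipv6Addr {
        self.0
    }
}
```

With a type like this, a function that programs underlay groups can take `AdminScopedIpv6` instead of a bare `Ipv6Addr`, so an address such as `ff0e::1` (global scope) is rejected when the value is constructed rather than deep inside table code.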

@zeeshanlakhani zeeshanlakhani force-pushed the zl/omicron-mcast-fallout branch 5 times, most recently from 15f31df to 076815a Compare September 3, 2025 03:02
@zeeshanlakhani zeeshanlakhani changed the title [mcast] updates for omicron changes [mcast] Lifecycle + API changes for Omicron impl Sep 3, 2025
@zeeshanlakhani zeeshanlakhani force-pushed the zl/omicron-mcast-fallout branch 2 times, most recently from 0fc844d to 55b4d32 Compare September 8, 2025 17:31
@zeeshanlakhani zeeshanlakhani marked this pull request as ready for review September 8, 2025 17:59
Contributor

@FelixMcFelix FelixMcFelix left a comment

Thanks, Zeeshan. Full disclosure that I haven't yet looked at the integration tests. I think the new rollback machinery is pretty neat -- obviously it's geared toward just multicast, but I think the model of maintaining a snapshot of the old target state and moving back to it is pretty useful.

I think overall I'm a bit confused by the mention of External forwarding groups being used with instances/guests, but most things here are nits.
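
For readers unfamiliar with the snapshot-and-restore model described in this review, a minimal sketch follows. The `Rollback` trait, the `Tables` type, and `with_rollback` are hypothetical names chosen for illustration; the PR's actual rollback module and trait are not shown in this conversation and may be structured quite differently.

```rust
/// Hypothetical trait: capture the old state and be able to move back to it.
trait Rollback {
    type Snapshot;
    fn snapshot(&self) -> Self::Snapshot;
    fn restore(&mut self, snap: Self::Snapshot);
}

/// Toy stand-in for table state touched by a create or update.
#[derive(Debug, Clone, PartialEq)]
struct Tables {
    entries: Vec<u32>,
}

impl Rollback for Tables {
    type Snapshot = Vec<u32>;

    fn snapshot(&self) -> Vec<u32> {
        self.entries.clone()
    }

    fn restore(&mut self, snap: Vec<u32>) {
        self.entries = snap;
    }
}

/// Runs `op`; if it fails, restores the snapshot taken beforehand and
/// passes the original error through.
fn with_rollback<T, F>(target: &mut T, op: F) -> Result<(), String>
where
    T: Rollback,
    F: FnOnce(&mut T) -> Result<(), String>,
{
    let snap = target.snapshot();
    match op(target) {
        Ok(()) => Ok(()),
        Err(e) => {
            target.restore(snap);
            Err(e)
        }
    }
}
```

The point of the pattern is that a failed multi-step update leaves the target exactly as it was before the call, while a successful one keeps its changes.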

zeeshanlakhani and others added 2 commits September 9, 2025 20:49
This change strengthens the multicast implementation with
always-allocated group IDs, better API validation, and comprehensive
test improvements for Omicron integration.

This update no longer generates multicast group IDs optionally.
They are always allocated during group creation, following how multicast
groups are configured in the Omicron CP.

In Omicron, multicast groups are created first, without members, and then
members are added as instances are configured for a multicast group.

Replication configuration is only written to tables when members are
added, but IDs are always generated for the 1:1 mapping between underlay
and external (overlay) associated groups.

Includes:
  * **Core ID Management Changes:**
    - Remove Option<MulticastGroupId> - IDs are always allocated during
      group creation
    - Establish 1:1 mapping between underlay and external (overlay) groups
    - External groups now use IDs from corresponding NAT target (Omicron
      keeps the true relational mapping)

  * **API Changes and Validation:**
    - Remove sources field from internal group
      APIs (MulticastGroupCreateEntry, MulticastGroupUpdateEntry)
    - Internal groups cannot have sources or NAT targets - cleaner
      separation of concerns
    - External groups retain sources for proper SSM (Source-Specific
      Multicast) validation
    - Now fail outright on reset if cleanup is not used properly, which
      helps on the Omicron side.

  * **Rollback & Error Handling:**
    - The addition of a rollback module (and trait) for a more
      functional approach to rollback on creation or updates involving
      tables, ports, etc
    - Improved error propagation in test cleanup to catch resource leaks early
    - Better validation of group ID relationships to match tables and
      allocation states

  * **Test Infrastructure Improvements:**
    - Enhanced cleanup_test_group() to fail explicitly on deletion errors
      (prevents test pollution), and ensures proper 1:1 deletion mapping
    - New tests for rollback, empty members upon multicast group creation/update

  * **Replication Management:**
    - Configure replication only when groups have members (change made
      expecting empty groups in Omicron CP initially)
    - Reconfigure replication tables when transitioning between empty/populated groups

Key aspects this commit covers:

1. ID Management to match expectations in Omicron's multicast impl
2. Validation: Enhanced API validation, group ID relationship checks,
   SSM validation
3. Rollback: Reset operations now fail explicitly, better error propagation
4. Testing: Comprehensive test improvements, better error handling,
   standardized cleanup
…nsistentcy

This includes `MulticastUnderlayGroupResponse` and `MulticastExternalGroupResponse`, and a
unified response type, `MulticastGroupResponse`, for list calls and other calls that return
mixed results. We also added an AdminScoped type for underlay groups, renamed
structs so that naming is consistent throughout, and now handle rollback at the
boundary calls to internal fns.

This PR has been updated to accommodate the new API trait,
oxidecomputer/omicron#8922, so it changes
a lot relative to the previous code and commit.
@zeeshanlakhani
Contributor Author

@FelixMcFelix Sorry for the additional changes, as 2daa552 went in after the review. With it changing all the type handling, I went ahead and just made the API more consistent (and properly restrictive) across the board.
