Skip to content

feat: Add Flow Operation Rule REST APIs#2286

Open
kunzhao-nv wants to merge 19 commits into
NVIDIA:mainfrom
kunzhao-nv:feat/rest-api-rule-crud
Open

feat: Add Flow Operation Rule REST APIs#2286
kunzhao-nv wants to merge 19 commits into
NVIDIA:mainfrom
kunzhao-nv:feat/rest-api-rule-crud

Conversation

@kunzhao-nv

@kunzhao-nv kunzhao-nv commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Description

1. Operation Rule CRUD API — 5 site-scoped endpoints, backed by Flow's OperationRule gRPC. Rules govern how tasks execute, so they live under the /task namespace alongside task endpoints:

Method Path Purpose
POST /v2/org/{org}/nico/task/rule create
GET /v2/org/{org}/nico/task/rule list (filter by siteId, operationType)
GET /v2/org/{org}/nico/task/rule/{id} get
PATCH /v2/org/{org}/nico/task/rule/{id} update
DELETE /v2/org/{org}/nico/task/rule/{id} delete

2. Optional ruleId on rack/tray operation APIs — added to 8 request schemas (power / firmware / bring-up, single + batch, rack + tray). When set, the value is forwarded to Flow's rule_id proto field via the three shared Execute* helpers in handler/util/common/common.go. When omitted, Flow falls back to its default rule resolution — no behavior change for existing callers.

3. ruleId on task responses — the RackTask schema now carries the Operation Rule Flow resolved for the task (sourced from Task.applied_rule_id in the Flow proto), whether the caller pinned it via ruleId on the originating request or Flow's default resolution picked it. Callers reading a task can now answer "which rule ran for this task" by reading the task itself, no extra lookup.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

Followups:

  1. Support promoting a rule to default
  2. Add REST api for rule-rack association

Wire 5 site-scoped endpoints (POST/GET/LIST/PATCH/DELETE /v2/org/{org}/nico/rule)
backed by Flow's OperationRule gRPC, with matching Temporal workflows/activities,
typed OpenAPI schemas, and unit tests across model/handler/workflow/activity layers.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Adds optional ruleId to power/firmware/bring-up request models (single +
batch, rack + tray) and forwards it to Flow's rule_id proto field via
the shared Execute* helpers. UUID-validated at the model layer.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Catches up `docs/index.html` and `sdk/standard/**` for the two preceding
spec changes (Rule CRUD endpoints in c73d4f9 and the rack/tray ruleId
field in 4bf84d0), which both shipped without their generated
artifacts.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
@kunzhao-nv kunzhao-nv requested a review from a team as a code owner June 7, 2026 05:07
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e1527967-bd47-4ddc-ad8e-d772ae300692

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-06-07 05:09:58 UTC | Commit: dd435f1

Keep the surface minimal; can be reintroduced alongside default-rule
management.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Covers 8464a1e (drop isDefault list filter); also adds SPDX headers to
the rule SDK files that the previous regen missed.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-flow 116 13 50 41 4 8
nico-nsm 133 11 45 66 11 0
nico-psm 118 13 52 41 4 8
nico-rest-api 182 16 84 67 7 8
nico-rest-cert-manager 95 5 47 32 3 8
nico-rest-db 116 13 50 41 4 8
nico-rest-site-agent 115 13 50 41 3 8
nico-rest-site-manager 102 6 48 37 3 8
nico-rest-workflow 118 13 52 41 4 8
TOTAL 1095 103 478 407 43 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@thossain-nv

Copy link
Copy Markdown
Contributor

@kunzhao-nv Rule also feels like too generic of a high level endpoint and it seems best to contextualize it. Should we consider /flow/* URL namespace for these generic terms?

Rules govern how tasks execute, so colocating the rule endpoints under
the /task namespace matches how callers think about them and keeps the
top-level nico path uncluttered.

  POST   /v2/org/{org}/nico/task/rule
  GET    /v2/org/{org}/nico/task/rule
  GET    /v2/org/{org}/nico/task/rule/{id}
  PATCH  /v2/org/{org}/nico/task/rule/{id}
  DELETE /v2/org/{org}/nico/task/rule/{id}

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Flow records the OperationRule it resolves for each task (proto field
Task.applied_rule_id, populated whether the caller pinned a ruleId on
the originating request or Flow picked the default). Expose it on the
REST RackTask schema so clients can answer "which rule ran for this
task" by reading the task itself instead of inferring it.

Field name is ruleId — same as the request-side field on rack/tray
operation APIs — so request and response use the same vocabulary for
the same concept.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Covers 03ea0ea (rule path move to /task/rule) and 28b59af (Task.ruleId).

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
@kunzhao-nv kunzhao-nv force-pushed the feat/rest-api-rule-crud branch from 9817d24 to c23abc7 Compare June 9, 2026 00:36
Comment thread rest-api/api/pkg/api/model/rule.go Outdated
Comment thread rest-api/api/pkg/api/handler/taskrule.go
Comment thread rest-api/api/pkg/api/handler/rule.go Outdated
Comment thread rest-api/api/pkg/api/handler/rule.go Outdated
Comment thread rest-api/api/pkg/api/handler/rule.go Outdated
Comment thread rest-api/api/pkg/api/model/rule.go Outdated
Comment thread rest-api/api/pkg/api/model/rule.go Outdated
Comment thread rest-api/api/pkg/api/model/rule.go Outdated
Comment thread rest-api/api/pkg/api/handler/rule.go Outdated
Comment thread rest-api/api/pkg/api/handler/rule.go Outdated
@thossain-nv

Copy link
Copy Markdown
Contributor

@kunzhao-nv I've suggested using TaskRule for consistency across the board. But if that creates cognitive dissonance, let's discuss a naming strategy.

Rules apply to tasks; the type, file, and function names now reflect
that domain context so readers don't have to infer the relationship
from prose comments. The change is mechanical — no behavior change.

* Files: model / handler / workflow / activity rule.go (+ tests) →
  taskrule.go.
* Handler types: CreateRuleHandler → CreateTaskRuleHandler (and the
  matching Get / GetAll / Update / Delete variants), with receiver
  variables updated to the codebase's first-letter-abbreviation
  convention (ctrh / gtrh / gatrh / utrh / dtrh).
* List handler renamed to GetAll to match how every other listing
  handler in this package is spelled.
* Model types: APIOperationRule / APIRuleDefinition / APISequenceStep
  / APIActionConfig / APIRetryPolicy → APITaskRule[…]. Request structs
  switch to the APIResourceActionRequest form used elsewhere in this
  package (APITaskRuleCreateRequest / GetRequest / UpdateRequest /
  DeleteRequest / GetAllRequest).
* Workflow + activity entry points: CreateRule → CreateTaskRule, and
  the activity counterparts (CreateOperationRuleOnFlow →
  CreateTaskRuleOnFlow, ManageRule interface → ManageTaskRule, etc.).
* Internal helpers: errRuleResponseSent → errTaskRuleResponseSent,
  ruleHandlerPrepare → prepareTaskRuleHandler.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Match the convention every other enum-like string on the REST surface
already follows (TaskStatus, ComponentType, DiffType, BMCType). The
wire form changes from "power_control" / "firmware_control" to
"PowerControl" / "FirmwareControl".

Flow's internal YAML rule files keep their existing snake_case
spelling; conversion now happens at the REST boundary via the existing
ProtoToAPIOperationTypeName / apiToProtoOperationType maps.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
…tion

The nested SequenceStep / ActionConfig / RetryPolicy schemas previously
used snake_case JSON keys (component_type, max_parallel, …) to round-
trip 1:1 with Flow's stored rule_definition_json blob. Snake_case is
out of step with every other model in this package, which is uniformly
camelCase. Switch the REST surface to camelCase and bridge to Flow's
blob via a parallel set of unexported flow*Wire structs whose tags
preserve snake_case.

* APITaskRuleSequenceStep, APITaskRuleActionConfig, APITaskRuleRetry-
  Policy json tags now use componentType / maxParallel / preOperation
  / mainOperation / postOperation / delayAfter / pollInterval /
  maxAttempts / initialInterval / backoffCoefficient / maxInterval.
* New flowRuleDefinitionWire / flowSequenceStepWire / flowAction-
  ConfigWire / flowRetryPolicyWire mirror the API types with snake_
  case tags, plus toFlowJSON / fromFlowJSON helpers used by
  APITaskRule.FromProto and APITaskRuleCreateRequest/UpdateRequest's
  ToProto. Field-by-field conversion is mechanical; keep the wire
  structs in lock-step when adding fields.
* OpenAPI spec (SequenceStep, ActionConfig, RetryPolicy) updated to
  match. ActionConfig.parameters keys remain free-form pass-through
  (Flow vocabulary).

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
The type and its OpenAPI schema previously read as if tasks only ever
belong to racks, but the same shape is returned by both rack-scoped
and tray-scoped task endpoints (and will cover further task sources
in the future). Drop the misleading prefix.

* APIRackTask → APITask, NewAPIRackTask → NewAPITask,
  ProtoToAPIRackTaskStatusName → ProtoToAPITaskStatusName.
* OpenAPI schema `RackTask` → `Task`; four `$ref` sites updated.
* Description broadened to reflect that tasks model any
  asynchronous, site-scoped operation.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Follow the convention set by allocation.go and other existing models:
optional fields render as explicit null / "" rather than disappearing
from the JSON payload, so clients always see the expected shape.
omitempty remains on the unexported flow*Wire structs because those
serialise to Flow's stored blob, not to clients.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Replace the bespoke validateOptionalUUID helper and the manual
if-checks scattered across the TaskRule create/update/get/delete/
list requests, plus the rack and firmware operation requests, with
ozzo-validation chains so every model in this package speaks the
same validation idiom (see allocation.go for the long-standing
convention).

* firmware.go: APIUpdateFirmwareRequest, APIBatchRackFirmwareUpdate-
  Request, APIBatchTrayFirmwareUpdateRequest now use validation.
  ValidateStruct with validation.When + validationis.UUID for ruleId.
  validateOptionalUUID is removed.
* rack.go: APIBringUpRackRequest, APIBatchBringUpRackRequest follow
  the same pattern.
* taskrule.go: APITaskRuleCreateRequest / UpdateRequest / GetRequest
  / DeleteRequest / GetAllRequest move off manual ifs to ozzo,
  including validation.In for operationType against a shared
  validOperationTypesAny enum slice.
* Tests updated to match ozzo's error format
  ("operationType must be one of …" / "X: must be a valid UUID.").

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
The helper has always been a thin pointer-to-Flow-UUID wrapper that has
nothing rule-specific about it. Lift it out of the rule-id niche so
future call sites (any optional API ID forwarded to Flow) can reach
for the same primitive instead of cloning it. Behavior is unchanged.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Covers spec changes from 74b3d16..fc054ca:
  74b3d16 rename Rule API surface to TaskRule
  5a68262 PascalCase operationType enum values
  6a473ae camelCase ruleDefinition keys + Flow blob translation
  546a65f rename APIRackTask to APITask

Generated by `make publish-openapi` (docs) and `make generate-sdk`
(SDK). model_rack_task.go removed as the schema is now Task; the
generator dropped an unrelated import-grouping change to
model_tenant_identity_config_create_or_update_request.go which was
reverted.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
@kunzhao-nv

Copy link
Copy Markdown
Contributor Author

Pushed b4608bb addressing the latest round of comments. Mapping each change to a reviewer theme:

# Theme Commit
1 Rename RuleTaskRule (files, handler/model/workflow/activity types, receivers, ListRulesHandlerGetAllTaskRuleHandler) 74b3d16
2 PascalCase operationType enum values (power_controlPowerControl, firmware_controlFirmwareControl) 5a68262
3 camelCase nested ruleDefinition keys (componentType, maxParallel, pre/main/postOperation, retry policy fields, …) with a flow*Wire translation layer that keeps Flow's stored blob snake_case 6a473ae
4 APIRackTaskAPITask; OpenAPI schema RackTaskTask (the type covers rack + tray + any future task-producing endpoint) 546a65f
5a Drop omitempty on the TaskRule REST surface to match allocation.go convention; kept on the internal flow*Wire blob types 6cc93ee
5b Move TaskRule + firmware + rack request validators to ozzo (validation.ValidateStruct + validation.When + validationis.UUID); delete the bespoke validateOptionalUUID helper fb08d14
5c Generalize ruleIDToProtoGetFlowUUIDPtr(id *string) *flowv1.UUID so future optional API IDs can reuse it fc054ca
Regen openapi/docs/index.html + sdk/standard/** b4608bb

Notes:

  • camelCase change in (3) is REST-surface only; Flow's rule_definition_json blob still serialises with the original snake_case schema, so YAML rule files don't need any migration.
  • operationCode and ActionConfig.parameters keys stay as Flow's vocabulary (free-form pass-through).
  • ozzo error message format shifted slightly ("operationType: operationType must be one of [PowerControl FirmwareControl]." vs the old "invalid operationType …"); tests updated accordingly.

PTAL when you have a moment.

Trailing blank line left over from fb08d14 (validator cleanup)
trips the Style Check job's gofmt check. No code change.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Finishes the Rule → TaskRule rename for the layer that was missed in
74b3d16: the five Temporal workflow functions still carried the
Operation* spelling, leaving the codebase inconsistent (handler /
model / activity all on TaskRule; workflow alone on OperationRule).

* site-workflow/pkg/workflow/taskrule.go: CreateOperationRule →
  CreateTaskRule (+ matching Get / GetAll / Update / Delete variants).
  Log fields now follow the package convention used by tray.go and
  ibpartition.go — Str("Workflow", "TaskRule").Str("Action", "Create")
  instead of bundling the resource into the Action label.
* site-workflow/pkg/workflow/taskrule_test.go: test-suite types and
  Test_* method names move to the new spelling.
* api/pkg/api/handler/taskrule.go and *_test.go: the workflow-name
  string passed to ExecuteWorkflow / mocked via testify and the
  objectType handed to SetupHandler / TerminateWorkflowOnTimeOut all
  switch to "TaskRule" + the new workflow names. Temporal looks
  workflows up by name, so the registered name (subscriber.go) and
  the dispatch name (handler) have to agree.
* site-agent/.../flowgrpc/subscriber.go: RegisterWorkflow now binds
  the new function identifiers; the accompanying log messages move
  with them.

Flow's gRPC method names (flowv1.CreateOperationRule* and the matching
proto types) intentionally stay as-is — they're Flow's contract, not
ours; the activity layer (CreateTaskRuleOnFlow et al.) is the bridge.

Drive-by: drop two stray "MVP" mentions from doc comments per the
project's "no MVP / phase vocabulary in code comments" preference.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Per PR review (model/rule.go:141): an additional NewX constructor
isn't needed when FromProto already exists. Callers create the
zero-value struct and call FromProto directly, matching the simpler
shape used elsewhere in the package.

* model/taskrule.go: remove NewAPITaskRule.
* handler/taskrule.go (Get + GetAll): inline &model.APITaskRule{} +
  FromProto at the two call sites; error handling is unchanged.
* model/taskrule_test.go: switch the two FromProto test cases to
  invoke the method directly.

Signed-off-by: Kun Zhao <kunzhao@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants