SP and RoT need to coordinate on firmware rollback decisions #1207

Open
cbiffle opened this issue Mar 14, 2023 · 0 comments
cbiffle commented Mar 14, 2023

When "trying out" new SP firmware, we may need to decide it's crap and cancel the upgrade, flipping back to the other image slot (which presumably still contains the previous assumed-okay firmware). We currently don't have a mechanism built for this.

The RoT will wind up having to be in charge, due to our use of the RoT as a sort of "external bootloader" for the SP. Painting in broad strokes, some things we may need here are:

  1. A way for the SP to report "I am totally hosed and cannot boot" to the RoT. Currently this has to be an SPI message, since we don't have a "boot failure" net from SP to RoT (though adding one in the future would rock). This implies that the SP needs to be able to detect things being hosed and generate that message; more on that in "Gimlet SP should pull its fault pin on very serious failures" (#1206).
  2. A way for the control plane to tell the RoT to stand down. This would be the best end-to-end verification that an SP image is adequate: that it has convinced the control plane it's ok, and has received a message from the control plane agreeing.
  3. Code that monitors these two sources, plus the SP reset pin (which will be pulsed by the SP on a watchdog reset), to decide whether to flip banks back on the next reset, or to stand down. A failure (explicit report or unexpected reset) would cause flip-back exactly once; once the RoT has stood down, further resets would not cause flip-back.
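
To make item 3 a bit more concrete, here is a rough sketch of the decision logic as a tiny state machine. None of this exists in the tree: `Event`, `State`, `Action`, and `step` are hypothetical names made up for illustration, and the handling of events that arrive after a flip-back (ignored here) is an assumption, not something decided above.

```rust
/// Inputs the RoT would watch while an SP image is on trial.
/// All names here are hypothetical, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Event {
    /// The SP explicitly reported that it cannot boot (e.g. via an SPI message).
    SpReportedFailure,
    /// The SP reset pin was pulsed, e.g. by a watchdog reset.
    SpResetObserved,
    /// The control plane told the RoT to stand down.
    StandDown,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    /// New image is on trial; a failure should trigger flip-back.
    Trial,
    /// Flip-back already happened; never flip a second time.
    FlippedBack,
    /// Control plane accepted the image; resets no longer cause flip-back.
    StoodDown,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Action {
    /// Leave the bank selection alone.
    None,
    /// Select the other image bank for the SP's next boot.
    FlipBack,
}

/// Advance the state machine by one event and say what the RoT should do.
fn step(state: State, event: Event) -> (State, Action) {
    match (state, event) {
        // Explicit failure or unexpected reset during the trial: flip back once.
        (State::Trial, Event::SpReportedFailure)
        | (State::Trial, Event::SpResetObserved) => (State::FlippedBack, Action::FlipBack),
        // End-to-end confirmation from the control plane: stop monitoring.
        (State::Trial, Event::StandDown) => (State::StoodDown, Action::None),
        // Assumption: after flip-back or stand-down, every event is a no-op.
        (s, _) => (s, Action::None),
    }
}
```

With this shape, a reset seen in `Trial` yields `FlipBack` and moves to `FlippedBack`, where a second reset does nothing, which gives the "exactly once" behavior. A `StandDown` seen in `Trial` moves to `StoodDown`, after which resets are ignored.
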
cbiffle added this to the Unscheduled milestone Apr 27, 2023