Add Short Rotation Period For Certificates #1670
base: master
b6fc4a1 to 96a3fee
/cc @tkashem @p0lyn0mial for LGTM
/cc @deads2k for approval
ce017fe to ccd17d3
/retest
ccd17d3 to 83e6af9
/lgtm (provided that there is no viable alternative way to expose this configuration exclusively for CI and testing without making it visible to the customer)
* Change validity duration for existing certificates

## Proposal
Details are scant. To achieve the same benefit, this featuregate needs to be enabled in Default during pre-RC builds and moved to TechPreview after RC.0. Can we get this spelled out?
There wasn't any benefit, "developer branch fast rotation" has never worked in CI - see https://github.com/openshift/enhancements/pull/1670/files#diff-1695a5e93f0f7e139919d0e0fac08ce0ea6d442932ff1cc7fab792ef1c616cf1R31-R35
Added more details on how this will be tested in CI and promotion criteria
The point is not coverage in CI. The point is every pre-production cluster in existence that runs longer than a day, which covers a wide variety of clusters both internal and external to this group, including education, TAMs, test platform, and various test environments. Losing this capability is an enormous loss. CI was an objective, but we gained tremendous benefit without ever having a rotation in CI.
It never materialized into bugs though (especially in availability tracking), did it? Testing fast rotation in CI, however, will have the same effect, along with a better set of debugging information.
In any case, this featuregate doesn't forbid specific components from doing shorter pre-release cert rotations.
Added a clarification that the existing devrotation code can be used alongside the short rotation featuregate.
If this is a requirement, I'm not sure that this enhancement gets us anywhere. We still have to open a PR per release to turn it off, just in openshift/api rather than openshift/cluster-kube-apiserver-operator (xref past KASO pre-release PRs). So... I guess that means I'm 👎 on this.
83e6af9 to b2d3cc3
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
/remove-lifecycle stale
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten
/remove-lifecycle rotten
b2d3cc3 to db3592f
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
db3592f to 397766a
@vrutkovs: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale
/remove-lifecycle stale
### Goals

* Create a new FeatureGate in DevPreview featureset
* Each component can decide the new duration for certificates separately.
Is there a specific reason for leaving this up to the components?
The dev flag already implies an upper bound on the rotation time, i.e.:
Test Plan
observe the cluster for 6-8 hours
So within 6 hrs all of the components of interest should have cycled through a rotation.
So why don't we just explicitly dictate the duration (e.g 3hrs) for the relevant components?
I'm guessing we likely don't want all components to rotate at the same time (although I'm not sure whether we expect to tolerate any API disruption from these rotations).
But if the motivation for letting components choose is to stagger the rotations, they could still end up with overlapping rotations if they choose the same "shortened" duration or change it later for whatever reason.
So why don't we just explicitly dictate the duration (e.g 3hrs) for the relevant components?
Cert durations depend on functionality, as some certs are relatively painless to replace and some require a kube-apiserver revision rollout. So setting all certs to, say, 30 mins would break our disruption tests. Also, we don't know yet how the certs interact and what the effects of rotations are, so I'd rather not dictate exact certificate durations in this enhancement.
That said, we do want to set an upper limit: if the certificate is rotatable, it must be rotated during that test, so cert validity duration is capped at 8 hours (but the shorter the better, as it's useful to observe several rotations).
Fair enough. We can hold off on the configurability of the "short" durations until we feel like we actually need to tune that per test run. Or if bumping the hardcoded rotation periods per component proves to be too unwieldy when iterating in CI.
/lgtm
See https://issues.redhat.com/browse/API-1688