Add Short Rotation Period For Certificates #1670

Open
wants to merge 1 commit into
base: master
from

Conversation

vrutkovs
Member

@vrutkovs vrutkovs commented Aug 28, 2024

@vrutkovs
Member Author

/cc @tkashem @p0lyn0mial

for LGTM

/cc @deads2k

for approval

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch 2 times, most recently from ce017fe to ccd17d3 Compare September 5, 2024 14:31
@sanchezl
Contributor

/retest

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from ccd17d3 to 83e6af9 Compare October 15, 2024 15:33
@tkashem
Contributor

tkashem commented Oct 15, 2024

/lgtm

(provided that there is no viable alternative way to expose this configuration exclusively for CI and testing without making it visible to customers)

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 15, 2024

* Change validity duration for existing certificates

## Proposal
Contributor

Details are scant. To achieve the same benefit, this featuregate needs to be enabled in Default during pre-RC builds and moved to TechPreview after RC.0. Can we get this spelled out?

Member Author

There wasn't any benefit, "developer branch fast rotation" has never worked in CI - see https://github.com/openshift/enhancements/pull/1670/files#diff-1695a5e93f0f7e139919d0e0fac08ce0ea6d442932ff1cc7fab792ef1c616cf1R31-R35

Added more details on how this will be tested in CI and promotion criteria

Contributor

> There wasn't any benefit, "developer branch fast rotation" has never worked in CI - see https://github.com/openshift/enhancements/pull/1670/files#diff-1695a5e93f0f7e139919d0e0fac08ce0ea6d442932ff1cc7fab792ef1c616cf1R31-R35
>
> Added more details on how this will be tested in CI and promotion criteria

The point is not coverage in CI. The point is every pre-production cluster in existence that runs longer than a day, which covers a wide variety of clusters both internal and external to this group, including education, TAMs, test platform, and various test environments. Losing this capability is an enormous cost. CI was an objective, but we gained tremendous benefit without ever having a rotation in CI.

Member Author

It never materialized into bugs, though (especially in availability tracking), did it? Testing fast rotation in CI, however, will have the same effect, along with a better set of debugging information.

In any case, this featuregate doesn't prevent specific components from using shorter pre-release cert rotations.

Member Author

Added a clarification that the existing devrotation code can be used alongside the short rotation featuregate.

@sjenning
Contributor

sjenning commented Oct 15, 2024

> To achieve the same benefit, this featuregate needs to be enabled in Default during pre-RC builds and moved to TechPreview after RC.0.

If this is a requirement, I'm not sure that this enhancement gets us anywhere. We still have to open a PR per release to turn it off, just in openshift/api, rather than openshift/cluster-kube-apiserver-operator.

xref past KASO pre-release PRs
https://github.com/openshift/cluster-kube-apiserver-operator/pulls?q=is%3Apr+is%3Aclosed+dev+cert+rotation

So... I guess that means I'm 👎 on this.

@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from 83e6af9 to b2d3cc3 Compare October 16, 2024 07:01
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 16, 2024
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 15, 2024
@vrutkovs
Member Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 15, 2024
@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 21, 2024
@vrutkovs
Member Author

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 21, 2024
@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from b2d3cc3 to db3592f Compare January 15, 2025 08:36
Contributor

openshift-ci bot commented Jan 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from tkashem. For more information see the Code Review Process.


@vrutkovs vrutkovs force-pushed the short-rotation-featuregate branch from db3592f to 397766a Compare January 15, 2025 14:29
Contributor

openshift-ci bot commented Jan 15, 2025

@vrutkovs: all tests passed!


@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 13, 2025
@vrutkovs
Member Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 13, 2025
### Goals

* Create a new FeatureGate in DevPreview featureset
* Each component can decide the new duration for certificates separately.
Contributor

Is there a specific reason for leaving this up to the components?

The dev flag already implies an upper bound on the rotation time, i.e.:

> Test Plan
> observe the cluster for 6-8 hours

So within 6 hrs all of the components of interest should have cycled through a rotation.
So why don't we just explicitly dictate the duration (e.g. 3 hrs) for the relevant components?

I'm guessing we likely don't want all components to rotate at the same time (although I'm not sure whether we expect to tolerate any API disruption from these rotations).
But if the motivation for letting components choose is to stagger the rotations, they could still overlap if they end up choosing the same "shortened" duration or change it later for whatever reason.
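
To make the overlap concern concrete, here is a rough Go sketch, not taken from the enhancement. It assumes, as a simplification, that a certificate rotates once per full period; the component names and the helpers `rotationsWithin` and `sharedPeriods` are hypothetical and exist only for illustration. It estimates how many rotations fit into an observation window and reports which components picked the same shortened duration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// rotationsWithin estimates how many rotations fit into an observation
// window, assuming (simplification) exactly one rotation per full period.
func rotationsWithin(window, period time.Duration) int {
	if period <= 0 {
		return 0
	}
	return int(window / period)
}

// sharedPeriods groups component names by their chosen period and keeps
// only the groups with more than one member, i.e. components whose
// rotations would line up with each other.
func sharedPeriods(periods map[string]time.Duration) map[time.Duration][]string {
	groups := make(map[time.Duration][]string)
	for name, p := range periods {
		groups[p] = append(groups[p], name)
	}
	for p, names := range groups {
		if len(names) < 2 {
			delete(groups, p)
			continue
		}
		sort.Strings(names) // stable output for printing
	}
	return groups
}

func main() {
	// Hypothetical components and durations, for illustration only.
	periods := map[string]time.Duration{
		"component-a": 3 * time.Hour,
		"component-b": 3 * time.Hour,
		"component-c": 2 * time.Hour,
	}
	window := 6 * time.Hour

	for name, p := range periods {
		fmt.Printf("%s: %d rotation(s) in %s\n", name, rotationsWithin(window, p), window)
	}
	fmt.Println("components sharing a period:", sharedPeriods(periods))
}
```

With these made-up values, `component-a` and `component-b` share a period, so leaving the choice to components does not by itself guarantee staggering.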

Member Author

> So why don't we just explicitly dictate the duration (e.g 3hrs) for the relevant components?

Cert durations depend on functionality, as some certs are relatively painless to replace and some require a kube-apiserver revision rollout. So setting all certs to, say, 30 mins would break our disruption tests. Also, we don't yet know how the certs interact or what the effects of rotations are, so I'd rather not dictate exact certificate durations in this enhancement.

That said, we do want to set an upper limit: if a certificate is rotatable, it must be rotated during that test, so cert validity duration is capped at 8 hours (but the shorter the better, as it's useful to observe several rotations).
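
A minimal Go sketch of the rule described above, under stated assumptions: the featuregate is modeled as a plain boolean, and `certValidity`, `maxShortValidity`, and the durations in `main` are hypothetical names and values for illustration, not code from this enhancement or from any existing library. Each component supplies its own short validity, and anything missing or above the 8-hour cap is clamped to the cap.

```go
package main

import (
	"fmt"
	"time"
)

// maxShortValidity is the 8-hour upper limit mentioned in the comment above.
const maxShortValidity = 8 * time.Hour

// certValidity returns the validity a component would use for a certificate.
// All parameter names are hypothetical: shortRotationEnabled stands in for
// the featuregate, defaultValidity for the component's regular duration, and
// componentShort for the shortened duration that component chose.
func certValidity(shortRotationEnabled bool, defaultValidity, componentShort time.Duration) time.Duration {
	if !shortRotationEnabled {
		return defaultValidity
	}
	// An unset or too-long short duration falls back to the cap, so every
	// rotatable certificate still rotates within the test window.
	if componentShort <= 0 || componentShort > maxShortValidity {
		return maxShortValidity
	}
	return componentShort
}

func main() {
	regular := 30 * 24 * time.Hour // illustrative default, not from the source

	fmt.Println(certValidity(false, regular, 2*time.Hour)) // gate off: regular validity
	fmt.Println(certValidity(true, regular, 2*time.Hour))  // gate on: component's choice
	fmt.Println(certValidity(true, regular, 12*time.Hour)) // gate on: clamped to the cap
}
```

Keeping the cap in one place while letting each component pass its own value matches the split discussed here: the enhancement fixes only the upper bound, and components tune the actual durations.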

Contributor

Fair enough. We can hold off on the configurability of the "short" durations until we feel we actually need to tune them per test run, or until bumping the hardcoded rotation periods per component proves too unwieldy when iterating in CI.

@hasbro17
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 20, 2025
Labels
lgtm Indicates that a PR is ready to be merged.
8 participants