
Support per-shard signing keys #2330

Merged: 8 commits into sigstore:main on Jan 17, 2025

Conversation

haydentherapper (Contributor)

This change enables key rotation with a per-shard
signing key configuration. The LogRanges structure
now holds both active and inactive shards, with the
LogRange structure containing a signer, an encoded
public key, and a log ID derived from the public key.

This change is backwards compatible. If no signing
configuration is specified, the active shard
signing configuration is used for all shards.

Minor change: Standardized log ID vs tree ID, where
the former is the pubkey hash and the latter is the
ID for the Trillian tree.

Signed-off-by: Hayden Blauzvern [email protected]
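
For illustration, here is a minimal Go sketch of how the structures described above might fit together. The field and function names are assumptions based on this description, not the exact rekor types:

```go
package sharding

import (
	"crypto"
	"crypto/sha256"
	"crypto/x509"
	"encoding/hex"
	"fmt"

	"github.com/sigstore/sigstore/pkg/signature"
)

// LogRange describes one shard: the Trillian tree it maps to, plus the
// per-shard signing material added by this change. Field names are
// illustrative.
type LogRange struct {
	TreeID           int64  // ID of the Trillian tree
	TreeLength       int64  // number of entries in an inactive shard
	EncodedPublicKey string // base64-encoded public key
	LogID            string // hash of the public key (distinct from the tree ID)
	Signer           signature.Signer
}

// LogRanges holds the active shard alongside any inactive shards, as
// described above.
type LogRanges struct {
	inactive []LogRange
	active   LogRange
}

// computeLogID derives a log ID from a public key, matching the
// "log ID is the pubkey hash" convention noted above.
func computeLogID(pub crypto.PublicKey) (string, error) {
	der, err := x509.MarshalPKIXPublicKey(pub)
	if err != nil {
		return "", fmt.Errorf("marshaling public key: %w", err)
	}
	digest := sha256.Sum256(der)
	return hex.EncodeToString(digest[:]), nil
}
```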

haydentherapper requested a review from a team as a code owner on January 15, 2025 02:14
codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 70.12987% with 46 lines in your changes missing coverage. Please review.

Project coverage is 49.91%. Comparing base (488eb97) to head (54e84b0).
Report is 284 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/sharding/ranges.go | 78.08% | 10 Missing and 6 partials ⚠️ |
| cmd/rekor-cli/app/get.go | 33.33% | 8 Missing and 4 partials ⚠️ |
| pkg/api/tlog.go | 44.44% | 5 Missing ⚠️ |
| cmd/rekor-cli/app/upload.go | 62.50% | 2 Missing and 1 partial ⚠️ |
| cmd/rekor-cli/app/verify.go | 40.00% | 2 Missing and 1 partial ⚠️ |
| pkg/api/entries.go | 83.33% | 2 Missing and 1 partial ⚠️ |
| pkg/signer/tink.go | 0.00% | 3 Missing ⚠️ |
| pkg/api/api.go | 90.90% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2330       +/-   ##
===========================================
- Coverage   66.46%   49.91%   -16.55%     
===========================================
  Files          92      192      +100     
  Lines        9258    24732    +15474     
===========================================
+ Hits         6153    12346     +6193     
- Misses       2359    11288     +8929     
- Partials      746     1098      +352     
| Flag | Coverage Δ |
|---|---|
| e2etests | 46.55% <55.84%> (-1.01%) ⬇️ |
| unittests | 41.96% <40.25%> (-5.72%) ⬇️ |

Flags with carried forward coverage won't be shown.



haydentherapper commented Jan 15, 2025

Ready for review. I'm going to keep playing around with making CodeQL happy, but I might just dismiss the alert; it's a false positive. (Edit: Dismissed)

bobcallaway (Member) left a comment


a couple of comments; i want to think through the rollout implications of this in an environment where multiple instances of rekor run in parallel and each might see a different signing config concurrently.


haydentherapper commented Jan 16, 2025

> a couple of comments; i want to think through the rollout implications of this in an environment where multiple instances of rekor run in parallel and each might see a different signing config concurrently.

tl;dr - I see no issues.

Walking through a few examples:

Adding missing signing configurations to prod

Note that the active shard configuration is created dynamically; it is not specified in trillian_log_server.sharding_config. It is composed from the tree ID in trillian_log_server.tlog_id and the signer from rekor_server.signer. The inactive shards come from trillian_log_server.sharding_config.
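
As a sketch, a per-shard entry in the sharding config might look like the following. The exact YAML schema (in particular the signing-config field name) is an assumption here, and the tree ID, public key, and KMS URI are placeholders:

```go
package main

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v3"
)

// shardConfig mirrors one entry of trillian_log_server.sharding_config.
// Field names are hypothetical, based on the PR description.
type shardConfig struct {
	TreeID           int64  `yaml:"treeID"`
	TreeLength       int64  `yaml:"treeLength"`
	EncodedPublicKey string `yaml:"encodedPublicKey"`
	SigningConfig    string `yaml:"signingConfig,omitempty"` // optional, e.g. a KMS key URI
}

// Example config for one inactive shard; all values are placeholders.
const example = `
- treeID: 1066654656628704768
  treeLength: 3985
  encodedPublicKey: LS0tLS1CRUdJTi4uLg==
  signingConfig: gcpkms://projects/example/locations/global/keyRings/rekor/cryptoKeys/old-shard-key
`

func main() {
	var shards []shardConfig
	if err := yaml.Unmarshal([]byte(example), &shards); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", shards)
}
```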

When no signing config is specified in trillian_log_server.sharding_config, the server falls back to the active key. It's safe to have some instances with a signing config specified and some without: in both cases, the key material is the same.
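
A minimal sketch of that fallback, extending the illustrative LogRanges type from the earlier sketch (populateSigners is a hypothetical helper):

```go
// populateSigners applies the backwards-compatible default: any shard
// without its own signing config inherits the active shard's signer,
// public key, and log ID.
func (l *LogRanges) populateSigners() {
	for i := range l.inactive {
		if l.inactive[i].Signer == nil {
			l.inactive[i].Signer = l.active.Signer
			l.inactive[i].EncodedPublicKey = l.active.EncodedPublicKey
			l.inactive[i].LogID = l.active.LogID
		}
	}
}
```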

Rolling out a new shard

I've reviewed the sharding procedure and there should be no issue. The only change is that when we create a new active tree, the sharding config for the CURRENT_TREE_ID must specify a signing config if the key is being rotated.

  1. Spin Rekor down to 1 instance. This prevents race conditions.
  2. rekor.server.config.treeID is set to "", and the createtree job is forced. rekor.server.sharding.contents is updated to include both the CURRENT_TREE_ID and the signing config, which for prod will be a KMS key.
  3. The createtree job will populate the configmap rekor-config.treeID.
  4. When Rekor starts up, helm will set the server arg trillian_log_server.tlog_id from the env var TREE_ID, which comes from rekor-config.treeID. Rekor will dynamically create the active shard config from tlog_id and signer, as noted above. Rekor will also have the signing config for the now-inactive shard.
  5. Update rekor.server.config.treeID to the new tree ID and set createtree to false. The next time Rekor spins up, the configmap rekor-config.treeID will come from rekor.server.config.treeID rather than being set by createtree. [1]
  6. Spin Rekor up to 3 instances.
  7. Set updateTree to run, which freezes CURRENT_TREE_ID. This just calls Trillian; Rekor has no knowledge of whether an inactive shard is frozen (see the sketch after this list).
  8. Disable updateTree.
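
For reference, here is a sketch of the Trillian admin call that the updateTree step boils down to. The endpoint address and tree ID are placeholders:

```go
package main

import (
	"context"
	"log"

	"github.com/google/trillian"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/types/known/fieldmaskpb"
)

func main() {
	// Placeholder address for the Trillian log server's gRPC endpoint.
	conn, err := grpc.NewClient("trillian-logserver:8090",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	admin := trillian.NewTrillianAdminClient(conn)
	// Freeze the old shard's tree. Rekor never reads this state; only
	// Trillian enforces that a frozen tree rejects new entries.
	_, err = admin.UpdateTree(context.Background(), &trillian.UpdateTreeRequest{
		Tree: &trillian.Tree{
			TreeId:    1066654656628704768, // placeholder CURRENT_TREE_ID
			TreeState: trillian.TreeState_FROZEN,
		},
		UpdateMask: &fieldmaskpb.FieldMask{Paths: []string{"tree_state"}},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```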

Rolling out a new shard without spinning Rekor down to 1 instance

This would cause issues outside of signing, so this is moot. Some instances would continue to write to the old tree while new instances write to the new tree. As new instances start up, they would have different tree lengths for the inactive shards, which would lead to incorrect calculations of global log indices.
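
To make the index concern concrete, here is a sketch of how a global log index resolves to a shard-local index by walking shard lengths in order, reusing the illustrative LogRange type from earlier; the function name is hypothetical. If two instances disagree on an inactive shard's TreeLength, they resolve the same global index to different entries:

```go
// resolveIndex maps a global (virtual) log index onto a shard-local
// index by subtracting each inactive shard's length in order; any
// index past the inactive shards falls into the active tree.
func resolveIndex(globalIndex int64, inactive []LogRange, activeTreeID int64) (treeID, localIndex int64) {
	remaining := globalIndex
	for _, shard := range inactive {
		if remaining < shard.TreeLength {
			return shard.TreeID, remaining
		}
		remaining -= shard.TreeLength
	}
	return activeTreeID, remaining
}
```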

For the signing config, there would be no issue. Inactive shards would be specified to use the old signing key, while the new shard would use the KMS key passed as a server arg.

Updating a signing config without sharding

This would never happen; you can't rotate the key of an active or inactive tree without breaking verification.

Footnotes

  [1] Somewhat frighteningly, if you don't set rekor.server.config.treeID, Rekor creates a new tree. This is likely intended for testing. I think we should gate this behind a flag, because if we accidentally forget to set config.treeID, Rekor will create a new shard. That said, I'm not sure the createtree job is actually necessary; you could just let Rekor generate a new tree, then grab the tree ID and update the configmap value.

haydentherapper and others added 8 commits January 16, 2025 16:46
haydentherapper merged commit f875aa2 into sigstore:main on Jan 17, 2025
15 checks passed
haydentherapper deleted the shard-privkey-map branch on January 17, 2025 15:25