Skip to content

Conversation

@ArkaSaha30
Copy link
Contributor

@ArkaSaha30 ArkaSaha30 commented Oct 23, 2025

This PR will add AutoProvider certificate manager as a part of generating default self-signed certificates:

  • interface functions
  • reconciler logic
  • e2e tests
etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k get pods
NAME                   READY   STATUS    RESTARTS   AGE
etcdcluster-sample-0   1/1     Running   0          34m
etcdcluster-sample-1   1/1     Running   0          34m
etcdcluster-sample-2   1/1     Running   0          33m


etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe pod etcdcluster-sample-0  
Name:             etcdcluster-sample-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             kind-control-plane/172.18.0.2
Start Time:       Mon, 27 Oct 2025 23:47:43 +0530
Labels:           app=etcdcluster-sample
                  apps.kubernetes.io/pod-index=0
                  controller=etcdcluster-sample
                  controller-revision-hash=etcdcluster-sample-754c4cfbcd
                  statefulset.kubernetes.io/pod-name=etcdcluster-sample-0
Annotations:      <none>
Status:           Running
IP:               10.244.0.9
IPs:
  IP:           10.244.0.9
Controlled By:  StatefulSet/etcdcluster-sample
Containers:
  etcd:
    Container ID:  containerd://faa47a05746374496ff1b3d0147a93efc4d72bc07572e68183df1df9decdcdbf
    Image:         gcr.io/etcd-development/etcd:v3.5.21
    Image ID:      gcr.io/etcd-development/etcd@sha256:fd158fbe55240e252947bbd2e8dddc217997ff43978071fac2bd202b6ad15c03
    Ports:         2379/TCP, 2380/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /usr/local/bin/etcd
    Args:
      --name=$(POD_NAME)
      --listen-peer-urls=http://0.0.0.0:2380
      --listen-client-urls=http://0.0.0.0:2379
      --initial-advertise-peer-urls=http://$(POD_NAME).etcdcluster-sample.$(POD_NAMESPACE).svc.cluster.local:2380
      --advertise-client-urls=http://$(POD_NAME).etcdcluster-sample.$(POD_NAMESPACE).svc.cluster.local:2379
    State:          Running
      Started:      Mon, 27 Oct 2025 23:47:52 +0530
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      etcdcluster-sample-state  ConfigMap  Optional: false
    Environment:
      POD_NAME:       etcdcluster-sample-0 (v1:metadata.name)
      POD_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kwksc (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  server-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcdcluster-sample-server-tls
    Optional:    false
  peer-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcdcluster-sample-peer-tls
    Optional:    false
  kube-api-access-kwksc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  34m   default-scheduler  Successfully assigned default/etcdcluster-sample-0 to kind-control-plane
  Normal  Pulling    34m   kubelet            Pulling image "gcr.io/etcd-development/etcd:v3.5.21"
  Normal  Pulled     34m   kubelet            Successfully pulled image "gcr.io/etcd-development/etcd:v3.5.21" in 8.646s (8.646s including waiting). Image size: 20653564 bytes.
  Normal  Created    34m   kubelet            Created container: etcd
  Normal  Started    34m   kubelet            Started container etcd

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k get secrets
NAME                            TYPE                DATA   AGE
etcdcluster-sample-client-tls   kubernetes.io/tls   3      35m
etcdcluster-sample-peer-tls     kubernetes.io/tls   3      35m
etcdcluster-sample-server-tls   kubernetes.io/tls   3      35m

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-client-tls
Name:         etcdcluster-sample-client-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
ca.crt:   769 bytes
tls.crt:  769 bytes
tls.key:  365 bytes

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-peer-tls  
Name:         etcdcluster-sample-peer-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
tls.crt:  774 bytes
tls.key:  365 bytes
ca.crt:   774 bytes

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-server-tls
Name:         etcdcluster-sample-server-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
ca.crt:   769 bytes
tls.crt:  769 bytes
tls.key:  365 bytes

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ArkaSaha30
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 3 times, most recently from f819caf to 82651bc Compare October 27, 2025 18:17
@ArkaSaha30 ArkaSaha30 changed the title [WIP] Add AutoProvider interface functions Add AutoProvider interface functions Oct 27, 2025
@ArkaSaha30 ArkaSaha30 requested review from ahrtr and ivanvc October 27, 2025 18:49
@ahrtr
Copy link
Member

ahrtr commented Oct 28, 2025

@abdurrehman107 @ballista01 @ivanvc @hakman PTAL.

Reviewing this PR is also an opportunity to get familiar with the certificate design & implementation.

Reference: https://docs.google.com/document/d/1k8os77Hqxe36nUaZ31YqBGBuXcgdozVPppJYOfwUDFA/edit?tab=t.0

@ahrtr
Copy link
Member

ahrtr commented Oct 28, 2025

cc @frederiko

@ArkaSaha30 ArkaSaha30 changed the title Add AutoProvider interface functions Add AutoProvider certificate manager Oct 28, 2025
return nil, fmt.Errorf("failed to get certificate: %w", err)
}

return &interfaces.Config{}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should get the info from the certificate included in the secret.

Copy link
Contributor Author

@ArkaSaha30 ArkaSaha30 Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ahrtr , I explored getting the original config from the secret itself but it will be a complicated one since we are manipulating the config and passing it to transport.SelfCert() which in turn creats the certificate.
Another workaround is to retrieve the certificateConfig from the etcdCluster object via the certificateSecret.ownerRef.Name, but it will cause a circular dependency

What can be the suggested solution here?
cc @ivanvc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it will be a complicated one

Where is the complication? I don't see a PoC so far, nor a reasonable explanation.

The easiest solution is to just save the config passed to EnsureCertificateSecret, and return it in GetCertificateConfig. The problem is that it isn't the final real configuration that we used to create the certificate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have updated the GetCertificateConfig to return the saved user-defined config

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request, Arka. I left some observations, comments, and suggestions.

@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 4 times, most recently from 9d90c9e to e2c11fd Compare October 30, 2025 09:56
@ArkaSaha30
Copy link
Contributor Author

/test pull-etcd-operator-test-e2e

@ivanvc
Copy link
Member

ivanvc commented Nov 9, 2025

@ArkaSaha30, where do we stand in this pull request? Is this ready for another review?

This commit will add the initial scaffolding for AutoProvider.

Signed-off-by: ArkaSaha30 <[email protected]>
Copy link

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the updates
generally LGTM, just a few minor comments from me

autoConfig := ec.Spec.TLS.ProviderCfg.AutoCfg
duration, err := time.ParseDuration(autoConfig.ValidityDuration)
if err != nil {
log.Printf("Failed to parse ValidityDuration: %s", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it return an error here?

log.Printf("Failed to parse ValidityDuration: %s", err)
}

var getAltNames certInterface.AltNames

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just call it altNames


func createCertificate(ec *ecv1alpha1.EtcdCluster, ctx context.Context, c client.Client, certName string) error {
cert, certErr := certificate.NewProvider(certificate.ProviderType(ec.Spec.TLS.Provider), c)
// tls field is present but spec is empty

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// tls field is present but spec is empty
// The TLS field is present but spec is empty

func createClientCertificate(ctx context.Context, ec *ecv1alpha1.EtcdCluster, c client.Client, scheme *runtime.Scheme) error {
certName := getClientCertName(ec.Name)
createClientCertErr := createCertificate(ec, ctx, c, certName)
patchCertErr := patchCertificateSecret(ctx, ec, c, scheme, certName)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use a single err and check it on every stage.

Comment on lines +691 to +693
patchCertErr := patchCertificateSecret(ctx, ec, c, scheme, serverCertName)
if patchCertErr != nil {
return fmt.Errorf("patching certificate secret: %s with ownerReference failed: %w", serverCertName, patchCertErr)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as above


func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretName, namespace string,
cfg *interfaces.Config) error {
// save the user defined AutoConfig so that it can be returned from GetCertificateConfig

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// save the user defined AutoConfig so that it can be returned from GetCertificateConfig
// Save the user defined AutoConfig so that it can be returned from GetCertificateConfig

if err := controllerutil.SetControllerReference(ec, getCertSecret, scheme); err != nil {
return err
}
if updateErr := c.Update(ctx, getCertSecret); updateErr != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err

}
}

log.Printf("hosts: %v", hosts)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a more detailed log msg, like "calling SelfCert with hosts: ...`


caPEM, err := os.ReadFile(caPath)
if err != nil || len(caPEM) == 0 {
// use certPEM when CA file is not found or empty

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// use certPEM when CA file is not found or empty
// Use certPEM when CA file is not found or empty


// Validate cert and key pair
if _, err := tls.X509KeyPair(certPEM, keyPEM); err != nil {
return fmt.Errorf("generated keypair invalid: %w", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("generated keypair invalid: %w", err)
return fmt.Errorf("generated keypair is invalid: %w", err)

// Ensure the operator has TLS credentials when the cluster requests TLS.
if ec.Spec.TLS != nil {
if err := createClientCertificate(ctx, ec, r.Client); err != nil {
if err := createClientCertificate(ctx, ec, r.Client, r.Scheme); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r.Client also has the Scheme embedded. Can't that be used instead of passing Scheme through the call chain?


func patchCertificateSecret(ctx context.Context, ec *ecv1alpha1.EtcdCluster, c client.Client, scheme *runtime.Scheme, certSecretName string) error {
getCertSecret := &corev1.Secret{}
getErr := c.Get(ctx, client.ObjectKey{Name: certSecretName, Namespace: ec.Namespace}, getCertSecret)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, the Get call and the error check can be folded to the same block.


var userAutoConfig *interfaces.Config

func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretName, namespace string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever the "name, namespace" pattern is used, an object key can be used. This ensures consistency. See github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/example_test.go for example.

Suggested change
func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretName, namespace string,
func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretKey ctrlruntimeclient.ObjectKey

return interfaces.ErrTLSCert
}

decodeCertificatePem, _ := pem.Decode(certificateData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need an err check?

Comment on lines +140 to +146
err := ac.Get(ctx, client.ObjectKey{Name: secretName, Namespace: namespace}, &secret)
if k8serrors.IsNotFound(err) {
return nil
}
if err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := ac.Get(ctx, client.ObjectKey{Name: secretName, Namespace: namespace}, &secret)
if k8serrors.IsNotFound(err) {
return nil
}
if err != nil {
return err
}
if err := ac.Get(ctx, client.ObjectKey{Name: secretName, Namespace: namespace}, &secret); err != nil {
if k8serrors.IsNotFound(err) {
return nil
}
return err
}


// parsePrivateKey parses the private key from the PEM-encoded data.
func parsePrivateKey(privateKeyData []byte) (crypto.PrivateKey, error) {
block, _ := pem.Decode(privateKeyData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above

Comment on lines +178 to +182
if err != nil {
if err != nil {
return nil, fmt.Errorf("failed to parse private key: %w", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
if err != nil {
return nil, fmt.Errorf("failed to parse private key: %w", err)
}
}
if err != nil {
return nil, fmt.Errorf("failed to parse private key: %w", err)
}

@ahrtr
Copy link
Member

ahrtr commented Dec 2, 2025

@ArkaSaha30 pls feedback after you resolve all the review comments, thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants