
Problems setting up cloudsql-proxy with Workload identity, although wi-test works #1078

Closed
lenalebt opened this issue Jan 20, 2022 · 2 comments
Labels
type: question Request for information or clarification.

Comments

@lenalebt

Question

I'm having trouble connecting to my Cloud SQL instances using Workload Identity, and I don't understand the error message. This is the error I get:

2022/01/20 07:15:11 current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.
2022/01/20 07:15:11 errors parsing config:
	Get "https://sqladmin.googleapis.com/sql/v1beta4/projects/maxxeed/instances/europe-west3~main-dev-testing2/connectSettings?alt=json&prettyPrint=false": metadata: GCE metadata "instance/service-accounts/default/token?scopes=https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)sqlservice.admin" not defined

What does that mean exactly? I don't know how to debug this further.
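A note on reading that message: the `%!A(MISSING)` and `%!F(MISSING)` fragments look like an artifact of Go's `fmt` package, which prints `%!verb(MISSING)` when a format string contains a verb with no matching argument. A URL-encoded scope contains `%3A` and `%2F`, so if it ends up in a format string those escapes come out garbled. The sketch below reverses that under the assumption that `A` stood for `%3A` (":") and `F` for `%2F` ("/"); the helper name `ungarble_go_format` is hypothetical and not part of the proxy.

```shell
#!/bin/sh
# Hypothetical helper (not part of cloudsql-proxy): undo the Go fmt artifacts
# in a logged string. Assumes %!A(MISSING) was originally %3A (":") and
# %!F(MISSING) was originally %2F ("/").
ungarble_go_format() {
  printf '%s\n' "$1" | sed -e 's|%!A(MISSING)|:|g' -e 's|%!F(MISSING)|/|g'
}

# The scope fragment copied from the error message in this issue.
garbled='https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)sqlservice.admin'
ungarble_go_format "$garbled"
```

Read this way, the proxy asked the metadata server for a token for the `default` service account with the `sqlservice.admin` scope, and the metadata server answered that this path is "not defined". That points at the identity configuration rather than at the instance name or the network.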

Additional Context

The usual workload-identity tests work, as far as I can tell. I followed the steps in https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to, including verifying that workload identity works for the set-up service account by running this pod:

apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-test
  namespace: K8S_NAMESPACE
spec:
  containers:
  - image: google/cloud-sdk:slim
    name: workload-identity-test
    command: ["sleep","infinity"]
  serviceAccountName: KSA_NAME

with my KSA_NAME. I even added this container as a sidecar to the cloudsql-proxy deployment, to check whether something else in that deployment's configuration was causing the issue. From there I could run curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/, which returned two entries:

default/
projects/foobar/serviceAccounts/cloudsql-proxy-dev@foobar.iam.gserviceaccount.com/

As far as I understand, I should only get one entry here. I don't understand where the second one comes from, and I suspect it may be the problem.
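For readers comparing their own output: judging by the resolution later in this thread, the number of entries is probably not the issue. The `default/` entry is most likely just an alias, and the telling detail is the `projects/foobar/serviceAccounts/` prefix on the second entry, where a bare service account email would be expected. A minimal sketch of a check for that, assuming the listing format shown above (the function name `flag_suspicious_entries` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read the service-accounts listing on stdin and flag
# entries that are neither "default/" nor a bare service account email.
flag_suspicious_entries() {
  while IFS= read -r entry || [ -n "$entry" ]; do
    case "$entry" in
      ''|default/) ;;                        # blank line or the default alias
      */*/) echo "suspicious: $entry" ;;     # path prefix such as projects/...
      *@*.iam.gserviceaccount.com/) ;;       # bare email form
      *) echo "suspicious: $entry" ;;        # anything else is unexpected
    esac
  done
}

# Inside the pod, the listing could be piped in directly (assumes curl exists):
#   curl -s -H "Metadata-Flavor: Google" \
#     http://169.254.169.254/computeMetadata/v1/instance/service-accounts/ \
#     | flag_suspicious_entries

# Here the two entries from this issue are fed in by hand.
printf '%s\n' \
  'default/' \
  'projects/foobar/serviceAccounts/cloudsql-proxy-dev@foobar.iam.gserviceaccount.com/' \
  | flag_suspicious_entries
```

With the input from this issue, only the prefixed entry is reported; a listing that contains `default/` and a bare email produces no output.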

This is the deployment definition that currently is running (and crashing); I extracted it from the cluster and removed a few fields around managedFields and status:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudsql-proxy
  namespace: default
  labels:
    app: cloudsql-proxy
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: cloudsql-proxy
    meta.helm.sh/release-namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudsql-proxy
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: cloudsql-proxy
      annotations:
        prometheus.io/scrape: 'false'
        sidecar.istio.io/inject: 'false'
    spec:
      containers:
        - name: cloudsql-proxy
          image: eu.gcr.io/cloudsql-docker/gce-proxy:1.28.0
          command:
            - /cloud_sql_proxy
            - '-ip_address_types=PRIVATE'
            - '-instances=foobar:europe-west3:main-dev-testing2=tcp:0.0.0.0:5432'
          ports:
            - containerPort: 5432
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 10m
              memory: 300Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsNonRoot: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: cloudsql-proxy
      serviceAccount: cloudsql-proxy
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  progressDeadlineSeconds: 600

Any pointers are much appreciated. Sorry if this turns out to be a generic Workload Identity problem, but I find the cloudsql-proxy error message quite confusing :-/.

@lenalebt
Author

After rubber-ducking through every step in detail, we found that I had annotated the Kubernetes service account incorrectly. This is how I should have annotated it:

kubectl annotate serviceaccount cloudsql-proxy --namespace default iam.gke.io/gcp-service-account=cloudsql-proxy-dev@foobar.iam.gserviceaccount.com

and this is what I actually had:

kubectl annotate serviceaccount cloudsql-proxy --namespace default iam.gke.io/gcp-service-account=projects/foobar/serviceAccounts/cloudsql-proxy-dev@foobar.iam.gserviceaccount.com
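Since the wrong value is easy to produce from generated configuration, a small guard before annotating may help. The sketch below accepts only a bare `NAME@PROJECT.iam.gserviceaccount.com` email and rejects the resource-path form. The function names are hypothetical, the regular expression is a simplification that does not cover every valid project ID, and the wrapper follows the GKE documentation in using the `iam.gke.io/gcp-service-account` key with `--namespace`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the value looks like a bare service
# account email (NAME@PROJECT.iam.gserviceaccount.com) with no path prefix.
# Simplified pattern; domain-scoped project IDs are not handled.
is_bare_gsa_email() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9-]+@[a-z0-9-]+\.iam\.gserviceaccount\.com$'
}

# Hypothetical wrapper: validate first, then annotate.
# Usage: annotate_ksa KSA_NAME NAMESPACE GSA_EMAIL
# Defined but not called here, because it needs access to a cluster.
annotate_ksa() {
  is_bare_gsa_email "$3" || { echo "refusing: '$3' is not a bare email" >&2; return 1; }
  kubectl annotate serviceaccount "$1" --namespace "$2" \
    "iam.gke.io/gcp-service-account=$3"
}

# Check the two values from this issue.
for value in \
  'cloudsql-proxy-dev@foobar.iam.gserviceaccount.com' \
  'projects/foobar/serviceAccounts/cloudsql-proxy-dev@foobar.iam.gserviceaccount.com'
do
  if is_bare_gsa_email "$value"; then
    echo "ok: $value"
  else
    echo "rejected: $value"
  fi
done
```

After annotating, `kubectl describe serviceaccount cloudsql-proxy --namespace default` shows the stored annotation, which is a quick way to confirm what Terraform or Helm actually applied.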

It was buried under a layer of Terraform. Sorry for the interruption; I hope this helps somebody else who ends up in the same situation :)

@enocom
Member

enocom commented Jan 20, 2022

Glad you figured it out. Getting Workload Identity set up is definitely tricky, and the proxy's error message is pretty terrible. By the way, we are working on fixing the error messages as part of #872.
