
Datadog admission labels cause failed submission #2367

Open
Marcus-Rosti opened this issue Dec 19, 2024 · 5 comments
Labels
kind/bug Something isn't working

Comments

@Marcus-Rosti

What happened?

Adding

    spark.kubernetes.driver.label.admission.datadoghq.com/enabled: "true"
    spark.kubernetes.driver.annotation.admission.datadoghq.com/java-lib.version: "latest"
    spark.kubernetes.executor.label.admission.datadoghq.com/enabled: "true"
    spark.kubernetes.executor.annotation.admission.datadoghq.com/java-lib.version: "latest"

to the Spark submission configuration results in this error:

24/12/18 23:51:09 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
24/12/18 23:51:09 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
24/12/18 23:51:50 ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes.
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1108)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:92)
  at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6(KubernetesClientApplication.scala:256)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6$adapted(KubernetesClientApplication.scala:250)
  at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)
  at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:94)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:250)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:223)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Canceled
  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:340)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:703)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:92)
  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
  ... 17 more
Caused by: java.io.IOException: Canceled
  at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:121)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.base/java.lang.Thread.run(Unknown Source)
24/12/18 23:51:50 INFO ShutdownHookManager: Shutdown hook called
24/12/18 23:51:50 INFO ShutdownHookManager: Deleting directory /tmp/spark-a00e6316-9de2-4bd7-b43a-02dbbf4527c9
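The ERROR line in the log suggests verifying RBAC before anything else. A minimal sanity check might look like the following (the `default` namespace is an assumption; substitute the namespace the application is submitted to):

```shell
# Verify that the identity used by spark-submit is allowed to create pods.
# Expected answer per the log's hint: "yes".
kubectl auth can-i create pods --namespace default

# If the answer is "no", inspect the bindings granted in that namespace.
kubectl get rolebindings,clusterrolebindings --namespace default
```

Note that in the logs above the submission fails roughly 40-50 seconds after the Kerberos step, which is also consistent with a client-side timeout rather than an RBAC denial, so this check only rules out one cause.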

Reproduction Code

No response

Expected behavior

No response

Actual behavior

No response

Environment & Versions

  • Kubernetes Version: v1.30.5-gke.1443001
  • Spark Operator Version: 2.1.0
  • Apache Spark Version: 3.5.3

Additional context

I'm following this tutorial: https://docs.datadoghq.com/data_jobs/kubernetes/?tab=datadogoperator

Impacted by this bug?

Give it a 👍. We prioritize the issues with the most 👍

@Marcus-Rosti Marcus-Rosti added the kind/bug Something isn't working label Dec 19, 2024
@jacobsalway
Member

Are you able to share your full SparkApplication spec? I'm not able to replicate this issue using an example app with the label/annotation sparkConf you've provided. From experience, I suspect other options in your spec are producing an invalid pod specification, which causes the pod creation request inside spark-submit to fail.

@Marcus-Rosti
Author

@jacobsalway I'm assuming it's because we haven't enabled the mutating webhook.

@jacobsalway
Member

jacobsalway commented Jan 8, 2025

Annotations/labels don't need to be patched by the mutating webhook as they are already supported by spark-submit/Spark core. They are passed as configuration options in submission.go. They would still be loaded by spark-submit before driver pod creation even if set via extra sparkConf as you've provided.
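To illustrate that path: the operator forwards these as ordinary Spark configuration options, so the submission is equivalent to passing them as `--conf` flags to spark-submit directly. A sketch of that invocation, using the same keys from the original report (the master URL and application jar here are placeholders, not values from this issue):

```shell
# Equivalent direct spark-submit call; labels/annotations are handled by
# Spark core's Kubernetes feature steps, no webhook involved.
# <api-server> is a placeholder for your cluster endpoint.
spark-submit \
  --master k8s://https://<api-server> \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.label.admission.datadoghq.com/enabled=true \
  --conf spark.kubernetes.driver.annotation.admission.datadoghq.com/java-lib.version=latest \
  --conf spark.kubernetes.executor.label.admission.datadoghq.com/enabled=true \
  --conf spark.kubernetes.executor.annotation.admission.datadoghq.com/java-lib.version=latest \
  local:///opt/spark/examples/jars/spark-examples.jar
```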

I wasn't able to replicate any issues with this specification on the latest release and saw the labels/annotations being added appropriately.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.3
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.3
  driver:
    annotations:
      admission.datadoghq.com/java-lib.version: "latest"
    labels:
      version: 3.5.3
      admission.datadoghq.com/enabled: "true"
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    annotations:
      admission.datadoghq.com/java-lib.version: "latest"
    labels:
      version: 3.5.3
      admission.datadoghq.com/enabled: "true"
    instances: 1
    cores: 1
    memory: 512m
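After applying the spec above, one way to confirm the labels and annotations actually landed on the driver pod (the pod name `spark-pi-driver` follows the operator's default `<app-name>-driver` naming and is an assumption):

```shell
# Dump the driver pod and filter for the Datadog admission keys;
# both the label and the annotation should appear in the output.
kubectl get pod spark-pi-driver --namespace default -o yaml | grep datadoghq
```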

@Marcus-Rosti
Author

I'll try this today!

@Marcus-Rosti
Author

Yeah, I still get the same error:

25/01/14 21:56:48 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
25/01/14 21:56:58 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
25/01/14 21:57:48 ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes.
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1108)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:92)
  at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6(KubernetesClientApplication.scala:256)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6$adapted(KubernetesClientApplication.scala:250)
  at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)
  at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:94)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:250)
  at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:223)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Canceled
  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:340)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:703)
  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:92)
  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
  ... 17 more
Caused by: java.io.IOException: Canceled
  at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:121)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.base/java.lang.Thread.run(Unknown Source)
25/01/14 21:57:48 INFO ShutdownHookManager: Shutdown hook called
25/01/14 21:57:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-1ae9a31e-02a2-4b7b-ab01-b17c32844d91
