41 changes: 41 additions & 0 deletions .ci/recreate_dataproc_cluster.cloudbuild.yaml
@@ -0,0 +1,41 @@
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# NOTE: The _CLUSTER_NAME substitution variable defined here must match the
# DATAPROC_LIST_JOBS_CLUSTER parameter defined in .ci/integration.cloudbuild.yaml
# (default: cluster-36).

steps:
- id: "recreate-dataproc-cluster"
  name: "gcr.io/cloud-builders/gcloud:latest"
  env:
  - "PROJECT_ID=$PROJECT_ID"
  - "CLUSTER_NAME=$_CLUSTER_NAME"
  - "REGION=$_REGION"
  - "IMAGE_VERSION=$_IMAGE_VERSION"


Suggested change
  - "IMAGE_VERSION=$_IMAGE_VERSION"
  - "IMAGE_VERSION=$_IMAGE_VERSION"
  - "SERVICE_ACCOUNT_EMAIL=$SERVICE_ACCOUNT_EMAIL"

  script: |
    #!/usr/bin/env bash
    bash .ci/recreate_dataproc_cluster.sh "$${PROJECT_ID}" "$${REGION}" "$${IMAGE_VERSION}" "$${CLUSTER_NAME}"


Then pass in the service account here.


options:
  automapSubstitutions: true
  dynamicSubstitutions: true
  logging: CLOUD_LOGGING_ONLY
  pool:
    name: projects/$PROJECT_ID/locations/us-central1/workerPools/integration-testing

substitutions:
  _CLUSTER_NAME: "cluster-36"


Can we use a more descriptive name like dataproc-testing-cluster?

  _REGION: "us-central1"
  _IMAGE_VERSION: "2.3-debian12"
74 changes: 74 additions & 0 deletions .ci/recreate_dataproc_cluster.sh
@@ -0,0 +1,74 @@
#!/usr/bin/env bash
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -eo pipefail

if [ $# -lt 4 ]; then
  echo "Error: Missing required arguments." >&2
  echo "Usage: $0 <project_id> <region> <image_version> <cluster_name>" >&2
  exit 1
fi

PROJECT_ID="$1"
REGION="$2"
IMAGE_VERSION="$3"
CLUSTER_NAME="$4"

SERVICE_ACCOUNT="toolbox-identity@${PROJECT_ID}.iam.gserviceaccount.com"


We should remove this. Cloud Build injects the service account automatically, so we can read it directly in the build step and pass it to the script:

- "SERVICE_ACCOUNT_EMAIL=$SERVICE_ACCOUNT_EMAIL"
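A minimal sketch of the reviewer's suggestion, assuming Cloud Build exposes the email through $SERVICE_ACCOUNT_EMAIL as described above: the script receives the email as a fifth positional argument instead of building it from the project ID. The parse_args function and the argument order are hypothetical and are not part of the PR.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical argument parser (not in the PR). The matching build step
# would append the email as a fifth argument, for example:
#   bash .ci/recreate_dataproc_cluster.sh "$${PROJECT_ID}" "$${REGION}" \
#     "$${IMAGE_VERSION}" "$${CLUSTER_NAME}" "$${SERVICE_ACCOUNT_EMAIL}"
parse_args() {
  # Require all five arguments and reject an empty service account, so a
  # missing substitution fails fast instead of reaching `clusters create`.
  if [ $# -lt 5 ] || [ -z "$5" ]; then
    echo "Usage: $0 <project_id> <region> <image_version> <cluster_name> <service_account_email>" >&2
    return 1
  fi
  PROJECT_ID="$1"
  REGION="$2"
  IMAGE_VERSION="$3"
  CLUSTER_NAME="$4"
  SERVICE_ACCOUNT="$5"
}
```

Calling parse_args "$@" at the top of the script keeps the existing behavior under set -e: a bad invocation prints the usage line and exits with status 1.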


echo "=========================================================="
echo "Recreating Dataproc cluster in project: ${PROJECT_ID}"
echo "Region: ${REGION}"
echo "Image Version: ${IMAGE_VERSION}"
echo "Cluster Name: ${CLUSTER_NAME}"
echo "Service Account: ${SERVICE_ACCOUNT}"
echo "=========================================================="

# Check if the cluster exists, capturing stdout and stderr to distinguish NOT_FOUND from other errors
echo "Checking if cluster '${CLUSTER_NAME}' exists..."
set +e
DESCRIBE_OUT=$(gcloud dataproc clusters describe "${CLUSTER_NAME}" --region="${REGION}" --project="${PROJECT_ID}" 2>&1)
DESCRIBE_STATUS=$?
set -e

if [ ${DESCRIBE_STATUS} -eq 0 ]; then
  echo "Cluster '${CLUSTER_NAME}' exists. Deleting it..."
  gcloud dataproc clusters delete "${CLUSTER_NAME}" \
    --region="${REGION}" \
    --project="${PROJECT_ID}" \
    --quiet
  echo "Cluster '${CLUSTER_NAME}' deleted successfully."
elif echo "${DESCRIBE_OUT}" | grep -q "NOT_FOUND"; then
  echo "Cluster '${CLUSTER_NAME}' does not exist. Skipping deletion."
else
  echo "Error querying cluster existence: ${DESCRIBE_OUT}" >&2
  exit ${DESCRIBE_STATUS}
fi
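The existence check above can also be written without toggling set +e and set -e. The sketch below is a hypothetical cluster_lookup helper, not taken from the PR: it captures gcloud's exit status in an && / || list, which set -e does not abort on, and matches NOT_FOUND with a bash pattern instead of piping the output through grep.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper (not part of the PR): prints EXISTS or NOT_FOUND,
# and returns gcloud's own exit status for any other failure.
cluster_lookup() {
  local name="$1" region="$2" project="$3"
  # Declare separately: `local out=$(...)` would mask the command's status.
  local out status
  # The `&& ... || ...` list keeps `set -e` from aborting when gcloud
  # exits non-zero, and records the real exit status either way.
  out=$(gcloud dataproc clusters describe "${name}" \
    --region="${region}" --project="${project}" 2>&1) && status=0 || status=$?

  if [ "${status}" -eq 0 ]; then
    echo "EXISTS"
  elif [[ "${out}" == *NOT_FOUND* ]]; then
    echo "NOT_FOUND"
  else
    echo "Error querying cluster existence: ${out}" >&2
    return "${status}"
  fi
}
```

The caller would branch on the printed value, for example case "$(cluster_lookup "${CLUSTER_NAME}" "${REGION}" "${PROJECT_ID}")" in EXISTS) ... ;; esac, while any unexpected error still stops the script with gcloud's status.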
Comment on lines +41 to +58


medium

If a previous run of this script was cancelled or failed during deletion, the cluster might be left in a DELETING state. Running gcloud dataproc clusters delete on a cluster that is already deleting will fail, causing this script to exit with an error.

We can make this more robust by querying the cluster's state using --format="value(status.state)". If the state is DELETING, we can poll and wait for the deletion to complete before proceeding to the creation step.

set +e
DESCRIBE_OUT=$(gcloud dataproc clusters describe "${CLUSTER_NAME}" --region="${REGION}" --project="${PROJECT_ID}" --format="value(status.state)" 2>&1)
DESCRIBE_STATUS=$?
set -e

if [ ${DESCRIBE_STATUS} -eq 0 ]; then
  STATE="${DESCRIBE_OUT}"
  if [ "${STATE}" = "DELETING" ]; then
    echo "Cluster '${CLUSTER_NAME}' is already being deleted. Waiting for deletion to complete..."
    while gcloud dataproc clusters describe "${CLUSTER_NAME}" --region="${REGION}" --project="${PROJECT_ID}" &>/dev/null; do
      sleep 10
    done
    echo "Cluster '${CLUSTER_NAME}' deleted successfully."
  else
    echo "Cluster '${CLUSTER_NAME}' exists in state '${STATE}'. Deleting it..."
    gcloud dataproc clusters delete "${CLUSTER_NAME}" \
      --region="${REGION}" \
      --project="${PROJECT_ID}" \
      --quiet
    echo "Cluster '${CLUSTER_NAME}' deleted successfully."
  fi
elif echo "${DESCRIBE_OUT}" | grep -q "NOT_FOUND"; then
  echo "Cluster '${CLUSTER_NAME}' does not exist. Skipping deletion."
else
  echo "Error querying cluster existence: ${DESCRIBE_OUT}" >&2
  exit ${DESCRIBE_STATUS}
fi
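The suggested polling loop has no upper bound, and it treats any describe failure (for example a permission error) as a completed deletion. A bounded variant, sketched below as a hypothetical wait_for_cluster_deletion helper, stops after a fixed number of polls and only treats NOT_FOUND as success. The default of 60 polls at 10-second intervals is an assumption, not a value from the PR.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper (not part of the PR): polls until the cluster can no
# longer be described, giving up after max_attempts polls.
wait_for_cluster_deletion() {
  local name="$1" region="$2" project="$3"
  local max_attempts="${4:-60}" interval="${5:-10}"
  local attempt out
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if ! out=$(gcloud dataproc clusters describe "${name}" \
        --region="${region}" --project="${project}" 2>&1); then
      # Only NOT_FOUND means the deletion finished; surface anything else.
      if [[ "${out}" == *NOT_FOUND* ]]; then
        return 0
      fi
      echo "Error while waiting for deletion: ${out}" >&2
      return 1
    fi
    sleep "${interval}"
  done
  echo "Timed out waiting for cluster '${name}' to be deleted." >&2
  return 1
}
```

In the suggestion, the while loop in the DELETING branch would become a single call: wait_for_cluster_deletion "${CLUSTER_NAME}" "${REGION}" "${PROJECT_ID}".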


# Create the cluster
echo "Creating Dataproc cluster '${CLUSTER_NAME}'..."
gcloud dataproc clusters create "${CLUSTER_NAME}" \
  --region="${REGION}" \
  --project="${PROJECT_ID}" \
  --image-version="${IMAGE_VERSION}" \
  --service-account="${SERVICE_ACCOUNT}" \
  --scopes=cloud-platform \
  --no-address \
  --network=default \
  --master-machine-type=n4-standard-2 \
  --worker-machine-type=n4-standard-2 \
  --num-workers=2

echo "Cluster '${CLUSTER_NAME}' created successfully."