Sdk tests with papermill #2448

Open
wants to merge 4 commits into base: master

yehudit1987

What this PR does / why we need it:
This PR creates E2E tests for katib examples to run with papermill.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2417

Checklist:

  • Docs included if any changes are user facing

@Electronic-Waste
Member

/rerun-all

@Electronic-Waste
Member

Electronic-Waste commented Oct 28, 2024

@yehudit1987 Can you please fix these CI errors?

@Electronic-Waste
Member

@yehudit1987 Can you sign off your commits with git commit -s? The DCO check failed because the sign-off is missing.

@Electronic-Waste
Member

FYI, you can check this reference: https://github.com/kubeflow/katib/pull/2448/checks?check_run_id=32215445282
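
For commits that already exist without a sign-off, one possible way to add it is sketched below. The helper name signoff_last_n is hypothetical and not part of the Katib repository; the commit count of 4 and the remote name origin in the usage comment are assumptions based on this PR's header and branch name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Katib repo): re-create the last N
# commits on the current branch with a Signed-off-by trailer added.
signoff_last_n() {
  local count="$1"
  # --signoff appends "Signed-off-by: <user.name> <user.email>" to every
  # rebased commit that does not already end with that trailer.
  git rebase --signoff "HEAD~${count}"
}

# Example usage (assumed values: 4 commits, remote "origin"):
#   signoff_last_n 4
#   git push --force-with-lease origin sdk-tests-with-papermill
```

Rewriting the commits changes their hashes, so the branch has to be force-pushed afterwards; --force-with-lease refuses to overwrite remote commits that were not fetched locally.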

@yehudit1987 yehudit1987 force-pushed the sdk-tests-with-papermill branch from 963d367 to 6633aa5 Compare October 29, 2024 19:29
@yehudit1987 yehudit1987 marked this pull request as ready for review October 29, 2024 19:31
@Electronic-Waste
Member

/rerun-all

2 similar comments
@YosiElias
Member

/rerun-all

@Electronic-Waste
Member

/rerun-all

Member

@Electronic-Waste Electronic-Waste left a comment

Thanks for your great contributions @yehudit1987! 🎉

I left some review comments for you, excluding the notebooks. I will review the other files soon :)

Btw, @andreyvelich @tenzen-y are busy with other projects now and will be back in the middle of November. Your PR will be merged then.

@@ -0,0 +1,28 @@
name: Run e2e sdk tests with papermill
Member

Suggested change
name: Run e2e sdk tests with papermill
name: E2E Tests with Notebooks

I think it would be better to keep the workflow name consistent with the others :)

cancel-in-progress: true

jobs:
create-katib-notebooks-test:
Member

Suggested change
create-katib-notebooks-test:
e2e:

Comment on lines 93 to 103
# Loop through each algorithm in the array
for algorithm_name in "${ALGORITHM_ARRAY[@]}"; do
  suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \
    manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)"
  suggestion_name="$(basename "$suggestion_image_name")"
  suggestions+=("$suggestion_name")
done

Member

I think this loop is redundant with the loop in front of it:

# Search for Suggestion Images required for Trial.
for exp_name in "${EXPERIMENT_ARRAY[@]}"; do
  exp_path=$(find examples/v1beta1 -name "${exp_name}.yaml")
  algorithm_name="$(yq eval '.spec.algorithm.algorithmName' "$exp_path")"
  suggestion_image_name="$(algorithm_name=$algorithm_name yq eval '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \
    manifests/v1beta1/installs/katib-standalone/katib-config.yaml | cut -d: -f1)"
  suggestion_name="$(basename "$suggestion_image_name")"
  suggestions+=("$suggestion_name")
done

Can we combine these two loops into a single one by using the ALGORITHMS parameter in the other e2e tests as well?

WDYT👀 @yehudit1987 @kubeflow/wg-automl-leads
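
One possible shape for a unified loop is sketched below; it is not the change that was adopted in this PR. The array names, the yq queries, and the config path are taken from the two snippets quoted above, while unique_names, suggestion_for_algorithm, and collect_suggestions are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; paths and yq queries mirror
# the snippets quoted in the review comment.

KATIB_CONFIG="manifests/v1beta1/installs/katib-standalone/katib-config.yaml"

# Print each argument once, preserving first-seen order and skipping empty names.
unique_names() {
  local seen=" " name
  for name in "$@"; do
    [[ -z "$name" || "$seen" == *" $name "* ]] && continue
    seen+="$name "
    printf '%s\n' "$name"
  done
}

# Resolve an algorithm name to its suggestion image name (same yq query as above).
suggestion_for_algorithm() {
  local image
  image="$(algorithm_name="$1" yq eval \
    '.runtime.suggestions.[] | select(.algorithmName == env(algorithm_name)) | .image' \
    "$KATIB_CONFIG" | cut -d: -f1)"
  basename "$image"
}

# Gather algorithm names from the explicit list and from the experiments,
# then look up each distinct algorithm exactly once.
collect_suggestions() {
  local exp_name exp_path algorithm_name
  local -a algorithms=("${ALGORITHM_ARRAY[@]}")
  for exp_name in "${EXPERIMENT_ARRAY[@]}"; do
    exp_path="$(find examples/v1beta1 -name "${exp_name}.yaml")"
    algorithms+=("$(yq eval '.spec.algorithm.algorithmName' "$exp_path")")
  done
  while IFS= read -r algorithm_name; do
    suggestions+=("$(suggestion_for_algorithm "$algorithm_name")")
  done < <(unique_names "${algorithms[@]}")
}
```

Deduplicating the names first means an algorithm that appears both in an experiment and in the explicit list is looked up in katib-config.yaml only once.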

echo "Papermill failed for notebook: $NOTEBOOK"
exit 1
}
done
Member

Suggested change
done
done

A newline is missing here.

@@ -172,4 +182,4 @@ fi
echo -e "\nCleanup Build Cache...\n"
docker buildx prune -f

echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
Member

Suggested change
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"

kubectl create namespace kubeflow-user-example-com
fi

exit 0
Member

Suggested change
exit 0
exit 0


echo "Start to setup Minikube Kubernetes Cluster"
kubectl version
kubectl cluster-info
kubectl get nodes

echo "Build and Load container images"
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS"
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"
Member

Suggested change
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"
./build-load.sh "$DEPLOY_KATIB_UI" "$TUNE_API" "$TRIAL_IMAGES" "$EXPERIMENTS" "$ALGORITHMS"

Comment on lines 33 to 35
- name: Setup Minikube Cluster
  shell: bash
  run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh true true "" "" "cmaes"
Member

Could we reuse template-setup-e2e-test? I think it would be better to make full use of the existing template :)

Waiting for your thoughts👀 @yehudit1987 @kubeflow/wg-automl-leads

Author

That template is used as a preliminary step for template-e2e-test. We use the second one to run YAML experiments by calling a shell script that calls a Python script. In our case we only need to add a step to the job that runs the notebook directly with papermill. I guess we can use template-setup-e2e-test, but that would not remove the need for the new one.

Member

SGTM:)

@yehudit1987 yehudit1987 marked this pull request as draft November 3, 2024 10:58
@yehudit1987 yehudit1987 marked this pull request as ready for review November 3, 2024 11:16
Member

@Electronic-Waste Electronic-Waste left a comment

Sorry for the late response @yehudit1987. I left a few comments for you.

I'm busy with my own work right now, so I'll review the notebooks later :)

Comment on lines 39 to 48
if [ -x "$(command -v apt-get)" ]; then
  echo "Upgrading Podman using apt-get..."
  sudo apt-get update
  sudo apt-get install -y podman
elif [ -x "$(command -v dnf)" ]; then
  echo "Upgrading Podman using dnf..."
  sudo dnf upgrade podman -y
else
  echo "Package manager not found. Skipping upgrade."
fi
Member

Could you please tell me why we need to use podman?

Member

It would be better to rename the directory from template-notebook-test to template-e2e-notebook-test to be consistent with the other directories :)

@Electronic-Waste
Member

/rerun-all

@yehudit1987 yehudit1987 marked this pull request as draft November 5, 2024 09:06
@yehudit1987 yehudit1987 marked this pull request as ready for review November 5, 2024 13:00
@Electronic-Waste
Member

/rerun-all

@andreyvelich
Member

Hi @yehudit1987, do you have time to finalize this PR?
@saileshd1402 implemented tests as part of this PR: kubeflow/trainer#2274, so we can use the same script for Katib.

@yehudit1987
Author

Hi @andreyvelich, yes, I have been waiting for your approval of the script decisions mentioned above. I will finalize this PR.

@yehudit1987 yehudit1987 force-pushed the sdk-tests-with-papermill branch from 2c0ce60 to 59af784 Compare January 23, 2025 07:58
@yehudit1987 yehudit1987 reopened this Jan 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@google-oss-prow google-oss-prow bot added size/XL and removed size/XS labels Jan 23, 2025
Yehudit Kerido added 2 commits January 23, 2025 10:18
Signed-off-by: Yehudit Kerido <[email protected]>
Signed-off-by: Yehudit Kerido <[email protected]>
@yehudit1987 yehudit1987 force-pushed the sdk-tests-with-papermill branch from 70d149a to 683608f Compare January 23, 2025 08:19
@yehudit1987 yehudit1987 marked this pull request as ready for review January 23, 2025 08:35
Member

@andreyvelich andreyvelich left a comment

Thank you for doing this, I left initial comments.
/assign @Electronic-Waste @helenxie-bit @shashank-iitbhu @kubeflow/wg-training-leads @saileshd1402 Please help with review

@@ -0,0 +1,28 @@
name: E2E Tests with Notebooks
Member

Let's call this file:

e2e-test-notebooks.yaml

Comment on lines 3 to 6
on:
  push: {}
  pull_request: {}
  workflow_dispatch: {}
Member

Suggested change
on:
  push: {}
  pull_request: {}
  workflow_dispatch: {}
on:
  - pull_request

@@ -0,0 +1,54 @@
name: Notebook test template
Member

You don't need a template if we keep all notebook test jobs in a single file.

Comment on lines 29 to 37
- name: Setup Minikube Cluster
  uses: medyagh/[email protected]
  with:
    network-plugin: cni
    cni: flannel
    driver: none
    kubernetes-version: v1.29.2
    minikube-version: 1.34.0
    start-args: --wait-timeout=120s
Member

You can re-use setup-e2e-test template for that, similar to this:

- name: Setup Test Env
  uses: ./.github/workflows/template-setup-e2e-test
  with:
    kubernetes-version: ${{ matrix.kubernetes-version }}
    python-version: "3.10"


- name: Setup Minikube
  shell: bash
  run: ./test/e2e/v1beta1/scripts/gh-actions/setup-minikube.sh true true "" "" "cmaes"
Member

Can we re-use this template for this action and pass the appropriate values to it?
https://github.com/kubeflow/katib/blob/683608f6a61e7b10218b7084f310af42334d8e65/.github/workflows/template-e2e-test/action.yaml
We can add one more input for notebook tests that triggers run-notebook.sh.

echo "Options:"
echo " -i Input notebook (required)"
echo " -o Output notebook (required)"
echo " -k Kubeflow Training Operator Python SDK (optional)"
Member

This should say Katib SDK, not Training Operator SDK.

case "$opt" in
  i) NOTEBOOK_INPUT="$OPTARG" ;; # -i for notebook input path
  o) NOTEBOOK_OUTPUT="$OPTARG" ;; # -o for notebook output path
  k) TRAINING_PYTHON_SDK="$OPTARG" ;; # -k for training operator python sdk
Member

This should say Katib SDK, not Training Operator SDK.
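
A minimal sketch of how the option parsing might look with Katib naming is shown below. The function name parse_notebook_args and the variable name KATIB_PYTHON_SDK are hypothetical; only the -i, -o, and -k options come from the snippet quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: parse_notebook_args and KATIB_PYTHON_SDK are hypothetical names.
parse_notebook_args() {
  local opt
  OPTIND=1  # reset so the function can be called more than once
  # The leading ":" puts getopts in silent mode, so errors are handled below.
  while getopts ":i:o:k:" opt; do
    case "$opt" in
      i) NOTEBOOK_INPUT="$OPTARG" ;;    # -i for notebook input path
      o) NOTEBOOK_OUTPUT="$OPTARG" ;;   # -o for notebook output path
      k) KATIB_PYTHON_SDK="$OPTARG" ;;  # -k for Katib Python SDK
      :) echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
      \?) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
    esac
  done
  # The usage text marks -i and -o as required.
  if [[ -z "${NOTEBOOK_INPUT:-}" || -z "${NOTEBOOK_OUTPUT:-}" ]]; then
    echo "Both -i and -o are required" >&2
    return 1
  fi
}

# Example usage:
#   parse_notebook_args "$@" || exit 1
```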

exit 1
fi

papermill_cmd="papermill $NOTEBOOK_INPUT $NOTEBOOK_OUTPUT -p training_python_sdk $TRAINING_PYTHON_SDK -p namespace $NAMESPACE"
Member

We should install the Katib SDK in the first cell of the notebooks, similar to this:
https://github.com/kubeflow/trainer/blob/release-1.9/examples/pytorch/image-classification/create-pytorchjob.ipynb
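
The papermill invocation quoted above is assembled as a single string, which can break on paths that contain spaces. A sketch of building it as a bash array instead is shown below; the function name build_papermill_cmd and the notebook parameter name katib_python_sdk are hypothetical, while the -p NAME VALUE form is papermill's documented way of passing parameters.

```shell
#!/usr/bin/env bash
# Sketch only: build_papermill_cmd and the katib_python_sdk parameter name
# are hypothetical; optional values are skipped when empty.
build_papermill_cmd() {
  local input="$1" output="$2" sdk="$3" namespace="$4"
  papermill_cmd=(papermill "$input" "$output")
  # -p NAME VALUE passes a parameter into the executed notebook.
  [[ -n "$sdk" ]] && papermill_cmd+=(-p katib_python_sdk "$sdk")
  [[ -n "$namespace" ]] && papermill_cmd+=(-p namespace "$namespace")
  return 0
}

# Example usage:
#   build_papermill_cmd "$NOTEBOOK_INPUT" "$NOTEBOOK_OUTPUT" "$KATIB_PYTHON_SDK" "$NAMESPACE"
#   "${papermill_cmd[@]}" || { echo "Papermill failed for notebook: $NOTEBOOK_INPUT"; exit 1; }
```

Expanding the array with "${papermill_cmd[@]}" keeps every argument as one word, so no eval or unquoted expansion is needed.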

exit 1
fi

echo "Notebook execution completed successfully"
Member

Suggested change
echo "Notebook execution completed successfully"
echo "Notebook execution completed successfully"

shell: bash
run: |
  python -m pip install --upgrade pip
  pip install papermill kubeflow-katib jupyter ipykernel
Member

The kubeflow-katib SDK should be installed from source.

Author

Hey @andreyvelich, thanks for the review!
Great points, I’ve fixed them. Let me know if anything else is needed.

Signed-off-by: Yehudit Kerido <[email protected]>
Signed-off-by: Yehudit Kerido <[email protected]>
@yehudit1987 yehudit1987 marked this pull request as ready for review February 17, 2025 15:03
@andreyvelich
Member

/ok-to-test
/rerun-all

Successfully merging this pull request may close these issues.

[Test] E2e Tests for Notebook Examples
4 participants