WIP: Improve results and debuggability for quay-e2e job #59126
dgoodwin commented Nov 22, 2024
- Fail quay e2e test step when e2e tests fail
- Enable resource watch observer on quay-quay-tests-master-ocp-418-quay-quay-e2e-tests-quay313-ocp418-lp-interop
This is important so that prow and sippy know the job encountered errors. Today this step cannot fail.
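A minimal sketch of the pattern, assuming the step is a bash script: capture the exit code of the test command, still collect artifacts, then return that code so the step fails when the tests fail. The names `run_step` and `collect_artifacts` are hypothetical and are not taken from the actual quay e2e step script.

```shell
#!/usr/bin/env bash
# Illustrative only: run_step and collect_artifacts are hypothetical names,
# not taken from the actual quay e2e step script.

collect_artifacts() {
    # Stand-in for copying JUnit XML and logs into the artifact directory.
    :
}

run_step() {
    local rc=0
    # Run the test command passed as arguments and remember its exit code
    # instead of discarding it.
    "$@" || rc=$?
    # Collect artifacts whether or not the tests passed.
    collect_artifacts || true
    # Return the test command's exit code so the step fails when tests fail.
    return "$rc"
}
```

A step script using this would end with something like `run_step <test command>` followed by `exit $?`, so a test failure surfaces as a non-zero step result rather than being swallowed by later cleanup commands.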
…-quay-e2e-tests-quay313-ocp418-lp-interop
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dgoodwin The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/pj-rehearse
@dgoodwin: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/pj-rehearse cancel
@dgoodwin: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
@dgoodwin: job(s): cancel either don't exist or were not found to be affected, and cannot be rehearsed
[REHEARSALNOTIFIER]
A total of 36 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here.
Interacting with pj-rehearse
Comment: Once you are satisfied with the results of the rehearsals, comment:
/pj-rehearse periodic-ci-quay-quay-tests-master-ocp-418-quay-quay-e2e-tests-quay313-ocp418-lp-interop
@dgoodwin: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
@dgoodwin: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This PR demonstrates a couple of concepts I wanted to share for the interop QE jobs.

The first: many of these jobs seem to have test steps that cannot fail (return non-zero). This is a critical problem that needs to be addressed for all tooling, right from prow on down to sippy. None of the tooling can properly understand the job until this is resolved.

The second is something I think could be a dramatic improvement for investigating and categorizing failures as either infrastructure, a product bug, or a test bug. I started digging into some job failures and found that some of the tools I would normally use are not present because these tests are not run through origin (a big future concept around that is mentioned below). But we do have an interesting tool called the resource watch observer, which runs origin's "monitortests" as well. This enables a long-running process throughout the life of the job, including product installation. It watches for interesting things happening in the cluster, monitors for network disruption to a number of different network stacks in the cluster (tested by requests coming from the CI build farm), and creates a git repo with commits that represent many of the most important kube API objects in the cluster as they change, and when.

Enabling this observer is demonstrated in this PR and is quite simple. You then end up with an artifact directory like this: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/59126/rehearse-59126-periodic-ci-quay-quay-tests-master-ocp-418-quay-quay-e2e-tests-quay313-ocp418-lp-interop/1860029073342861312/artifacts/quay-e2e-tests-quay313-ocp418-lp-interop/observers-resource-watch/artifacts/

The tar is the interesting part. Extracting it, you will find:

Both of these, I think, would be very powerful for interop jobs. There is a big project underway for components and layered products to be able to define their tests in their own repos but have them run via origin (enhancement openshift/enhancements#1676). This would require go tests, but will eventually allow dramatically better tooling to be used consistently. This is a ways out, and I gather many of these jobs are not using go tests at this time, but if any one product is interested in being an early adopter, we could get those discussions going sooner.
Issues in openshift/release go stale after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale
Stale issues in openshift/release rot after 15d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten
Rotten issues in openshift/release close after 15d of inactivity. Reopen the issue by commenting /close
@openshift-bot: Closed this PR. In response to this: