Run HealthCheck without saving the ExecSession to the database #25003

Draft
wants to merge 1 commit into
base: main
from

Conversation

Honny1
Member

@Honny1 Honny1 commented Jan 13, 2025

This PR creates a method to run the HealthCheck command without creating and deleting an ExecSession in the database.

When a HealthCheck runs through the original exec method, an ExecSession is created in the database and then deleted again. Synchronizing the container and creating and deleting the ExecSession in the database on every run causes unexpectedly high IO usage.

The new healthCheckExec function locks the container, creates the ExecSession locally without writing it to the database, and then executes that local ExecSession. As a result, the number of writes to the database is reduced to zero.
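To make the difference between the two paths concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `fakeDB`, `container`, `execSession`, `runner` and `execPersisted` are hypothetical stand-ins, and only the name `healthCheckExec` comes from the description above. The sketch also assumes the lock is held only while the session is built; the description does not say how long the real code holds it.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeDB is a hypothetical stand-in for the container database.
// It only counts how many writes it receives.
type fakeDB struct {
	writes int
}

func (d *fakeDB) addExecSession(id string)    { d.writes++ }
func (d *fakeDB) removeExecSession(id string) { d.writes++ }

// execSession is a simplified, hypothetical session record.
type execSession struct {
	id  string
	cmd []string
}

// container is a simplified, hypothetical container with a lock.
type container struct {
	mu       sync.Mutex
	db       *fakeDB
	sessions map[string]*execSession
}

// runner stands in for actually executing the command in the container.
type runner func(cmd []string) int

// execPersisted models the original path: the session is saved to the
// database before the command runs and deleted from it afterwards.
func (c *container) execPersisted(cmd []string, run runner) int {
	c.mu.Lock()
	s := &execSession{id: "persisted", cmd: cmd}
	c.sessions[s.id] = s
	c.db.addExecSession(s.id)
	c.mu.Unlock()

	code := run(s.cmd)

	c.mu.Lock()
	delete(c.sessions, s.id)
	c.db.removeExecSession(s.id)
	c.mu.Unlock()
	return code
}

// healthCheckExec models the new path: the session exists only in memory,
// so the database is never touched.
func (c *container) healthCheckExec(cmd []string, run runner) int {
	c.mu.Lock()
	s := &execSession{id: "healthcheck", cmd: cmd}
	c.mu.Unlock()
	return run(s.cmd)
}

func main() {
	alwaysOK := func(cmd []string) int { return 0 } // stand-in for /bin/true
	c := &container{db: &fakeDB{}, sessions: map[string]*execSession{}}

	c.execPersisted([]string{"/bin/true"}, alwaysOK)
	fmt.Println("writes after persisted exec:", c.db.writes) // 2

	c.db.writes = 0
	c.healthCheckExec([]string{"/bin/true"}, alwaysOK)
	fmt.Println("writes after healthCheckExec:", c.db.writes) // 0
}
```

In this toy model the persisted path costs two database writes per health check run, while the in-memory path costs none, which is the effect the measurement below is meant to confirm on a real system.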

Verify reduction

  • Start 30 containers with /bin/true as a health check that runs every 10 seconds.
  • Monitor writes for two minutes: timeout 120 stap check.stp 0x23 > stap.out
    • Note: you will probably need to install debug symbols for the kernel: dnf debuginfo-install kernel-$(uname -r)

    • Script check.stp:
#!/usr/bin/stap

global mydev

probe begin
{
  dev = usrdev2kerndev($1)
  mydev = MKDEV(MAJOR(dev), MINOR(dev))
}

probe vfs.write.return
{
  if (dev == mydev)
    printf ("%s(%d)[%s(%d)] %s 0x%x %s\n", execname(), pid(), pexecname(), ppid(), ppfunc(), dev, fullpath_struct_file(task_current(), @entry($file)))
}
  • Process result: sed -E 's/([0-9]+)//' stap.out | sort | uniq -c | sort -bn | tail -n 3
    • The result before should look like this:
4179 podman()[systemd(1)] vfs_write 0x23 /var/lib/containers/storage/db.sql
16115 podman()[systemd(1)] vfs_write 0x23 /var/lib/containers/storage/db.sql-journal
29857374 stapio()[stap(195325)] vfs_write 0x23 /home/jrodak/dev/stap.out

After applying this change, the result should either no longer contain /var/lib/containers/storage/db.sql and /var/lib/containers/storage/db.sql-journal, or show a smaller count (the first number on the line) for them.
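For anyone who prefers to post-process stap.out without the sed/sort/uniq pipeline, the same aggregation can be sketched in Go. This is only an illustration: `stripFirstNumber`, `entry` and `summarize` are hypothetical helpers, and the sample lines (including the PIDs) are made up to match the output format shown above, not taken from a real run.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

var firstNumber = regexp.MustCompile(`[0-9]+`)

// stripFirstNumber removes the first run of digits in a line, like
// sed -E 's/([0-9]+)//'. For the lines above that is the writer's PID.
func stripFirstNumber(line string) string {
	loc := firstNumber.FindStringIndex(line)
	if loc == nil {
		return line
	}
	return line[:loc[0]] + line[loc[1]:]
}

// entry pairs a normalized line with the number of times it occurred.
type entry struct {
	count int
	line  string
}

// summarize counts identical normalized lines and returns the n most
// frequent ones in ascending order, like sort | uniq -c | sort -bn | tail -n.
func summarize(lines []string, n int) []entry {
	counts := map[string]int{}
	for _, l := range lines {
		counts[stripFirstNumber(l)]++
	}
	entries := make([]entry, 0, len(counts))
	for l, c := range counts {
		entries = append(entries, entry{count: c, line: l})
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].count != entries[j].count {
			return entries[i].count < entries[j].count
		}
		return entries[i].line < entries[j].line // stable order for ties
	})
	if len(entries) > n {
		entries = entries[len(entries)-n:]
	}
	return entries
}

func main() {
	// Made-up sample input in the format printed by check.stp.
	sample := []string{
		"podman(101)[systemd(1)] vfs_write 0x23 /var/lib/containers/storage/db.sql",
		"podman(202)[systemd(1)] vfs_write 0x23 /var/lib/containers/storage/db.sql",
		"podman(303)[systemd(1)] vfs_write 0x23 /var/lib/containers/storage/db.sql-journal",
	}
	for _, e := range summarize(sample, 3) {
		fmt.Printf("%7d %s\n", e.count, e.line)
	}
}
```

Like the sed expression, this strips only the first digit run on each line, so a process name that itself contains digits would be normalized differently.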

Fixes: https://issues.redhat.com/browse/RHEL-69970

Does this PR introduce a user-facing change?

The HealthCheck is executed without writing to the database.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note labels Jan 13, 2025
Contributor

openshift-ci bot commented Jan 13, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Honny1
Once this PR has been reviewed and has the lgtm label, please assign mheon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Honny1 Honny1 force-pushed the no-db-healtcheck-exec branch from ac4e5b3 to 31172f4 Compare January 13, 2025 16:27
@Honny1 Honny1 changed the title Create exec method without database writes for HealthCheck execution Run HealthCheck without saving the ExecSession to the database Jan 13, 2025
@Honny1 Honny1 force-pushed the no-db-healtcheck-exec branch from 31172f4 to 79dddb6 Compare January 13, 2025 16:34

Ephemeral COPR build failed. @containers/packit-build please check.


@Honny1 Honny1 added the No New Tests Allow PR to proceed without adding regression tests label Jan 13, 2025
@Honny1 Honny1 force-pushed the no-db-healtcheck-exec branch 2 times, most recently from a2e6f31 to 8434c76 Compare January 14, 2025 11:45
@Honny1 Honny1 removed the No New Tests Allow PR to proceed without adding regression tests label Jan 14, 2025
@Honny1 Honny1 force-pushed the no-db-healtcheck-exec branch from 8434c76 to 30e4aac Compare January 14, 2025 17:20
@Honny1
Member Author

Honny1 commented Jan 15, 2025

/packit retest-failed

@Honny1
Member Author

Honny1 commented Jan 15, 2025

This PR should not merge until after 5.4 branches.

We'd like this to sit in Fedora for a while before we put it in RHEL.
