-
Notifications
You must be signed in to change notification settings - Fork 178
migration: Add case testing the abort operation #6619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds a new Libvirt migration test config and test module that runs a background migration, aborts it before the QEMU layer starts (via domjobabort or killing virtqemud), collects and validates the result, then reruns the migration and performs cleanup. Changes
Sequence Diagram(s)sequenceDiagram
participant Runner as Test Runner
participant Conn as MigrationBase / Connections
participant Thread as Migration Thread
participant VM as Source VM (virsh)
participant Queue as Result Queue
Note over Runner,Conn: Setup
Runner->>Conn: setup_connection()
Runner->>Conn: terminate remote virtqemud
Note over Runner,Thread: Start migration
Runner->>Thread: start run_migration(vm_name, dest_uri, option, extra)
Thread->>VM: virsh migrate (background)
Thread->>Queue: push interim status/results
alt Abort via domjobabort
Runner->>VM: virsh domjobabort
else Abort via kill virtqemud
Runner->>Conn: pkill/terminate virtqemud on dest
end
Thread->>Queue: push final migration result
Runner->>Thread: join
Runner->>Queue: pop result
Runner->>Runner: libvirt.check_result(result, err_msg)
Note over Runner: Rerun and verify
Runner->>Conn: run_migration_again()
Runner->>Conn: verify_default()
Runner->>Conn: cleanup_connection()
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (4)
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py (3)
69-70: Replace fixed sleep with a deterministic wait for “job started”.Polling virsh domjobinfo (or pgrep existence) reduces flakiness across hosts.
Example:
utils_misc.wait_for( lambda: process.run('pgrep -f "virsh migrate"', shell=False, ignore_status=True).exit_status == 0, timeout=int(params.get("ssh_timeout", 60)), step=1 )
25-25: Scope the queue to the test to avoid cross‑test leakage.Make msg_queue local and pass it to the worker to reduce global state.
Example:
-msg_queue = queue.Queue() + # inside run(): + msg_queue = queue.Queue() @@ -def run_migration(vm_name, dest_uri, option, extra): +def run_migration(vm_name, dest_uri, option, extra, outq): @@ - msg_queue.put(ret) + outq.put(ret) @@ - mig_t = threading.Thread(target=run_migration, args=(vm_name, dest_uri, option, extra)) + mig_t = threading.Thread(target=run_migration, args=(vm_name, dest_uri, option, extra, msg_queue))
52-55: Reduce blast radius: target virtqemud main PID via systemd.pkill affects all virtqemud processes. Prefer systemctl kill to only pause/resume the managed service.
Example:
migrate_vm.run_remote_cmd("systemctl kill -s STOP --kill-who=main virtqemud", server_ip, server_user, server_pwd) ... migrate_vm.run_remote_cmd("systemctl kill -s CONT --kill-who=main virtqemud", server_ip, server_user, server_pwd)Also applies to: 80-81
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg (1)
27-27: Relax error matching to reduce flakiness across libvirt versions/locales.Match the stable substring only.
Apply:
- err_msg = "error: operation aborted: migration out: canceled by client" + err_msg = "canceled by client"Please confirm this substring appears on your target libvirt builds; if not, adjust to the most stable fragment observed in stderr.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg(1 hunks)libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py (2)
provider/migration/base_steps.py (4)
setup_connection(343-364)MigrationBase(27-477)verify_default(309-324)cleanup_connection(366-378)libvirt/tests/src/migration/migrate_vm.py (1)
run_remote_cmd(759-776)
🪛 Ruff (0.14.0)
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py
74-74: Function call with shell=True parameter identified, security issue
(S604)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.8
- GitHub Check: Python 3.9
- GitHub Check: Python 3.11
- GitHub Check: Python 3.12
| :param vm_name: VM name | ||
| :param dest_uri: virsh uri | ||
| :param option: virsh migrate option parameters | ||
| :param extra: virsh migrate extra paramsters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in docstring (“paramsters”).
Fix spelling for clarity.
Apply:
- :param extra: virsh migrate extra paramsters
+ :param extra: virsh migrate extra parameters📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| :param extra: virsh migrate extra paramsters | |
| :param extra: virsh migrate extra parameters |
🤖 Prompt for AI Agents
In libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py
around line 35, the docstring has a typo: "paramsters" should be "parameters";
update the docstring line to correct the spelling for the :param extra:
description so it reads ":param extra: virsh migrate extra parameters".
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py
Outdated
Show resolved
Hide resolved
| test.log.info("Setup steps.") | ||
| migration_obj.setup_connection() | ||
| remote.run_remote_cmd("pkill -19 virtqemud", params) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong remote command helper/signature. Use migrate_vm.run_remote_cmd with server creds.
remote.run_remote_cmd(params) is not the exported helper; call migrate_vm.run_remote_cmd(command, server_ip, server_user, server_pwd).
Apply:
@@
-from virttest import remote
+from virttest import utils_misc
+from virttest import virsh
+from virttest.utils_test import libvirt
+from provider.migration import base_steps
+from libvirt.tests.src.migration import migrate_vm
@@
- migration_obj.setup_connection()
- remote.run_remote_cmd("pkill -19 virtqemud", params)
+ migration_obj.setup_connection()
+ migrate_vm.run_remote_cmd(
+ "pkill -19 virtqemud",
+ params.get("server_ip"),
+ params.get("server_user"),
+ params.get("server_pwd"),
+ )Note: Keep the new import; repeat the same change for the resume step below.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py
around lines 52 to 55, the test is calling the wrong remote helper
(remote.run_remote_cmd with params); replace that call with
migrate_vm.run_remote_cmd(command, server_ip, server_user, server_pwd) — e.g.
migrate_vm.run_remote_cmd("pkill -19 virtqemud", server_ip, server_user,
server_pwd) — and ensure the same change is applied to the resume step later in
the file; keep the new import for migrate_vm as requested.
| ret = process.run("pgrep -f 'virsh migrate'", shell=True, verbose=True).stdout_text.strip() | ||
| utils_misc.safe_kill(ret, signal.SIGINT) | ||
| time.sleep(60) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harden Ctrl‑C abort: avoid shell=True, handle multiple PIDs, and remove fixed sleep.
Addresses Ruff S604, multi‑PID cases, and unnecessary 60s delay.
Apply:
- ret = process.run("pgrep -f 'virsh migrate'", shell=True, verbose=True).stdout_text.strip()
- utils_misc.safe_kill(ret, signal.SIGINT)
- time.sleep(60)
+ res = process.run('pgrep -f "virsh migrate"', shell=False,
+ verbose=True, ignore_status=True)
+ pids = [p for p in res.stdout_text.split()]
+ if not pids:
+ test.error("No 'virsh migrate' PID found to SIGINT")
+ for pid in pids:
+ utils_misc.safe_kill(int(pid), signal.SIGINT)Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.0)
74-74: Function call with shell=True parameter identified, security issue
(S604)
🤖 Prompt for AI Agents
In libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py
around lines 74 to 76, avoid using shell=True with pgrep, handle multiple PIDs
returned, and remove the fixed 60s sleep: run pgrep as a list command (no
shell), check its return code and split stdout into lines to get all PIDs, call
utils_misc.safe_kill for each PID with signal.SIGINT, then replace
time.sleep(60) with a short polling loop that checks for remaining migration
processes (e.g., re-run pgrep or check process existence) with a reasonable
timeout and small sleep interval so the test proceeds as soon as processes exit.
| remote.run_remote_cmd("pkill -18 virtqemud", params) | ||
| mig_t.join() | ||
|
|
||
| output = msg_queue.get() | ||
| libvirt.check_result(output, err_msg) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid hangs: add timeouts to join() and queue.get().
Ensure the test fails fast instead of blocking indefinitely.
Apply:
- remote.run_remote_cmd("pkill -18 virtqemud", params)
- mig_t.join()
-
- output = msg_queue.get()
+ migrate_vm.run_remote_cmd(
+ "pkill -18 virtqemud",
+ params.get("server_ip"),
+ params.get("server_user"),
+ params.get("server_pwd"),
+ )
+ mig_t.join(timeout=int(params.get("ssh_timeout", 60)))
+ if mig_t.is_alive():
+ test.error("Migration thread did not exit in time after abort")
+ try:
+ output = msg_queue.get(timeout=int(params.get("ssh_timeout", 60)))
+ except queue.Empty:
+ test.error("No migrate result retrieved from worker thread")
libvirt.check_result(output, err_msg)Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py
around lines 80 to 85, the test calls mig_t.join() and msg_queue.get() without
timeouts which can hang indefinitely; update to pass timeouts (e.g.
join(timeout=... ) and get(timeout=... )) and check their results so the test
fails fast: after join(timeout) assert the thread is not alive (or raise/fail if
it is), and wrap msg_queue.get(timeout) in a try/except for queue.Empty (or
check for None) to raise a clear test failure if no message arrives within the
timeout. Ensure the chosen timeout values are reasonable for the test
environment and that failure paths log/raise an informative error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg (5)
8-11: Normalize cfg types: quote vars and use yes/no booleans.Avoid mixed boolean styles and unquoted vars to prevent parser surprises.
- mnt_path_name = ${nfs_mount_dir} - only_pty = True - take_regular_screendumps = no + mnt_path_name = "${nfs_mount_dir}" + only_pty = "yes" + take_regular_screendumps = "no"
27-27: Make error check robust across libvirt versions.Exact strings can differ slightly; match a stable substring.
- err_msg = "error: operation aborted: migration out: canceled by client" + err_msg = "operation aborted"If the harness supports regex, prefer a pattern like: operation aborted.*canceled by client.
37-39: Add a tcp health check port.You define migrate_desturi_port but don't check it; add port_to_check for tcp.
- tcp: migrate_desturi_port = "16509" migrate_desturi_type = "tcp" + port_to_check = "${migrate_desturi_port}" virsh_migrate_desturi = "qemu+tcp://${migrate_dest_host}/system"
30-36: Confirm intended cartesian product of variants.Current layout yields p2p × with_precopy × {tcp, unix} × abort_mig. Ensure this expansion is desired (test time explosion, naming). If not, consider scoping or naming variant dimensions to control combinations.
51-52: Add SIGINT abort path to match PR scope.PR summary mentions domjobabort and SIGINT; config only has domjobabort.
variants abort_mig: - domjobabort: + - sigint:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg(1 hunks)libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Python 3.12
- GitHub Check: Python 3.11
- GitHub Check: Python 3.8
- GitHub Check: Python 3.9
🔇 Additional comments (2)
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg (2)
43-50: Verify unix_proxy socket paths across distros.Paths under /var/lib/libvirt/qemu may differ (e.g., /run/libvirt/qemu on newer systems). Confirm these are correct for your target platforms.
18-19: Validate final domain states correspond to the second (successful) run.With status_error="yes" then migrate_again_status_error="no", confirm dest=running and src="shut off" are evaluated after the retry, not the initial abort.
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg
Show resolved
Hide resolved
Abort migration job before qemu layer migration starts, then migrate vm again. VIRT-298398 - [VM migration][precopy] abort migration job before qemu layer migration starts Signed-off-by: lcheng <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (5)
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py (5)
15-19: Missing required import for remote command execution.The code at lines 50 and 67 needs
migrate_vm.run_remote_cmd, but themigrate_vmmodule is not imported.
50-50: Wrong remote command API - will fail at runtime.
remote.run_remote_cmd(command, params)is not the correct signature. Usemigrate_vm.run_remote_cmd(command, server_ip, server_user, server_pwd)instead.
67-67: Wrong remote command API - will fail at runtime.Same issue as line 50: use
migrate_vm.run_remote_cmd(command, server_ip, server_user, server_pwd)instead ofremote.run_remote_cmd(command, params).
68-68: Add timeout to thread join to prevent indefinite hang.Without a timeout, the test can block forever if the migration thread doesn't exit. Add a timeout and check if the thread is still alive.
70-70: Add timeout to queue.get() to prevent indefinite hang.Without a timeout, this call can block forever if the migration thread fails to put a result. Add a timeout and handle the
queue.Emptyexception.
🧹 Nitpick comments (1)
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py (1)
21-21: Consider moving queue to local scope to avoid shared state.The module-level
msg_queuecould retain state across test runs if the module is reused. Moving it insiderun()ensures clean state per test execution.Apply this diff to move the queue to local scope:
-msg_queue = queue.Queue() - - def run_migration(vm_name, dest_uri, option, extra): """ Run migration :param vm_name: VM name :param dest_uri: virsh uri :param option: virsh migrate option parameters :param extra: virsh migrate extra parameters """ + msg_queue = threading.current_thread().msg_queue ret = virsh.migrate(vm_name, dest_uri, option, extra, ignore_status=True, debug=True) msg_queue.put(ret)Then in
run_test():+ msg_queue = queue.Queue() mig_t = threading.Thread(target=run_migration, args=(vm_name, dest_uri, option, extra)) + mig_t.msg_queue = msg_queue mig_t.start()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg(1 hunks)libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py (2)
provider/migration/base_steps.py (4)
setup_connection(343-364)MigrationBase(27-477)verify_default(309-324)cleanup_connection(366-378)libvirt/tests/src/migration/migrate_vm.py (1)
run_remote_cmd(759-776)
🔇 Additional comments (5)
libvirt/tests/src/migration/async_job/abort_before_qemu_layer_mig_start.py (2)
24-34: LGTM!The function correctly uses
ignore_status=Trueto ensure a result is always captured, preventing queue deadlock when migration is aborted.
73-84: LGTM!The main execution flow correctly initializes the migration object, runs setup and test, executes migration again, verifies results, and ensures cleanup in the finally block.
libvirt/tests/cfg/migration/async_job/abort_before_qemu_layer_mig_start.cfg (3)
1-29: LGTM!Base configuration parameters are comprehensive and appropriate for testing migration abort before QEMU layer starts, including NFS storage setup, SSH timeouts, expected states, and error messages.
30-50: LGTM!The variant configurations correctly cover multiple transport methods (p2p, tcp, unix) with appropriate parameters. The
service_to_check = ""at line 42 correctly uses an empty string (past issue resolved).
51-52: LGTM!The
abort_migvariant correctly defines the domjobabort test scenario.
Abort migration job before qemu layer migration starts, then migrate vm again.
VIRT-298398 - [VM migration][precopy] abort migration job before qemu layer migration starts
Test result:
(1/2) type_specific.io-github-autotest-libvirt.migration.async_job.abort_before_qemu_layer_mig_start.domjobabort.tcp.with_precopy.p2p: STARTED
(1/2) type_specific.io-github-autotest-libvirt.migration.async_job.abort_before_qemu_layer_mig_start.domjobabort.tcp.with_precopy.p2p: PASS (228.03 s)
(2/2) type_specific.io-github-autotest-libvirt.migration.async_job.abort_before_qemu_layer_mig_start.domjobabort.unix.with_precopy.p2p: STARTED
(2/2) type_specific.io-github-autotest-libvirt.migration.async_job.abort_before_qemu_layer_mig_start.domjobabort.unix.with_precopy.p2p: PASS (188.65 s)
Summary by CodeRabbit