kvm: fix direct-download live storage migration#13084
kvm: fix direct-download live storage migration#13084Kunalbehbud wants to merge 2 commits intoapache:4.22from
Conversation
Direct-download backed volumes should not use the linked-clone live storage migration path because the backing chain is not guaranteed on the destination host. Force the KVM migration request to full clone when a migrated volume is direct-download backed, while ignoring volumes that are skipped from the migration request. Keep direct-download volume metadata consistent across VolumeDataFactory paths and make target-connection/template-reference failures explicit. Add unit coverage for forced full-clone, skipped-volume boundaries, copied template references, ModifyTargets answers, and VolumeDataFactory propagation.
7f25ad0 to
4b6de36
Compare
There was a problem hiding this comment.
Pull request overview
Fixes KVM live storage migration failures when migrating volumes backed by direct-download templates by forcing full-clone storage migration in cases where linked-clone backing chains cannot be assumed to exist on the destination.
Changes:
- Force full-clone live storage migration when any migrated volume in the request is backed by a direct-download template (while ignoring volumes skipped from the request).
- Ensure
VolumeDataFactoryImplconsistently sets thedirectDownloadflag onVolumeObjectviafindByIdIncludingRemoved(...). - Improve robustness around template reference updates and ModifyTargets/connect-path error handling, with expanded unit test coverage.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| engine/storage/datamotion/src/main/java/org/apache/cloudstack/storage/motion/StorageSystemDataMotionStrategy.java | Implements the full-clone fallback decision for direct-download-backed volumes; improves ModifyTargets handling and copied-template reference updates. |
| engine/storage/volume/src/main/java/org/apache/cloudstack/storage/volume/VolumeDataFactoryImpl.java | Centralizes/standardizes setting directDownload on returned VolumeObject instances (including templates removed later). |
| engine/storage/datamotion/src/test/java/org/apache/cloudstack/storage/motion/StorageSystemDataMotionStrategyTest.java | Adds unit tests for the new migration decision logic and updated ModifyTargets/template-reference behaviors. |
| engine/storage/datamotion/src/test/java/org/apache/cloudstack/storage/motion/KvmNonManagedStorageSystemDataMotionTest.java | Adds a unit test covering the “direct-download template already staged” behavior. |
| engine/storage/volume/src/test/java/org/apache/cloudstack/storage/volume/VolumeDataFactoryImplTest.java | Adds unit tests validating directDownload flag propagation across getVolume(...) variants. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Map<VolumeInfo, DataStore> volumeMap = new HashMap<>(); | ||
|
|
||
| configurePoolLookup(skippedDirectDownloadVolume, samePowerFlexStore, 1L, 1L, StoragePoolType.PowerFlex, StoragePoolType.PowerFlex); | ||
| configurePoolLookup(regularVolume, destPrimary, 3L, 4L, StoragePoolType.NetworkFilesystem, StoragePoolType.NetworkFilesystem); |
There was a problem hiding this comment.
In shouldForceFullCloneMigrationIgnoresSkippedDirectDownloadVolumes, skippedDirectDownloadVolume is never stubbed to return true from isDirectDownload(), so the assertion would still pass even if the skip logic were removed. To actually verify the “ignored direct-download volume” behavior, set Mockito.when(skippedDirectDownloadVolume.isDirectDownload()).thenReturn(true) and keep the expectation false due to the PowerFlex same-pool skip.
| configurePoolLookup(regularVolume, destPrimary, 3L, 4L, StoragePoolType.NetworkFilesystem, StoragePoolType.NetworkFilesystem); | |
| configurePoolLookup(regularVolume, destPrimary, 3L, 4L, StoragePoolType.NetworkFilesystem, StoragePoolType.NetworkFilesystem); | |
| Mockito.when(skippedDirectDownloadVolume.isDirectDownload()).thenReturn(true); |
| protected boolean shouldForceFullCloneMigration(Map<VolumeInfo, DataStore> volumeDataStoreMap, Host destHost) { | ||
| for (Map.Entry<VolumeInfo, DataStore> entry : volumeDataStoreMap.entrySet()) { | ||
| VolumeInfo srcVolumeInfo = entry.getKey(); | ||
| DataStore destDataStore = entry.getValue(); | ||
| StoragePoolVO sourceStoragePool = _storagePoolDao.findById(srcVolumeInfo.getPoolId()); | ||
| StoragePoolVO destStoragePool = _storagePoolDao.findById(destDataStore.getId()); | ||
|
|
There was a problem hiding this comment.
shouldForceFullCloneMigration performs two _storagePoolDao.findById(...) lookups per volume, and copyAsync immediately repeats those same lookups in the main loop. This doubles DAO calls for every live migration request. Consider computing the “force full clone” decision inside the existing copyAsync loop (before decideMigrationType...) or refactoring to reuse the already-fetched sourceStoragePool/destStoragePool objects to avoid redundant database access.
Avoid repeating storage pool lookups while deciding and preparing KVM live storage migration requests. Also make the skipped-volume test assert that skipped direct-download volumes do not affect the full-clone decision.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## 4.22 #13084 +/- ##
============================================
+ Coverage 17.67% 17.71% +0.03%
- Complexity 15789 15820 +31
============================================
Files 5922 5922
Lines 533094 533150 +56
Branches 65210 65220 +10
============================================
+ Hits 94246 94440 +194
+ Misses 428208 428054 -154
- Partials 10640 10656 +16
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
This PR fixes live storage migration for KVM instances whose migrated volumes are backed by direct-download templates.
The problematic path is linked-clone live storage migration. For that mode, libvirt expects the rest of the backing chain to already exist on the destination and to match the source. That assumption is not safe for direct-download backed volumes, where the template may have been bypassed or staged directly on primary storage. When such a volume is part of the actual migration set, this PR forces the KVM migration request to use full-clone storage migration instead.
A few related edge cases are handled in the same path:
shouldMigrateVolumeVolumeDataFactoryImplnow carries the templatedirectDownloadflag consistently across itsgetVolume(...)variants, including volumes whose template was removed laterModifyTargetsCommandfailures preserve the agent error details and empty connected-path answers fail with a clearer exceptionThis keeps the change scoped to the KVM storage migration bug. It does not try to mix linked-clone and full-clone per disk, since the current KVM/libvirt migration command chooses the storage migration mode for the VM migration request as a whole.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
N/A. This is a backend KVM storage migration fix.
How Has This Been Tested?
Targeted unit tests:
mvn -pl engine/storage/datamotion,engine/storage/volume -am \ -Dtest=AncientDataMotionStrategyTest,KvmNonManagedStorageSystemDataMotionTest,StorageSystemDataMotionStrategyTest,VolumeDataFactoryImplTest \ -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false testResult: 74 tests run, 0 failures, build success.
Manual verification was also done in a two-host KVM 4.22 test environment:
How did you try to break this feature and the system with this change?
Covered the boundary cases that are most likely to regress this path:
ModifyTargetsCommandreturning a generic failedAnswerModifyTargetsAnswerreturning no connected paths