[Xamarin.Android.Build.Tasks] Retry RemoveDirFixed on ERROR_DIR_NOT_EMPTY#11195
Conversation
The retry loop in RemoveDirFixed only kicks in for ERROR_ACCESS_DENIED (0x80070005) and ERROR_SHARING_VIOLATION (0x80070020). The transient "Directory not empty" failure observed on Linux during `dotnet publish` for Android raises an IOException with HResult 0x80070091 (ERROR_DIR_NOT_EMPTY), which falls through the whitelist and aborts the task on the first attempt. On Unix, .NET maps `ENOTEMPTY` from `rmdir(2)` to ERROR_DIR_NOT_EMPTY, so the same constant covers both Windows and Unix sources. Adding it to the whitelist lets the existing retry-with-backoff machinery do its job, with no behavioral change after `RetryAttempts` is exhausted. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the RemoveDirFixed MSBuild task in Xamarin.Android.Build.Tasks to retry directory deletion when .NET surfaces a transient “directory not empty” failure (HRESULT 0x80070091), addressing intermittent clean/publish failures on Unix-like systems.
Changes:
- Add
ERROR_DIR_NOT_EMPTYconstant (0x80070091) with explanatory comment. - Extend the existing retry whitelist to include
ERROR_DIR_NOT_EMPTYin the IOException/UnauthorizedAccessException retry path.
|
/review |
|
✅ Android PR Reviewer completed successfully! |
| // the Linux `ntfs3` driver, where directory metadata can momentarily | ||
| // report children that have just been unlinked, but is not specific | ||
| // to that filesystem. | ||
| const int ERROR_DIR_NOT_EMPTY = -2147024751; // 0x80070091 |
There was a problem hiding this comment.
This is a real code listed here:
There was a problem hiding this comment.
✅ LGTM
Clean, well-scoped fix. One file, three meaningful lines, thorough PR description with root-cause analysis.
Summary
| Severity | Count |
|---|---|
| ❌ Error | 0 |
| 0 | |
| 💡 Suggestion | 1 |
What I verified
- HResult correctness:
0x80070091→ signed int32-2147024751→ Win32ERROR_DIR_NOT_EMPTY(145). ✓ - Logic: The existing De Morgan pattern
(code != A && code != B && code != C) || retryCount >= attemptscorrectly retries on any of the three errors while retries remain, and rethrows otherwise. ✓ - Comment quality: The explanation on the new constant clearly documents the Unix
ENOTEMPTY→ Win32 mapping and thentfs3scenario. ✓ - Contract preservation: The retry contract is unchanged — if retries are exhausted, the task still fails loudly. No silent swallowing. ✓
- Existing test coverage:
DirectoryInUseWithRetryvalidates the retry loop mechanics; the only change here is which HResult enters that path. The PR description's reasoning for not adding a new test is sound. ✓ - CI: Completed checks are all green; a few macOS emulator jobs are still in progress but no failures.
Positive callouts
- Excellent PR description — the root-cause diagnosis, POSIX-to-Win32 mapping explanation, and filesystem-specific context are exemplary.
- Adding hex comments to the pre-existing constants alongside the new one is a nice touch for maintainability.
Generated by Android PR Reviewer for issue #11195 · ● 2.6M
| case IOException: | ||
| int code = Marshal.GetHRForException(e); | ||
| if ((code != ERROR_ACCESS_DENIED && code != ERROR_SHARING_VIOLATION) || retryCount >= attempts) { | ||
| if ((code != ERROR_ACCESS_DENIED && code != ERROR_SHARING_VIOLATION && code != ERROR_DIR_NOT_EMPTY) || retryCount >= attempts) { |
There was a problem hiding this comment.
🤖 💡 Code organization — The negated-check pattern (code != A && code != B && code != C) is getting harder to scan as more error codes are added. Consider inverting to a positive check:
bool isRetriable = code is ERROR_ACCESS_DENIED or ERROR_SHARING_VIOLATION or ERROR_DIR_NOT_EMPTY;
if (!isRetriable || retryCount >= attempts) {
throw;
}This reads as "retry if retriable and retries remain" which matches the intent more naturally, and is easier to extend if a fourth HResult ever surfaces.
Rule: Reduce indentation / improve readability (Postmortem #62)
jonathanpeppers
left a comment
There was a problem hiding this comment.
There are a couple CI failures that are ignorable.
Continuation of #7772. On .NET for Android, `Assembly.Location` returns a relative path like `"AndroidTest1.dll"` (MonoVM) or an empty string (CoreCLR). `Path.GetDirectoryName` returns `""` for these inputs, and then `Directory.CreateDirectory("")` throws `ArgumentException`. This would fail in Release mode on Android, or if the "fast deployment" feature is disabled. Guard against empty/relative paths by checking the result of `Path.GetDirectoryName` and falling back to the standard `RootDeploymentDirectory/Out` path. Full stack trace from AZDO build (https://dev.azure.com/dnceng-public/public/_build/results?buildId=1392743): INSTRUMENTATION_RESULT: error=System.ArgumentException: The value cannot be an empty string. (Parameter 'path') at System.ArgumentException.ThrowNullOrEmptyException(String argument, String paramName) at System.ArgumentException.ThrowIfNullOrEmpty(String argument, String paramName) at System.IO.Directory.CreateDirectory(String path) at Microsoft.VisualStudio.TestPlatform.MSTestAdapter.PlatformServices.Utilities.FileUtility.CreateDirectoryIfNotExists(String directory) at Microsoft.VisualStudio.TestPlatform.MSTestAdapter.PlatformServices.Utilities.DeploymentUtilityBase.CreateDeploymentDirectories(IRunContext runContext, String firstTestSource) at Microsoft.VisualStudio.TestPlatform.MSTestAdapter.PlatformServices.TestDeployment.Deploy(IEnumerable`1 testCases, IRunContext runContext, IFrameworkHandle frameworkHandle, ITestSourceHandler testSourceHandler, TestRunCancellationToken cancellationToken) at Microsoft.VisualStudio.TestPlatform.MSTest.TestAdapter.MSTestExecutor.RunTestsAsync(...) at Microsoft.VisualStudio.TestTools.UnitTesting.MSTestBridgedTestFramework.SynchronizedRunTestsAsync(...) at Microsoft.Testing.Extensions.VSTestBridge.SynchronizedSingleSessionVSTestBridgedTestFramework.ExecuteRequestAsync(...) at Microsoft.Testing.Platform.Builder.TestApplication.RunAsync() at DotNetNewAndroidTestMonoVM.TestInstrumentation.<OnStart>b__2_0() INSTRUMENTATION_CODE: 0 Related: dotnet/android#11195, #7769 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes #10124.
@jonathanpeppers — challenge accepted 🌶️ (with the disclaimer that the AI sounded confident because the fix is genuinely tiny 😄).
Diagnosis
RemoveDirFixedalready has a perfectly good retry-with-backoff loop. The catch is that it only re-tries when the underlyingIOExceptioncarries one of two HResults:ERROR_ACCESS_DENIED0x80070005ERROR_SHARING_VIOLATION0x80070020The transient failure people are hitting on Linux during
dotnet publishfor Android is a third one:On Unix,
Directory.Delete(recursive: true)ultimately callsrmdir(2). When that returnsENOTEMPTY(POSIX errno 39), .NET maps it toERROR_DIR_NOT_EMPTY=0x80070091(Win32 145), which is not in the whitelist — so the very first hiccup re-throws and the build fails.The race is real (the SDK is recursively deleting
obj/.../android/assets/<abi>while files in it are still being unlinked) but the window is usually microscopic on ext4/btrfs. It widens dramatically on NTFS volumes mounted via the Linuxntfs3driver because NTFS directory metadata is non-POSIX-coherent — a directory can briefly report children that have just been unlinked. Same race, different filesystem clock.Fix
Three lines: one new constant, and one extra term in the existing condition.
I prefer this over the "just warn if we can't delete" idea floated in the issue: this keeps the contract intact — if the directory still can't be removed after
RetryAttempts(default backoff already in place), the task fails loudly as before. We're only widening which HResult is considered worth retrying, not how many retries or what happens after.Test
I deliberately didn't add a new NUnit test for this. The existing
DirectoryInUseWithRetryalready covers the retry mechanics; the only thing this PR changes is one constant in a whitelist. ReproducingENOTEMPTYdeterministically on a CI agent without anntfs3mount would require either:Directory.Deleteand throw a syntheticIOException(intrusive for a 3-line change), orENOTEMPTYis hard to win against a non-NTFS FS).Happy to add either if you'd prefer — just say the word.
Repro / verification
Repro steps and full system context (Pop!_OS 24.04 LTS, kernel 6.18, .NET 10.0.201, Android SDK 36.1.53, NTFS via
ntfs3) are in my comment on #10124.Locally, the issue reproduces ~30% of the time on the affected setup. With this patch applied to a local build of the SDK it stops reproducing across many consecutive runs, because the very small ntfs3 inconsistency window is comfortably covered by the first
Files.GetFileWriteRetryDelay()backoff.It's also worth noting that
DotnetDeployer(the tool I use to publish .NET projects, SuperJMN/DotnetDeployer) already ships a client-side mitigation that detects this exact error, cleansobj/Release/<tfm>/android, and retriesdotnet publishonce. With this PR landed, that workaround becomes unnecessary 🎉.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com