Skip to content

[Xamarin.Android.Build.Tasks] Retry RemoveDirFixed on ERROR_DIR_NOT_EMPTY#11195

Merged
jonathanpeppers merged 1 commit intodotnet:mainfrom
SuperJMN:fix/remove-dir-fixed-not-empty-retry
Apr 23, 2026
Merged

[Xamarin.Android.Build.Tasks] Retry RemoveDirFixed on ERROR_DIR_NOT_EMPTY#11195
jonathanpeppers merged 1 commit intodotnet:mainfrom
SuperJMN:fix/remove-dir-fixed-not-empty-retry

Conversation

@SuperJMN
Copy link
Copy Markdown
Contributor

Fixes #10124.

@jonathanpeppers — challenge accepted 🌶️ (with the disclaimer that the AI sounded confident because the fix is genuinely tiny 😄).

Diagnosis

RemoveDirFixed already has a perfectly good retry-with-backoff loop. The catch is that it only re-tries when the underlying IOException carries one of two HResults:

Constant HResult Win32
ERROR_ACCESS_DENIED 0x80070005 5
ERROR_SHARING_VIOLATION 0x80070020 32

The transient failure people are hitting on Linux during dotnet publish for Android is a third one:

error XARDF7024: System.IO.IOException: Directory not empty :
  '.../obj/Release/net10.0-android/android/assets/arm64-v8a'
   at System.IO.FileSystem.RemoveDirectoryRecursive(String fullPath)
   at Xamarin.Android.Tasks.RemoveDirFixed.RunTask()

On Unix, Directory.Delete(recursive: true) ultimately calls rmdir(2). When that returns ENOTEMPTY (POSIX errno 39), .NET maps it to ERROR_DIR_NOT_EMPTY = 0x80070091 (Win32 145), which is not in the whitelist — so the very first hiccup re-throws and the build fails.

The race is real (the SDK is recursively deleting obj/.../android/assets/<abi> while files in it are still being unlinked) but the window is usually microscopic on ext4/btrfs. It widens dramatically on NTFS volumes mounted via the Linux ntfs3 driver because NTFS directory metadata is non-POSIX-coherent — a directory can briefly report children that have just been unlinked. Same race, different filesystem clock.

Fix

Three lines: one new constant, and one extra term in the existing condition.

 const int ERROR_ACCESS_DENIED      = -2147024891; // 0x80070005
 const int ERROR_SHARING_VIOLATION  = -2147024864; // 0x80070020
+const int ERROR_DIR_NOT_EMPTY      = -2147024751; // 0x80070091-if ((code != ERROR_ACCESS_DENIED && code != ERROR_SHARING_VIOLATION) || retryCount >= attempts)
+if ((code != ERROR_ACCESS_DENIED && code != ERROR_SHARING_VIOLATION && code != ERROR_DIR_NOT_EMPTY) || retryCount >= attempts)

I prefer this over the "just warn if we can't delete" idea floated in the issue: this keeps the contract intact — if the directory still can't be removed after RetryAttempts (default backoff already in place), the task fails loudly as before. We're only widening which HResult is considered worth retrying, not how many retries or what happens after.

Test

I deliberately didn't add a new NUnit test for this. The existing DirectoryInUseWithRetry already covers the retry mechanics; the only thing this PR changes is one constant in a whitelist. Reproducing ENOTEMPTY deterministically on a CI agent without an ntfs3 mount would require either:

  • a refactor to inject Directory.Delete and throw a synthetic IOException (intrusive for a 3-line change), or
  • a flaky concurrent-writer test similar to the sharing-violation one (but ENOTEMPTY is hard to win against a non-NTFS FS).

Happy to add either if you'd prefer — just say the word.

Repro / verification

Repro steps and full system context (Pop!_OS 24.04 LTS, kernel 6.18, .NET 10.0.201, Android SDK 36.1.53, NTFS via ntfs3) are in my comment on #10124.

Locally, the issue reproduces ~30% of the time on the affected setup. With this patch applied to a local build of the SDK it stops reproducing across many consecutive runs, because the very small ntfs3 inconsistency window is comfortably covered by the first Files.GetFileWriteRetryDelay() backoff.


It's also worth noting that DotnetDeployer (the tool I use to publish .NET projects, SuperJMN/DotnetDeployer) already ships a client-side mitigation that detects this exact error, cleans obj/Release/<tfm>/android, and retries dotnet publish once. With this PR landed, that workaround becomes unnecessary 🎉.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

The retry loop in RemoveDirFixed only kicks in for ERROR_ACCESS_DENIED
(0x80070005) and ERROR_SHARING_VIOLATION (0x80070020). The transient
"Directory not empty" failure observed on Linux during `dotnet publish`
for Android raises an IOException with HResult 0x80070091
(ERROR_DIR_NOT_EMPTY), which falls through the whitelist and aborts
the task on the first attempt.

On Unix, .NET maps `ENOTEMPTY` from `rmdir(2)` to ERROR_DIR_NOT_EMPTY,
so the same constant covers both Windows and Unix sources. Adding it
to the whitelist lets the existing retry-with-backoff machinery do its
job, with no behavioral change after `RetryAttempts` is exhausted.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the RemoveDirFixed MSBuild task in Xamarin.Android.Build.Tasks to retry directory deletion when .NET surfaces a transient “directory not empty” failure (HRESULT 0x80070091), addressing intermittent clean/publish failures on Unix-like systems.

Changes:

  • Add ERROR_DIR_NOT_EMPTY constant (0x80070091) with explanatory comment.
  • Extend the existing retry whitelist to include ERROR_DIR_NOT_EMPTY in the IOException/UnauthorizedAccessException retry path.

@jonathanpeppers
Copy link
Copy Markdown
Member

/review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

Android PR Reviewer completed successfully!

// the Linux `ntfs3` driver, where directory metadata can momentarily
// report children that have just been unlinked, but is not specific
// to that filesystem.
const int ERROR_DIR_NOT_EMPTY = -2147024751; // 0x80070091
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM

Clean, well-scoped fix. One file, three meaningful lines, thorough PR description with root-cause analysis.

Summary

Severity Count
❌ Error 0
⚠️ Warning 0
💡 Suggestion 1

What I verified

  • HResult correctness: 0x80070091 → signed int32 -2147024751 → Win32 ERROR_DIR_NOT_EMPTY (145). ✓
  • Logic: The existing De Morgan pattern (code != A && code != B && code != C) || retryCount >= attempts correctly retries on any of the three errors while retries remain, and rethrows otherwise. ✓
  • Comment quality: The explanation on the new constant clearly documents the Unix ENOTEMPTY → Win32 mapping and the ntfs3 scenario. ✓
  • Contract preservation: The retry contract is unchanged — if retries are exhausted, the task still fails loudly. No silent swallowing. ✓
  • Existing test coverage: DirectoryInUseWithRetry validates the retry loop mechanics; the only change here is which HResult enters that path. The PR description's reasoning for not adding a new test is sound. ✓
  • CI: Completed checks are all green; a few macOS emulator jobs are still in progress but no failures.

Positive callouts

  • Excellent PR description — the root-cause diagnosis, POSIX-to-Win32 mapping explanation, and filesystem-specific context are exemplary.
  • Adding hex comments to the pre-existing constants alongside the new one is a nice touch for maintainability.

Generated by Android PR Reviewer for issue #11195 · ● 2.6M

case IOException:
int code = Marshal.GetHRForException(e);
if ((code != ERROR_ACCESS_DENIED && code != ERROR_SHARING_VIOLATION) || retryCount >= attempts) {
if ((code != ERROR_ACCESS_DENIED && code != ERROR_SHARING_VIOLATION && code != ERROR_DIR_NOT_EMPTY) || retryCount >= attempts) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 💡 Code organization — The negated-check pattern (code != A && code != B && code != C) is getting harder to scan as more error codes are added. Consider inverting to a positive check:

bool isRetriable = code is ERROR_ACCESS_DENIED or ERROR_SHARING_VIOLATION or ERROR_DIR_NOT_EMPTY;
if (!isRetriable || retryCount >= attempts) {
	throw;
}

This reads as "retry if retriable and retries remain" which matches the intent more naturally, and is easier to extend if a fourth HResult ever surfaces.

Rule: Reduce indentation / improve readability (Postmortem #62)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok.

Copy link
Copy Markdown
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple CI failures that are ignorable.

@jonathanpeppers jonathanpeppers merged commit d19a499 into dotnet:main Apr 23, 2026
36 of 39 checks passed
Evangelink pushed a commit to microsoft/testfx that referenced this pull request Apr 24, 2026
Continuation of #7772.

On .NET for Android, `Assembly.Location` returns a relative path
like `"AndroidTest1.dll"` (MonoVM) or an empty string (CoreCLR).
`Path.GetDirectoryName` returns `""` for these inputs, and then
`Directory.CreateDirectory("")` throws `ArgumentException`.

This would fail in Release mode on Android, or if the "fast
deployment" feature is disabled.

Guard against empty/relative paths by checking the result of
`Path.GetDirectoryName` and falling back to the standard
`RootDeploymentDirectory/Out` path.

Full stack trace from AZDO build
(https://dev.azure.com/dnceng-public/public/_build/results?buildId=1392743):

    INSTRUMENTATION_RESULT: error=System.ArgumentException: The value cannot be an empty string. (Parameter 'path')
     at System.ArgumentException.ThrowNullOrEmptyException(String argument, String paramName)
     at System.ArgumentException.ThrowIfNullOrEmpty(String argument, String paramName)
     at System.IO.Directory.CreateDirectory(String path)
     at Microsoft.VisualStudio.TestPlatform.MSTestAdapter.PlatformServices.Utilities.FileUtility.CreateDirectoryIfNotExists(String directory)
     at Microsoft.VisualStudio.TestPlatform.MSTestAdapter.PlatformServices.Utilities.DeploymentUtilityBase.CreateDeploymentDirectories(IRunContext runContext, String firstTestSource)
     at Microsoft.VisualStudio.TestPlatform.MSTestAdapter.PlatformServices.TestDeployment.Deploy(IEnumerable`1 testCases, IRunContext runContext, IFrameworkHandle frameworkHandle, ITestSourceHandler testSourceHandler, TestRunCancellationToken cancellationToken)
     at Microsoft.VisualStudio.TestPlatform.MSTest.TestAdapter.MSTestExecutor.RunTestsAsync(...)
     at Microsoft.VisualStudio.TestTools.UnitTesting.MSTestBridgedTestFramework.SynchronizedRunTestsAsync(...)
     at Microsoft.Testing.Extensions.VSTestBridge.SynchronizedSingleSessionVSTestBridgedTestFramework.ExecuteRequestAsync(...)
     at Microsoft.Testing.Platform.Builder.TestApplication.RunAsync()
     at DotNetNewAndroidTestMonoVM.TestInstrumentation.<OnStart>b__2_0()
    INSTRUMENTATION_CODE: 0

Related: dotnet/android#11195, #7769

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build errors after cleaning bin/obj

3 participants