@mertcanaltin mertcanaltin commented Oct 5, 2025

Implements Windows file path handling as part of the WHATWG URL
spec change (whatwg/url#874).

Changes

  • Detects Windows drive letter paths (e.g., C:\path\file.txt)
  • Converts backslashes to forward slashes
  • Prefixes with file:/// for proper URL parsing
  • Updates WPT commit hash to latest test expectations with
    percent-encoded Unicode

Implementation

When a Windows file path is detected at the start of URL parsing:

  1. Check for drive letter pattern (C:) or UNC path (\\server)
  2. Convert all backslashes to forward slashes
  3. Prefix with file:/// (drive letters) or file: (UNC)
  4. Continue with standard URL parsing
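
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `preprocessWindowsPath` is a hypothetical name, and requiring a backslash right after the drive colon is an assumption made here so that inputs such as `c:foo` are left alone.

```javascript
// Hypothetical helper; illustrates the steps only, not the PR's implementation.
function preprocessWindowsPath(input) {
  // Step 1 (drive letter): one ASCII letter, ":" and a backslash (assumed form).
  if (/^[A-Za-z]:\\/u.test(input)) {
    // Steps 2 and 3: convert backslashes, then prefix with file:///
    return `file:///${input.replace(/\\/gu, "/")}`;
  }
  // Step 1 (UNC): input starts with two backslashes.
  if (input.startsWith("\\\\")) {
    // Steps 2 and 3: "\\server\share" becomes "//server/share", prefixed with file:
    return `file:${input.replace(/\\/gu, "/")}`;
  }
  // Step 4: everything else goes to the standard parser unchanged.
  return input;
}

console.log(preprocessWindowsPath("C:\\path\\file.txt")); // file:///C:/path/file.txt
console.log(preprocessWindowsPath("\\\\server\\share")); // file://server/share
```

The result is an ordinary string, so step 4 amounts to passing it to the existing parser.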

mertcanaltin commented Oct 11, 2025

Per @annevk's guidance in whatwg/url#874, the implementation is now
complete:

Implementation Summary

Core Rules (Implemented):

  • Single ASCII letter + :\ → Windows file path (C:\path →
    file:///C:/path)
  • Invalid drive patterns → Failure (CC:\path → TypeError)
  • UNC paths → file:// URL (\\server\share →
    file://server/share)
  • Opaque paths support backslash (non-special:\\opaque is valid)

Test Results:

  • 5342/5381 tests passing (99.3%)
  • All core Windows path scenarios working
  • 38 edge case tests awaiting spec clarification

Changes Made:

  • lib/url-state-machine.js:536-574: Windows path pre-processing
    • Validates single ASCII letter drives
    • Rejects invalid drive patterns (multi-letter, non-alpha)
    • Converts backslashes to forward slashes
    • Handles UNC paths

WPT Tests: Updated in web-platform-tests/wpt#53459

Ready for review! 🚀

- Single ASCII letter + :\ → Windows file path
- Multi-letter drives (CC:\, ABC:\) → Failure
- Non-letter drives (1:\, @:\) → Failure
- Normal schemes with backslash → Opaque path (valid)

Per @annevk's feedback in whatwg/url#874
- Add 'u' flag to all regex patterns
- Use template literals instead of string concatenation
- Fix whitespace before property access

@annevk, I applied the latest changes in f48453a.

mertcanaltin added a commit to mertcanaltin/wpt that referenced this pull request Oct 18, 2025
Updated tests to reflect simplified Windows path handling:

1. Multi-letter drives (CC:\, ABC:\) are now parsed as normal URLs
   - CC:\path → scheme: cc, path: \path (not failure)
   - 1:\path → failure (schemes must start with ASCII letter)
   - @:\path → failure (@ not valid in scheme)

2. UNC paths without base URL should fail
   - \\server\share → failure (no special UNC handling)
   - UNC paths with file: base still work via relative parsing

This aligns with whatwg/url#874 guidance:
"Why would we not parse CC:\path as we do today?"

Only single ASCII letter + :\ should be treated as Windows file path.
Everything else uses normal URL parsing.

Related: whatwg/url#874, jsdom/whatwg-url#304
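
A rough sketch of the simplified rule described in this commit message, assuming a runtime with a WHATWG-compliant global `URL` (for example Node.js). `convertWindowsDrivePath` is a hypothetical name used only for illustration; inputs it does not convert are handed to the standard parser as-is.

```javascript
// Hypothetical helper illustrating the simplified rule; not the PR's code.
function convertWindowsDrivePath(input) {
  // Only a single ASCII letter followed by ":" and a backslash qualifies.
  if (!/^[A-Za-z]:\\/u.test(input)) {
    return null; // not a Windows drive path: use normal URL parsing
  }
  return `file:///${input.replace(/\\/gu, "/")}`;
}

// Try each input without a base URL and report how it parses.
for (const input of ["C:\\path", "CC:\\path", "1:\\path", "\\\\server\\share"]) {
  const converted = convertWindowsDrivePath(input);
  try {
    const url = new URL(converted ?? input);
    console.log(input, "->", url.protocol, url.pathname);
  } catch {
    console.log(input, "-> failure");
  }
}
```

Under these assumptions, `CC:\path` should come out with scheme `cc` rather than as a failure, while `1:\path` and the bare UNC path should fail because neither has a valid scheme and no base URL is supplied.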
]
.map(async file => {
const res = await fetch(`${urlPrefix}${file}`);
await fs.writeFile(path.resolve(targetDir, file), res.body);
Member

Please remove this change.

// Only convert single ASCII letter + :\ pattern (e.g., C:\, D:\)
// Note: Only backslash (\), not forward slash (/)
// Everything else goes through normal URL parsing
if (!stateOverride && !this.url.scheme && /^[a-zA-Z]:\\/u.test(this.input)) {
Member

This doesn't match the spec text at https://github.com/whatwg/url/pull/874/files#diff-29243b3b9b716b55c6a61970b0c4864f464b139d397fb961a05bb6e1e2b97cabR2251 . Please translate the spec text directly.

// Handle Windows file paths if no state override
// Only convert single ASCII letter + :\ pattern (e.g., C:\, D:\)
// Note: Only backslash (\), not forward slash (/)
// Everything else goes through normal URL parsing
Member

These comments are not useful (and do not appear in the spec).

// Everything else goes through normal URL parsing
if (!stateOverride && !this.url.scheme && /^[a-zA-Z]:\\/u.test(this.input)) {
const converted = this.input.replace(/\\/gu, "/");
this.input = `file:///${converted}`;