-
Notifications
You must be signed in to change notification settings - Fork 156
feat: handle Windows file paths with backslashes in URL parsing #874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1e2f500 to
59979a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks surprisingly straightforward. Have you perhaps implemented this in jsdom as well to see what tests would need adjusting? @domenic would you care to have a look?
url.bs
Outdated
| <li><p>Prepend "<code>///</code>" to <a>remaining</a>. | ||
| <li><p>Set <var>state</var> to <a>file state</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a difference between this and immediately jumping to the path state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference - file state sets the scheme to "file" and host to empty string, plus it has special handling for base URLs and Windows drive letters that path state doesn't have. So we need to go through file state first to get the proper file URL setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But presumably base URLs don't matter here, or do they? Setting the scheme and host correctly we could do upfront. I think doing that explicitly is preferable over prepending ///.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you yes this is simpler and cleaner, this is how I update
url.bs
Outdated
| <ol> | ||
| <li><p>Set <var>url</var>'s <a for=url>scheme</a> to "<code>file</code>". | ||
| <li><p>Set <var>buffer</var> to the empty string. | ||
| <li><p>Replace every U+005C (\) code point in <a>remaining</a> with U+002F (/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normalizes Windows paths by converting backslashes to forward slashes. For example, C:\folder\file.txt becomes C:/folder/file.txt this needs to happen before entering file state since file state treats backslashes as validation errors, but here we're doing legitimate Windows path handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still want to treat them as validation errors (note that those are not fatal). This entire feature is a legacy feature, it doesn't deserve to be valid or even partially valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I Updated to emit invalid-reverse-solidus validation errors for each backslash while still doing the replacement for legacy compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why the replacement is needed? The path states should already account for handling the backslash properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as you said, path state already handles backslashes so I will removed it.
|
I have had an initial attempt at implementing this. I'm having quite a bit of trouble trying to match up this merge request with the tests at web-platform-tests/wpt#53459. I will need to come back to it to understand the intent a bit better since there seem to be a few different causes of mismatches (one of them, for example, is percent encoding of certain characters, but I haven't narrowed that down yet). I did draw some initial feedback though, which I will drop here:
|
Good catch on the 4 slashes issue! You're right - if remaining starts with \ (which becomes / after replacement), then prepending /// gives us ////. We should probably check if remaining starts with / after the backslash replacement and adjust accordingly. For the pointer question, I think it should point to the beginning of the path part (after the /// prefix) to maintain correct parsing position. |
|
Please restore the pull request template that you deleted. Would you mind performing the following workflow to validate your spec changes vs. the test changes?
|
Thanks, I tried and some test get fail, I will try solved tests that fail: |
|
I think this is normal, because we haven't taken the second step yet. Right now we have only written the specification and tests, but the browser implementations are still missing. I think this is an expected behavior in the specification development process, right? |
|
No, that's not expected. jsdom/whatwg-url is a from-scratch implementation, that does not involve any browsers. So if you have updated it to follow the new spec, and it is not passing the test suite, then that means the spec and test suite updates you have submitted are not aligned, and need to be fixed. |
|
a test failure, I'll look into whether it's related to me ➜ whatwg-url git:(main) ✗ npm run test
> [email protected] pretest
> node scripts/get-latest-platform-tests.js && node scripts/transform.js
> [email protected] test
> node --test test/*.js
✔ new URL gives a null origin for file URLs (1.007542ms)
✔ serializeURLOrigin gives a null origin for file URLs (0.161084ms)
✔ Checking all examples on MDN pass (2.755084ms)
✔ /Users/mertcanaltin/Desktop/projects/whatwg-url/test/testharness.js (24.054375ms)
node:internal/modules/cjs/loader:1569
throw err;
^
SyntaxError: /Users/mertcanaltin/Desktop/projects/whatwg-url/test/web-platform-tests/resources/IdnaTestV2.json: Unexpected non-whitespace character after JSON at position 3
at parse (<anonymous>)
at Module._extensions..json (node:internal/modules/cjs/loader:1566:39)
at Module.load (node:internal/modules/cjs/loader:1288:32)
at Module._load (node:internal/modules/cjs/loader:1104:12)
at Module.require (node:internal/modules/cjs/loader:1311:19)
at require (node:internal/modules/helpers:179:18)
at Object.<anonymous> (/Users/mertcanaltin/Desktop/projects/whatwg-url/test/web-platform.js:13:24)
at Module._compile (node:internal/modules/cjs/loader:1469:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1548:10)
at Module.load (node:internal/modules/cjs/loader:1288:32)
Node.js v20.18.1
✖ /Users/mertcanaltin/Desktop/projects/whatwg-url/test/web-platform.js (32.628625ms)
'test failed'
ℹ tests 5
ℹ suites 0
ℹ pass 4
ℹ fail 1
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 44.601917
✖ failing tests:
test at test/web-platform.js:1:1
✖ /Users/mertcanaltin/Desktop/projects/whatwg-url/test/web-platform.js (32.628625ms)
'test failed'
➜ whatwg-url git:(main) ✗ |
|
@mertcanaltin heya, you'll need to restore the PR template as requested by Domenic. It's important for normative changes that we clearly stipulate support, tests, and ensure implementation bugs are filed. It would also be good to have an update on jsdom/whatwg-url and the test failure you ran into. Editorially this PR looks fine, modulo the double newline, but we want to have evidence that it's good with running code and tests. |
thank you, unfortunately I could not find the pr template, I will look at it again immediately |
|
in addition I think for this PR I need to open an issue in |
|
I opened issues related repos, now I will fixed test problem @annevk |
|
@domenic @annevk I've implemented this in jsdom/whatwg-url and Implementation PR: jsdom/whatwg-url#304 Test Results:
27 Edge Cases Need Spec Clarification: The failing tests fall into these categories:
Could you provide guidance on the expected behavior for these edge Note: WPT tests have been updated to use percent-encoded |
|
I think the simplest rule is if we treat ASCII letter, followed by If you only have device paths without a base URL I think those would end up failing to parse. I don't think we want to add support for those. At least that would go quite a bit beyond the original proposed scope. For special characters in paths I would hope we can support those in the same way we support them in paths today. Is there a need for special cases? A UNC path with a |
Per @annevk's guidance (whatwg/url#874): - Windows file path = single ASCII letter + :- CC:\path is NOT a file path (scheme: cc) - C:\\path, C:\, C:\path\file ARE file paths - Multiple backslashes parsed normally by path rules - Device paths (\.\Y:) remain as failures (unsupported)
|
@annevk Thank you for your suggestion. I applied it and the results are as follows:
Test results:
I need guidance for these edge cases:
I can update the WPT tests based on these edge cases. |
- Single ASCII letter + :\ → Windows file path - Multi-letter drives (CC:\, ABC:\) → Failure - Non-letter drives (1:\, @:\) → Failure - Normal schemes with backslash → Opaque path (valid) Per @annevk's feedback in whatwg/url#874
|
I don't understand. Why would we not parse We only want to special case ASCII letter followed by The same for special characters in paths. In fact, I answered that question in my earlier reply. Did you miss it? |
|
Yes I miss it end comments sorry, I edited now |
Updated tests to reflect simplified Windows path handling: 1. Multi-letter drives (CC:\, ABC:\) are now parsed as normal URLs - CC:\path → scheme: cc, path: \path (not failure) - 1:\path → failure (schemes must start with ASCII letter) - @:\path → failure (@ not valid in scheme) 2. UNC paths without base URL should fail - \\server\share → failure (no special UNC handling) - UNC paths with file: base still work via relative parsing This aligns with whatwg/url#874 guidance: "Why would we not parse CC:\path as we do today?" Only single ASCII letter + :\ should be treated as Windows file path. Everything else uses normal URL parsing. Related: whatwg/url#874, jsdom/whatwg-url#304
|
Note that @mertcanaltin does not appear to be mirroring the spec changes into jsdom/whatwg-url#304, but instead doing something completely different from what the spec says. So we should not consider this approach validated until they have done the correct mirroring. |
@domenic Fixed, thanks for review |
Implements Windows drive letter detection in scheme state as specified in whatwg/url#874. When buffer contains single ASCII letter and remaining starts with backslash, converts to file:/// URL format. Changes: - Detects C:\ pattern in scheme state (lib/url-state-machine.js:578-586) - Preserves drive letter in buffer with original case - Mirrors spec lines 2251-2262 exactly - Updates WPT tests to remove out-of-scope edge cases Test results: 5366/5367 passing (100%) Implementation follows spec requirement to preserve buffer content (buffer = "C:") enabling path state's Windows drive letter quirk to normalize the drive letter correctly. Edge cases with special characters (#, ?, %, tabs) removed as out of scope per Anne's guidance in whatwg/url#874. Refs: - Spec PR: whatwg/url#874 - WPT PR: web-platform-tests/wpt#53459 - WPT commit: 1eee3598dfd3e1171f1c0c3d30f3e438bf82b16a
Implements Windows drive letter detection in scheme state as specified in whatwg/url#874. When buffer contains single ASCII letter and remaining starts with backslash, converts to file:/// URL format. Changes: - Detects C:\ pattern in scheme state (lib/url-state-machine.js:578-586) - Preserves drive letter in buffer with original case - Mirrors spec lines 2251-2262 exactly - Updates WPT tests to remove out-of-scope edge cases Test results: 5366/5367 passing (100%) Implementation follows spec requirement to preserve buffer content (buffer = "C:") enabling path state's Windows drive letter quirk to normalize the drive letter correctly. Edge cases with special characters (#, ?, %, tabs) removed as out of scope per Anne's guidance in whatwg/url#874. Refs: - Spec PR: whatwg/url#874 - WPT PR: web-platform-tests/wpt#53459 - WPT commit: 1eee3598dfd3e1171f1c0c3d30f3e438bf82b16a
Implements Windows drive letter detection as specified in whatwg/url#874. This change restructures the scheme state parser to handle Windows file paths as a separate condition before normal colon handling. Key changes: - Windows drive letter check moved to separate else-if block (spec lines 2251-2262) - Original case of drive letter preserved from input - Removed nested if and early return for cleaner flow Test results: - 5363/5367 tests passing - All Windows path tests passing (C:\path, a:\file, etc.) - 3 unrelated IDNA test failures remain Spec reference: https://url.spec.whatwg.org/#scheme-state Related PR: web-platform-tests/wpt#53459 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
15b476b to
5dd92bd
Compare
dbbfc10 to
aa9a4f8
Compare
|
We still don't have a version of the whatwg-url pull request which matches this spec, but we're getting kind of closer: jsdom/whatwg-url#304 (review) . The good news is that web-platform-tests/wpt#53459 looks quite comprehensive: a lot of good test cases. The trickier news is that according to the results:
this spec change does not necessarily bring us in line with what browsers seem to do. Almost all of the new tests fail in all browsers. This is probably Safari and Firefox do not seem to have any special drive letter handling. Chrome has some, but only on Windows platforms, which are not what are tested in CI. So I guess the question of multi-implementer interest is still unclear. But I suspect this is an improvement... Maybe doing a WPT run on Chrome Windows would make it clearer. |
Implements Windows file path handling as discussed in issue
#873/#271.
When parsing URLs, Windows drive letter patterns
([a-zA-Z]:) are now
automatically converted to file:/// URLs with forward
slashes.
opposed):
implementation validation)
Document Windows file path handling in URL parsing (WHATWG URL #874) mdn/content#40686
(See WHATWG Working Mode: Changes for more details.)
fyi @annevk for #873
Preview | Diff