Support URLencoding during normalization #396

raboof · 2023-05-16T15:12:09Z

Earlier, the path would be URL-decoded after normalization. This meant some opportunities for normalization would be missed when dots or slashes were URL-encoded.

One way to solve this would be to change the ordering of operations, but that seems risky as it would change the meaning of some of the abstractions. Because of that I opted for a more conservative approach, where normalization will take into account URL-encoded characters, but otherwise leave the input string intact.
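To make the idea above concrete, here is a minimal sketch of normalization that recognizes URL-encoded dots while leaving every other segment untouched. It is not the code from this PR: the class `EncodedDotNormalizerSketch` and its methods are hypothetical, it treats every path as absolute, it silently ignores `..` above the root, and it deliberately leaves `%2f` alone.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not the UriParser implementation.
class EncodedDotNormalizerSketch {

    /** True if the segment consists of exactly {@code count} dots, each written as "." or "%2e"/"%2E". */
    static boolean isDots(String segment, int count) {
        int dots = 0;
        int i = 0;
        while (i < segment.length()) {
            if (segment.charAt(i) == '.') {
                i += 1;
            } else if (segment.regionMatches(true, i, "%2e", 0, 3)) {
                i += 3; // an encoded dot occupies three characters
            } else {
                return false;
            }
            dots++;
        }
        return dots == count;
    }

    /** Removes "." and ".." segments (plain or encoded) without decoding anything else. */
    static String normalize(String path) {
        Deque<String> kept = new ArrayDeque<>();
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || isDots(segment, 1)) {
                continue; // drop empty and "." segments
            }
            if (isDots(segment, 2)) {
                if (!kept.isEmpty()) {
                    kept.removeLast(); // ".." cancels the previous segment
                }
                continue;
            }
            kept.addLast(segment); // kept verbatim, still encoded
        }
        StringBuilder out = new StringBuilder();
        for (String segment : kept) {
            out.append('/').append(segment);
        }
        return out.length() == 0 ? "/" : out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("/a/%2e%2e/b"));  // prints "/b"
        System.out.println(normalize("/a/%2E/b%2Fc")); // prints "/a/b%2Fc"
    }
}
```

The point of the sketch is the ordering: dot segments are detected on the still-encoded string, so the rest of the input stays byte-for-byte intact for whatever decoding step runs later.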

garydgregory · 2024-01-08T13:44:38Z

commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/UriParser.java

+        private int cursor = 0;
+        private int lastSeparator;
+        private int end;
+        PathNormalizer(StringBuilder path) {


Use final where you can.

garydgregory

Hi @raboof
A few nits 😉

garydgregory · 2024-01-08T13:46:21Z

commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/UriParser.java

+            boolean reading = true;
+            while (reading) {
+                reading = readNonSeparator();
+            }


Simpler to say `while (readNonSeparator());`?

hmm, it seems PMD won't let me
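For reference, the two loop shapes discussed here behave identically. The sketch below uses a hypothetical `LoopShapeSketch` class with a simplified `readNonSeparator()` (the real method in the PR is not shown in this thread) and writes the short form with an explicit empty block instead of a bare semicolon.

```java
// Hypothetical illustration only; readNonSeparator() is a simplified stand-in.
class LoopShapeSketch {
    private final String input;
    private int cursor;

    LoopShapeSketch(String input) {
        this.input = input;
    }

    /** Consumes one character unless it is '/' or the input is exhausted. */
    boolean readNonSeparator() {
        if (cursor >= input.length() || input.charAt(cursor) == '/') {
            return false;
        }
        cursor++;
        return true;
    }

    /** The shape used in the PR: a flag variable keeps the loop body non-empty. */
    int skipWithFlag() {
        boolean reading = true;
        while (reading) {
            reading = readNonSeparator();
        }
        return cursor;
    }

    /** The shorter shape from the review comment, with an explicit empty body. */
    int skipWithEmptyBody() {
        while (readNonSeparator()) {
            // intentionally empty: all work happens in the condition
        }
        return cursor;
    }
}
```

Both methods stop at the first separator, so the choice between them is a matter of style and of which static-analysis rules the build enforces.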

garydgregory · 2024-01-08T13:46:51Z

commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/UriParser.java

+            }
+            return false;
+        }
+        private boolean readDot() {


Nit: Add a blank line between methods.

garydgregory · 2024-01-08T17:56:33Z

@raboof
Run mvn before you push to catch build errors. In this case, this PR breaks binary compatibility because it makes a previously public constructor private. I took the liberty of fixing it to see if we can get to a green build 😉

raboof · 2024-01-08T18:19:13Z

Ah, rebase mistake... thx for the fix, not sure what's going on with DefaultFileSystemManagerTest.testResolveFileNameType, will look into that tomorrow.

Might need some more reviewing to make sure this logic is correct

garydgregory · 2024-01-08T23:15:56Z

Error:  src/main/java/org/apache/commons/vfs2/provider/UriParser.java:[528,1] (whitespace) FileTabCharacter: File contains tab characters (this is the first instance).

Run mvn before you push to catch build errors 😉

raboof · 2024-01-09T13:05:47Z

> Run mvn before you push to catch build errors 😉

I know ;) - some of the tests are failing or very slow on my machine (even on the main branch), probably because I'm running in a sandbox, so I've been running individual targets - sorry about the noise ;) . Now it looks like we're getting 500s that are not our fault, though: https://www.githubstatus.com/

Do you want me to squash the commits above?

garydgregory · 2024-01-09T13:27:15Z

Don't worry about the noise 😉
WRT squashing, up to you, I normally squash before merging unless there's a good reason not to do so.

garydgregory · 2024-01-09T13:28:16Z

Looks like we'll have to wait for GH to fix itself...

garydgregory · 2024-01-09T15:33:08Z

Re-running builds now that GH is back up...

* Simplify UriParser

  This reverts #396 and related changes and implements the same in a simpler way by replacing the encoded characters already in `fixSeparators`. This approach has a slightly higher risk of breaking existing behaviour, but a lower risk of remaining problems in this part of the codebase. All testcases still succeed.

  This PR is intended to replace #543 and #555. It includes the testcases from #543, adapted to the behaviour before #396.

  This reverts commit cb45c94.
  This reverts commit 5399c76.

  Co-Authored-By: Anthony Goubard

* Add benchmark for UriParser

Co-authored-by: Anthony Goubard
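The follow-up approach described in that commit message, replacing encoded characters in the same pass that fixes separators, could look roughly like the sketch below. The message does not say which encoded characters are replaced, so this sketch assumes only `%2e`/`%2E`; the class and method names are hypothetical and this is not the code that was merged.

```java
// Hypothetical illustration only; the set of replaced characters is an assumption.
class FixSeparatorsSketch {

    /**
     * Rewrites the buffer in place: '\' becomes '/', and "%2e"/"%2E" becomes '.'.
     * Returns true if anything was changed.
     */
    static boolean fixSeparatorsSketch(StringBuilder name) {
        boolean changed = false;
        int i = 0;
        while (i < name.length()) {
            char ch = name.charAt(i);
            if (ch == '\\') {
                name.setCharAt(i, '/');
                changed = true;
            } else if (ch == '%' && i + 2 < name.length()
                    && name.charAt(i + 1) == '2'
                    && (name.charAt(i + 2) == 'e' || name.charAt(i + 2) == 'E')) {
                name.replace(i, i + 3, "."); // buffer shrinks by two characters
                changed = true;
            }
            i++;
        }
        return changed;
    }

    public static void main(String[] args) {
        StringBuilder path = new StringBuilder("a\\b/%2E%2e/c");
        fixSeparatorsSketch(path);
        System.out.println(path); // prints "a/b/../c"
    }
}
```

Because the buffer is rewritten before any later normalization step sees it, that step only has to handle plain dots, which is the simplification the commit message points at; the trade-off it names is that the input string no longer survives unchanged.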

raboof force-pushed the urldecoded-normalization branch 3 times, most recently from af17cb7 to 6c12c74 Compare May 16, 2023 16:47

raboof force-pushed the urldecoded-normalization branch 4 times, most recently from 6f73d37 to e421bfd Compare May 27, 2023 19:25

garydgregory reviewed Jan 8, 2024

garydgregory reviewed Jan 8, 2024

garydgregory requested changes Jan 8, 2024

raboof force-pushed the urldecoded-normalization branch from e421bfd to 8f33bb3 Compare January 8, 2024 14:39

raboof force-pushed the urldecoded-normalization branch from 8f33bb3 to f1611bf Compare January 8, 2024 14:47

Fix binary compatibility

aceac75

Fix DefaultFileSystemManagerTest

cff08f2

Might need some more reviewing to make sure this logic is correct

raboof marked this pull request as draft January 8, 2024 22:32

raboof force-pushed the urldecoded-normalization branch from 444ac4a to 31e0882 Compare January 8, 2024 22:44

Make PMD happy

ec42e65

raboof force-pushed the urldecoded-normalization branch from 31e0882 to ec42e65 Compare January 9, 2024 08:18

A few additional testcases

c95b48d

raboof force-pushed the urldecoded-normalization branch from 01b8934 to c95b48d Compare January 9, 2024 13:03

raboof marked this pull request as ready for review January 9, 2024 13:20

garydgregory merged commit cb45c94 into apache:master Jan 9, 2024
15 checks passed

garydgregory added a commit that referenced this pull request Jan 9, 2024

Support URLencoding during normalization #396

fb9642f

raboof mentioned this pull request Jun 25, 2024

Fix for incorrect detection of %2f as separator at a few places in UriParser #543

Closed

raboof mentioned this pull request Jun 26, 2024

Simplify UriParser #558

Merged
