-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support URLencoding during normalization #396
Conversation
af17cb7
to
6c12c74
Compare
6f73d37
to
e421bfd
Compare
commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/UriParser.java
Outdated
Show resolved
Hide resolved
private int cursor = 0; | ||
private int lastSeparator; | ||
private int end; | ||
PathNormalizer(StringBuilder path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use final where you can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @raboof
A few nits 😉
boolean reading = true; | ||
while (reading) { | ||
reading = readNonSeparator(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler to say `while (readNonSeparator()); ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, it seems PMD won't let me
} | ||
return false; | ||
} | ||
private boolean readDot() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Add a blank line b/w methods.
commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/UriParser.java
Outdated
Show resolved
Hide resolved
e421bfd
to
8f33bb3
Compare
Earlier, the path would be URL-decoded after normalization. This meant some opportunities for normalization would be missed when dots or slashes were URL-encoded. One way to solve this would be to change the ordering of operations, but that seems risky as it would change the meaning of some of the abstractions. Because of that I opted for a more conservative approach, where normalization will take into account URL-encoded characters, but otherwise leave the input string intact.
8f33bb3
to
f1611bf
Compare
@raboof |
Ah, rebase mistake... thx for the fix, not sure what's going on with |
Might need some more reviewing to make sure this logic is correct
444ac4a
to
31e0882
Compare
Run mvn before you push to catch build errors 😉 |
31e0882
to
ec42e65
Compare
01b8934
to
c95b48d
Compare
I know ;) - some of the tests are failing or very slow on my machine (even on the main branch), probably because I'm running in a sandbox, so I've been running individual targets - sorry about the noise ;) . Now getting 500's that are not our fault it looks like though, https://www.githubstatus.com/ Do you want me to squash the commits above? |
Don't worry about the noise 😉 |
Looks like we'll have to wait for GH to fix itself... |
Re-running builds now that GH is back up... |
The reverts apache#396 and related changes and implements the same in a simpler way by replacing the encoded characters already in `fixSeparators`. This approach has a slightly higher risk at breaking existing behaviour, but a lower risk of remaining problems in this part of the codebase. All testcases still succeed. This PR is intended to replace apache#543 and apache#555 This reverts commit cb45c94. This reverts commit 5399c76.
The reverts apache#396 and related changes and implements the same in a simpler way by replacing the encoded characters already in `fixSeparators`. This approach has a slightly higher risk at breaking existing behaviour, but a lower risk of remaining problems in this part of the codebase. All testcases still succeed. This PR is intended to replace apache#543 and apache#555. It includes the testcases from apache#543, adapted to the behaviour before apache#396. This reverts commit cb45c94. This reverts commit 5399c76. Co-Authored-By: Anthony Goubard <[email protected]>
The reverts apache#396 and related changes and implements the same in a simpler way by replacing the encoded characters already in `fixSeparators`. This approach has a slightly higher risk at breaking existing behaviour, but a lower risk of remaining problems in this part of the codebase. All testcases still succeed. This PR is intended to replace apache#543 and apache#555. It includes the testcases from apache#543, adapted to the behaviour before apache#396. This reverts commit cb45c94. This reverts commit 5399c76. Co-Authored-By: Anthony Goubard <[email protected]>
The reverts apache#396 and related changes and implements the same in a simpler way by replacing the encoded characters already in `fixSeparators`. This approach has a slightly higher risk at breaking existing behaviour, but a lower risk of remaining problems in this part of the codebase. All testcases still succeed. This PR is intended to replace apache#543 and apache#555. It includes the testcases from apache#543, adapted to the behaviour before apache#396. This reverts commit cb45c94. This reverts commit 5399c76. Co-Authored-By: Anthony Goubard <[email protected]>
* Simplify UriParser The reverts #396 and related changes and implements the same in a simpler way by replacing the encoded characters already in `fixSeparators`. This approach has a slightly higher risk at breaking existing behaviour, but a lower risk of remaining problems in this part of the codebase. All testcases still succeed. This PR is intended to replace #543 and #555. It includes the testcases from #543, adapted to the behaviour before #396. This reverts commit cb45c94. This reverts commit 5399c76. Co-Authored-By: Anthony Goubard <[email protected]> * Add benchmark for UriParser --------- Co-authored-by: Anthony Goubard <[email protected]>
Earlier, the path would be URL-decoded after normalization. This meant some opportunities for normalization would be missed when dots or slashes were URL-encoded.
One way to solve this would be to change the ordering of operations, but that seems risky as it would change the meaning of some of the abstractions. Because of that I opted for a more conservative approach, where normalization will take into account URL-encoded characters, but otherwise leave the input string intact.