Add normalize_lexically to Path #396

Closed
Tracked by #134694
ChrisDenton opened this issue Jun 16, 2024 · 7 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@ChrisDenton
Member

ChrisDenton commented Jun 16, 2024

Proposal

Problem statement

For Unix platforms, we take pains to warn about the dangers of naively resolving .. components (i.e. resolving /path/to/../file as /path/file). However, that doesn't mean it's never useful. Sometimes when working within a subdirectory we don't intend to follow .. links. Also, people have a habit of using a literal .. when they really meant pop(). If nothing else, providing a function for this case would be a good hook for documenting the issue in a central location.

Motivating examples or use cases

Say you have a base path and you want the user to be able to use paths below it.

// You've already checked that the user_path is not a `/` root path and does not have any prefix
// but this still has issues because even a relative path may escape the base path
let subpath = base_path.join(user_path);
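
To make the escape concrete, here is a minimal sketch. The helper names `naive_join` and `has_parent_dir` and the example paths are hypothetical and are not part of the proposal; only `Path::join`, `Path::components` and `Component::ParentDir` are existing std APIs.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins an untrusted relative path onto a base path with no further checks.
/// (Hypothetical helper, for illustration only.)
pub fn naive_join(base: &Path, user_path: &Path) -> PathBuf {
    base.join(user_path)
}

/// Returns true if any component of `path` is a literal `..`.
/// (Hypothetical helper, for illustration only.)
pub fn has_parent_dir(path: &Path) -> bool {
    path.components().any(|c| matches!(c, Component::ParentDir))
}
```

With a base of `/srv/data` and a user path of `../../etc/passwd`, the joined path still passes a lexical `starts_with("/srv/data")` check, because `join` keeps the `..` components verbatim. The operating system resolves them only when the path is actually opened.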

Solution sketch

Have a function that removes .. components from the path, in addition to the usual normalization that the components iterator does (such as normalizing separators).

impl Path {
    // normalizes in place, avoiding an allocation.
    pub fn normalize_lexically(&mut self) -> Result<&mut Self, NormalizeError>;
}

Or:

impl Path {
    // more convenient but always allocates.
    pub fn normalize_lexically(&self) -> Result<PathBuf, NormalizeError>;
}

Either way, this would return an error if the Path contains leftover .. components. For example, path\..\..\to\file resolves to ..\to\file, which still starts with a .. and so would be an error. It could also error if it resolves to the empty path (I'm less sure about this, but unexpectedly empty paths can be a footgun).
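
As a rough illustration of the allocating variant, the following is a sketch only, written as a free function rather than a method on `Path`. The name `normalize_lexically_sketch` and the unit-struct shape of `NormalizeError` are assumptions. It errors on a leftover `..` and returns an empty result as-is, leaving out the optional empty-path error.

```rust
use std::path::{Component, Path, PathBuf};

/// Error returned when a `..` has nothing left to remove.
/// (The unit-struct shape is an assumption of this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub struct NormalizeError;

/// Lexically removes `.` and `..` components without touching the filesystem.
pub fn normalize_lexically_sketch(path: &Path) -> Result<PathBuf, NormalizeError> {
    let mut out = PathBuf::new();
    // Number of normal (named) components currently held in `out`.
    let mut depth = 0usize;
    for component in path.components() {
        match component {
            // Keep any prefix (e.g. `C:`) and the root as they are.
            Component::Prefix(_) | Component::RootDir => out.push(component.as_os_str()),
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    // Nothing to pop: the `..` would be left over.
                    return Err(NormalizeError);
                }
                out.pop();
                depth -= 1;
            }
            Component::Normal(name) => {
                out.push(name);
                depth += 1;
            }
        }
    }
    Ok(out)
}
```

Under these assumptions, `a/b/../c` normalizes to `a/c`, `a/..` yields the empty path, and both `a/../../b` and `/a/../../b` are rejected.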

Alternatives

  • Instead of returning a Result, we could collect any left over .. components and place them at the beginning of the path.
  • The current name is chosen to be a bit weird so as to highlight that this is a potentially dangerous operation. Maybe another name could be chosen.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@ChrisDenton ChrisDenton added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Jun 16, 2024
@ChrisDenton ChrisDenton changed the title Extend the Path API with some useful helper functions. Add normalize_lexically to Path Jul 23, 2024
@ChrisDenton
Member Author

ChrisDenton commented Jul 23, 2024

I've updated this ACP based on notes from the libs-api meeting. It concentrates on what I'm now calling normalize_lexically which seemed to have general support although there was some uncertainty about the naming and some of the details.

@pitaj

pitaj commented Jul 24, 2024

Should this normalize all separators to MAIN_SEPARATOR?

@kennytm
Member

kennytm commented Jul 24, 2024

I think that is implied by "in addition to the usual normalization that the components iterator does", since if you call path.components().collect::<PathBuf>(), the components are joined using MAIN_SEPARATOR_STR.
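
A small sketch of that round trip (the helper name `rebuild_from_components` is hypothetical): collecting the components drops repeated separators, interior `.` components and trailing separators, and rejoins with the platform's main separator. It leaves `..` components untouched.

```rust
use std::path::{Path, PathBuf};

/// Rebuilds a path from its components, which applies the normalization
/// that the `components` iterator already performs.
/// (Hypothetical helper, for illustration only.)
pub fn rebuild_from_components(path: &Path) -> PathBuf {
    path.components().collect()
}
```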

@ChrisDenton
Member Author

I've added that link to the ACP. I've also added some links for other languages.

@kennytm
Member

kennytm commented Jul 24, 2024

I've checked all linked implementations from other languages and they all behave the same regarding left-over ..:

Implementation          a/../../b   /a/../../b   ../a/../../b
Go path.Clean           ../b        /b           ../../b
Java Path.normalize     ../b        /b           ../../b
Node.js path.normalize  ../b        /b           ../../b
C++ lexically_normal    ../b        /b           ../../b
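
For comparison, the behavior in the table can be sketched in Rust as follows. This is an illustration of the "keep leftover .." alternative, not a port of any of the listed implementations, and the name `clean_keeping_parents` is hypothetical. Unlike Go's path.Clean, this sketch returns an empty path rather than `.` when nothing is left.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically cleans a path, keeping leftover `..` at the front of relative
/// paths and dropping `..` that would climb above the root.
pub fn clean_keeping_parents(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    // Number of normal (named) components currently held in `out`.
    let mut depth = 0usize;
    let mut rooted = false;
    for component in path.components() {
        match component {
            Component::Prefix(_) => out.push(component.as_os_str()),
            Component::RootDir => {
                out.push(component.as_os_str());
                rooted = true;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth > 0 {
                    out.pop();
                    depth -= 1;
                } else if !rooted {
                    // Relative path: keep the leftover `..`.
                    out.push("..");
                }
                // Rooted path with nothing to pop: `..` at the root is dropped.
            }
            Component::Normal(name) => {
                out.push(name);
                depth += 1;
            }
        }
    }
    out
}
```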

@the8472
Member

the8472 commented Jul 24, 2024

> Sometimes when working within a subdirectory we don't intend to follow .. links.

Depending on the application, it'd probably be safer to have some sort of join_beneath or similar, where you specify a trusted prefix and an untrusted suffix and it only normalizes the suffix, as long as the suffix does not ascend out of the prefix.

E.g. https://docs.rs/safe-path/0.1.0/safe_path/fn.scoped_resolve.html
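
A purely lexical sketch of what such a join_beneath might look like. The signature and the `JoinBeneathError` type are assumptions made for illustration. This version never touches the filesystem, so it does not account for symlinks inside the base directory.

```rust
use std::path::{Component, Path, PathBuf};

/// Reasons the untrusted suffix is rejected. (Hypothetical error type.)
#[derive(Debug, PartialEq, Eq)]
pub enum JoinBeneathError {
    /// The suffix has a root or a prefix and would replace the base.
    NotRelative,
    /// The suffix uses `..` to climb above the base.
    EscapesBase,
}

/// Normalizes only the untrusted suffix, then appends it to the trusted base.
pub fn join_beneath(base: &Path, untrusted: &Path) -> Result<PathBuf, JoinBeneathError> {
    let mut suffix = PathBuf::new();
    // Number of normal (named) components currently held in `suffix`.
    let mut depth = 0usize;
    for component in untrusted.components() {
        match component {
            Component::Prefix(_) | Component::RootDir => {
                return Err(JoinBeneathError::NotRelative);
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return Err(JoinBeneathError::EscapesBase);
                }
                suffix.pop();
                depth -= 1;
            }
            Component::Normal(name) => {
                suffix.push(name);
                depth += 1;
            }
        }
    }
    if suffix.as_os_str().is_empty() {
        // Nothing left to append; avoid adding a trailing separator.
        return Ok(base.to_path_buf());
    }
    Ok(base.join(suffix))
}
```

The base is never normalized here, so any `..` it contains is left exactly as the caller wrote it.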

@Amanieu
Member

Amanieu commented Sep 17, 2024

We discussed this in the @rust-lang/libs-api meeting today. We're happy to accept this with one slight modification: empty paths should just be allowed as-is rather than erroring. Errors should only occur when trying to .. past the root.

We recognize that the behavior of erroring differs from other languages, but we believe that this behavior is more useful in practice for path validation.

@Amanieu Amanieu closed this as completed Sep 17, 2024
@Amanieu Amanieu added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Sep 17, 2024