Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add must-use-output attribute #3773

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Kixunil
Copy link

@Kixunil Kixunil commented Feb 16, 2025

This adds an RFC proposing a new attribute that would help detect mistakes when a parameter of a function is not used after the function returns. E.g. when inserting into a collection that is then immediately dropped.

Cc @joshtriplett

Rendered

This adds an RFC proposing a new attribute that would help detect
mistakes when a parameter of a function is not used after the function
returns. E.g. when inserting into a collection that is then immediately
dropped.
Comment on lines 37 to 38

Note also that some functions may be useless to call even when they take in truly immutable (`Freeze`) references. For instance `clone` is wasteful if the original is not accessed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note also that some functions may be useless to call even when they take in truly immutable (`Freeze`) references. For instance `clone` is wasteful if the original is not accessed.

This is true, but I don't think it's on-topic for this RFC, and addressing it would need some additional care to avoid the annoying ripple-effect problem of "don't write the clone on the last use". Can we drop it from this RFC?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we mention it but don't propose to mark it just yet? Maybe some library has a use for it in some other context and I don't think adding code to ban Freeze references is worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kixunil I'm not suggesting adding code to ban it, just that it's something different and may not want to use the same handling. Could you move it to one of the later notes sections (e.g. future work), then, along with the note that this isn't proposing to mark such values?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. If we allow it right away and there's a clear use case with clone-like functions, which people could write how is it a "future possibility"? Note that while I wrote clone in the list below, I can simply remove it from the list and reword this to:

Note also that some functions may be useless to call even when they take in truly immutable (Freeze) references. For instance clone is wasteful if the original is not accessed, however this RFC is not proposing adding the attribute to core's Clone::clone just yet.

And put "Add the attribute to clone" in "Future possibilities"

This was overlooked in initial PR.
@joshtriplett joshtriplett added T-lang Relevant to the language team, which will review and decide on the RFC. I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. A-lint Proposals relating to lints / warnings / clippy. labels Feb 16, 2025
These address the review comments where the author completely agrees
with the reviewer.
@Kixunil
Copy link
Author

Kixunil commented Feb 16, 2025

I've incorporated all suggestions I completely agree with.

This addresses all comments raised by Kevin Reid and one by Josh
Triplett.
@Kixunil
Copy link
Author

Kixunil commented Feb 16, 2025

Thanks @kpreid ! I've addressed your feedback. Also @joshtriplett thanks for your review so far, I have also addressed the other rewording.

- Make it a `clippy` lint instead. However not everyone uses `clippy` and the need to communicate which arguments are considered important would diminish its effect. `#[must_use]` is a native rustc lint, and this should be as well, using the same machinery.
- Try to somehow analyze the code and not require the attribute. This seems hard and it could lead into libraries accidentally triggering warnings in users code if they change the body of the function.
- Implement a more full feature — e.g. by doing some things in "Future possibilities" section. However, this feature should be useful even without them.
- Have the attribute on the function instead listing the names of paramters. This could make it nicer to extend to support the "or" relationship described in "Future possibilities".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Have the attribute on the function instead listing the names of paramters. This could make it nicer to extend to support the "or" relationship described in "Future possibilities".
- Have the attribute on the function instead listing the names of parameters. This could make it nicer to extend to support the "or" relationship described in "Future possibilities".

[rationale-and-alternatives]: #rationale-and-alternatives

- We could also simply not do this but the potential mistakes it catches are real.
- The name could be something different and preferably shorter. The name used here was suggested by Josh Triplett and is pretty readable and it can be used before stabilization in the absence of better one. A pretty short and clear `#[must_read]` was also suggested by Kevin Reid.
Copy link
Member

@joshtriplett joshtriplett Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, while I mildly prefer must_use_output, I would check a box for must_read. I do think it somewhat has the problem that must_use would (it could be read as applying to the body of the function rather than to the caller), but it does seem slightly better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe #[caller_must_use]?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kennytm Sadly still pretty long but I like it better than must_use_output. It at least doesn't pose "what do you mean by the output?" question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kixunil Agreed. must_use and must_read don't answer "who". must_use_output specifies that indirectly: "the one that gets this as an output must use it". But caller_must_use specifies it directly. That seems pretty clear to me.

@clarfonthey
Copy link

While I like the idea of this feature, I do wonder how many custom attributes we're going to have to add to the language before we just decide to reintroduce the idea of labeling functions as pure.

@Kixunil
Copy link
Author

Kixunil commented Feb 27, 2025

@clarfonthey note that if we had pure attribute/keyword it wouldn't help here because, according to Wikipedia:

the function has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or input/output streams)

An attribute that disallows side effects except for allocation and mutation of parameters might be interesting but we would need to invent a name for it. :)

@clarfonthey
Copy link

clarfonthey commented Feb 28, 2025

I personally think that it's okay to consider a function pure if its mutations are explicitly defined; the issue with purity is when they're not defined, i.e. external state.

There isn't much of a distinction between taking and returning a value versus taking a mutable reference, other than calling convention. The idea of purity really boils down to whether a function call can be removed if its output is not used, which is what's relevant here.

Sure, maybe calling it pure is a bad idea, and maybe const isn't good enough either, but we're definitely approaching the point where typing pure fn is a whole lot easier than adding multiple attributes.


Note also that under this model, stuff like must_use are still useful, but useful for cases when it's actually not creating dead code and instead suggesting to the user they should handle extra cases. So, pushing to a vector that isn't used means the entire lifetime of the vector can be deleted, whereas ignoring an error still means the function is run.

The other benefit of this is that, with an actual language feature for this, we can do complete dead code analysis, whereas at the moment, the best we can do is emit one lint at the final call site if the code path ends.


- We could also simply not do this but the potential mistakes it catches are real.
- The name could be something different and preferably shorter. The name used here was suggested by Josh Triplett and is pretty readable and it can be used before stabilization in the absence of better one. A pretty short and clear `#[must_read]` was also suggested by Kevin Reid.
- We could write `#[must_use]` on a reference parameter instead. The downside would be that this could be mistaken for saying that the *callee* must use the parameter, rather than the *caller*.
Copy link
Contributor

@traviscross traviscross Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like much of a problem to me. We already lint on function parameters that are unused within the callee1, so if I saw #[must_use] on a &mut parameter, I'd expect this would refer to an obligation on the caller, especially as that's the existing connotation of #[must_use].

Barring better proposals (and hopefully more concise ones than #[must_use_output]), my own preference is to just use #[must_use] for this.

Footnotes

  1. Except for self, unfortunately, due to not having a good way to suppress it, but realistically we would never use #[must_use] &self in this way, to require something of the callee.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I also feel like it wouldn't be that big deal and people would figure it out quickly. But at the same time I'm not bothered by the name too much and I respect whatever decision will be made by the appropriate team.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer #[must_use]. When applied to a function, it also means "the output at the call site must be used", and people don't have a problem understanding it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the attribute could be named differently, something along the lines of the function having side effect that needs to be observed #[must_use(side_effect)] or #[is_a_setter]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a couple of parallel discussions occurring about the name, one here and one at #3773 (comment) .

From that discussion: caller_must_use seems like it removes the ambiguity.

@Kixunil
Copy link
Author

Kixunil commented Mar 7, 2025

@clarfonthey I agree pretty much completely and I think excluding mutation through pointer arguments was a bad idea when someone coined the term pure. But the term exists and if possible I'd prefer a different one, if possible, to avoid confusion.


Have a way to explain to the compiler how smart pointers work so that this can be used on `Pin` as well.

Have an attribute that can express "or" relationship between parameters and return types. For instance, `Vec::pop` is sometimes used to remove the last element and then pass the vec to something else (e.g. trim `\n` at the end) and sometimes it's used to get the last value and the vec is no longer relevant. Someting like `#[must_use_at_least_one(self, return)]` on function could handle this case.
Copy link

@Aloso Aloso Mar 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be the default behaviour when a function has multiple outputs (e.g. an output parameter as well as a return type) that should be used. You have to use at least one, not all of them.

P.S. This is another reason why I prefer the name #[must_use] instead of #[must_use_output], because it is conceptually the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be the default behaviour when a function has multiple outputs (e.g. an output parameter as well as a return type) that should be used. You have to use at least one, not all of them.

Then we would need a way to override it in the other direction, because there are absolutely functions for which you need to use all the output parameters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to give a specific example: core::mem::swap.

But also for write and read you should use the returned value specifically. Just using the writer is not enough.

@Aloso
Copy link

Aloso commented Mar 9, 2025

Related: #[must_use] should go to the return type:

fn foo() -> #[must_use] Output;

This feels more consistent with #[must_use_output] on function parameters.

This would also address the problem that you can't mark part of a return type as #[must_use]:

#[must_use]
fn read() -> Result<usize> {...}

read()?; // does not trigger the lint

What we want is this:

fn read() -> Result<#[must_use] usize> {...}

@Kixunil
Copy link
Author

Kixunil commented Mar 10, 2025

@Aloso in principle yes but there isn't much we can do now about it apart from having it as an alternative way to specify the atribute.

@ahicks92
Copy link

I'm not sure where specifically in the RFC to comment but the RFC and this thread both don't seem to be mentioning it. As presented this will warn on data structures which impl Drop (it has to, otherwise the vec example at the top cannot work), but it is entirely possible to use a data structure to hold RAII guards (as one example). That is:

struct FancyDataStructure {
    something_atomic: AtomicU64,
    lock: Mutex<()>,
}

fn foo() {
    // snip...
    let mut guards = vec![];
    for thing in structures{
        guards.push(thing.lock.lock().unwrap());
    }

    // Don't touch guards, but when we drop they'l unlock.
}

As far as I can tell that is roughly equivalent to the first example but we actually do for real rely on the drop only. If you're wondering why you'd do something like this, it can come up in concurrent data structures when it is safe for readers to access the other fields.

As another possible write-only do-not-consume case, consider File where you are passing raw fds off to something else but need the File objects to stay around.

The trouble is that in order to do this generically--at least as far as I see--the rule has to become more complex. Something like "warn if the container's type parameters do not impl Drop" or something. Otherwise, you get false positives in these cases. Consider a crate like smallvec, which is what I would really use for cases like these; it has to extend to that.

There's the obvious rule "if any type parameter impls Drop then ignore it" but that immediately fails out on Vec<Box<Anything>> or Vec<Vec<Anything>>.

I kind of agree with @clarfonthey in that this is sort of poor man's pure. As written you can basically only use it on pure functions. If it can't handle the Drop case then even putting it in std doesn't work unless the function is pure.

Of course I imagine there'd be some way to turn this off if it's a false positive. Maybe people feel that we should make people do that. I have weak opinions on this whole thing in general: I see no harm in it but have never personally run into a case where it matters.

@kennytm
Copy link
Member

kennytm commented Mar 12, 2025

@ahicks92 in typical Rust that example will be written not as a for loop but a map+collect:

let guards = structures.iter()
    .map(|s| s.lock.lock().unwrap())
    .collect::<Vec<_>>();

which currently rustc will warn #[warn(unused_variables)]. The standard solution is to name it with a leading underscore:

let _guards = structures.iter()
    .map(|s| s.lock.lock().unwrap())
    .collect::<Vec<_>>();

I think the situation should be the same here. If we require to use Vec::push and don't want the warning and don't want to use #[expect] or drop(guards), maybe the RFC should support recognizing variables with a leading underscore:

    let mut _guards = vec![];
    for thing in structures{
        _guards.push(thing.lock.lock().unwrap()); 
        // suppress warning since `Vec::push` is called on `&mut _guards`?
    }

@ahicks92
Copy link

_ prefixes don't play well with this, especially since they are taught to people as unused variable:

let mut v = vec![];
v.push(something);
if let Some(front) = v.front() {
    // stuff.
}
v.push(another_thing);

Where does the _ go?

let _ = whatever.push(...) under the same logic as Result might be the best option, otherwise as far as I can tell you have to allow() it somewhere at least with current mechanisms. For this I'd at least like the mitigation/override to be on/near the line that triggers it.

I also feel the need to push back on the "not idiomatic" response. I started with significant compiler contributions in 2017 and have spent the last 4 years doing Rust full-time. I do understand that iterators are in some sense better and sometimes faster, I do understand that most experienced Rust people use them, but I could also make an argument for why I don't always do that rooted primarily in interacting with polyglot teams. In this case though, the goal is illustrating a problem I see with the RFC in a straightforward manner, not coming up with something idiomatic or arguing about it. In fact "learn to use iterator chains, and it won't warn" is exactly the opposite of what I would want to do to anyone--that's actively hostile if it's the solution, whether that hostility is intended or not, and in addition isn't something that Rustc could advise people about.

@kennytm
Copy link
Member

kennytm commented Mar 13, 2025

@ahicks92 I haven't used the word "idiomatic" in my response ("standard solution" is talking about suppressing the unused_variable lint), I'm just pointing out if this is written in terms of iterators instead of for-loop, we see that rustc provided one escape hatch which this RFC may copy. I'm sorry if you find it confused.

In more complex situation one should #[expect] the lint to clearly convey the intention why it's ok v is no longer used.

let mut v = vec![];
v.push(something);
if let Some(front) = v.front() {
    // stuff.
}
#[expect(unused_must_use, reason="v takes ownership of another_thing which is a lock, ensuring it won't be released until v is dropped at the end of function")]
v.push(another_thing);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Proposals relating to lints / warnings / clippy. I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants