-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: const ergonomics for NonZero<T> #3786
base: master
Are you sure you want to change the base?
RFC: const ergonomics for NonZero<T> #3786
Conversation
|
||
When writing test or example code and using hardcoded constants, you can omit the conversion into | ||
`NonZero<T>` - it is done implicitly at compile time. This only works with constant values (either | ||
`const` variables or literals like `1234`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only literals (or negated literals) should automatically give you a NonZero<T>
, not const
items. otherwise you end up with fragile code due to a lack of type safety if the const
value could be changed to zero later (e.g. it's measuring the size of a file at compile time and you decide that that file should be empty for whatever valid reason).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Thinking out loud, the problem is exemplified by:
// constants.rs
const MIN_REDDIT_POSTS_USED_FOR_TRAINING: u64 = 100_000;
// loader.rs
fn load_posts(min_count: NonZero<u64>) { .. }
load_posts(crate::constants::MIN_REDDIT_POSTS_USED_FOR_TRAINING);
This coercion relationship would be iffy here because the person authoring/touching the constant could reasonably want to consider 0 a valid value. Yet the person authoring load_posts
reasonably might design an API that cannot be called with zero. If one day this constant becomes 0
, the breakage would be surprising.
If the author of MIN_REDDIT_POSTS_USED_FOR_TRAINING
wants zero to be considered invalid, they could simply make it:
// constants.rs
const MIN_REDDIT_POSTS_USED_FOR_TRAINING: NonZero<u64> = 100_000;
Thinking about the compatibility/adoption angle here - there may be constants that already exist today that are not NonZero
in the type system (due to issues such as the ones that motivate this RFC) but are logically non-zero, in which case the type of the constant would need to be updated to benefit. Which seems right and proper?
I will update the RFC to only apply to literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on literals only. Once you have a value, that value has a type, and it shouldn't change to another type (with a minor asterisk for subtyping). This is about how a type variable becomes a concrete type; it doesn't change what happens once something already has a type.
assert!(!item_fits_exactly_in_packaging(3)); | ||
assert!(item_fits_exactly_in_packaging(25)); | ||
assert!(!item_fits_exactly_in_packaging(999)); | ||
assert!(item_fits_exactly_in_packaging(1000)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this feature work if the user has written this?
assert!(item_fits_exactly_in_packaging(1000u32));
i guess no because the u32
suffix forces the literal to have type u32
rather than any possible integers.
and what about literals with const-folding
item_fits_exactly_in_packaging(123 | 456);
// should produce `NonZeroU32::new(123).unwrap() | NonZeroU32::new(456).unwrap()`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent questions! Applying such a coercion in these cases does seem to raise more problems than the benefit it may bring (if any). I will amend the RFC to clarify that the proposed behavior here is quite conservative (unless further discussion in here brings up reasons to make some bolder moves).
For the 1000u32
case, the proposed behavior would be that the coercion is applied only if a single integer type can be identified for the coercion:
- If the literal is untyped (e.g.
123
), theT
must be unambiguously resolved from the target type. - If the literal is typed (e.g.
123u32
), theT
in the target type must either match or be inferred to the literal's type.
For the const-folding case, I believe we can have the behavior be conservative by having the coercion apply only to literals - if the source type is from an expression like 123 | 456
then we simply do not consider NonZero
coercion as a valid candidate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the source type is from an expression like
123 | 456
then we simply do not considerNonZero
coercion as a valid candidate.
As a current-editions behaviour I totally agree, but one Future Possibility that might be interesting to consider would be allowing rust to combine 123 | 456
to a 579
literal, and then apply the coercion to that.
That way you could write out whatever with literals and the compiler would just evaluate it like python does (using infinite precision if needed) then only converting back to a "real" rust type once it's given a specific type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way you could write out whatever with literals and the compiler would just evaluate it like python does (using infinite precision if needed) then only converting back to a "real" rust type once it's given a specific type.
If we ever have literals convert to user-defined types, I would argue that Rust should not do operations to combine literals before converting to the user type, since that user type may not use standard arithmetic, e.g. if it is for carry-less arithmetic (or equivalently operations on 0b101 + 0b111 == 0b010
and 0b101 * 0b111 == 0b11011
. So, if Rust tried to do arithmetic on literals before conversion, it would produce the wrong result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do support
let x: Gf2x = 0b101 + 0b111;
I'd say it would be surprising to some portion of users whether x == Gf2x(0b10)
or x == Gf2x(0b1100)
.
The former is essentially how Rust currently infers the integer literal type, as evidenced in:
let x: i8 = 1000 - 999;
// error: literal out of range for `i8`, `#[deny(overflowing_literals)]` on by default
// linted on both "1000" and "999"
// suggesting we are treating this as `1000_i8 - 999_i8`.
Since we can't guarantee an operator being homomorphic for a custom type, indeed the conservative stance is the best, in which we only allow conversion on $:literal
(with arbitrary number of parentheses).
The question is whether -$:literal
should be allowed, because using the same argument the impl Neg for G
may be non-standard such that G(-x) != -G(x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is whether
-$:literal
should be allowed, because using the same argument theimpl Neg for G
may be non-standard such thatG(-x) != -G(x)
Part of why the API I proposed has negation separated out is that Rust does treat negation specially:
pub const A: i8 = -0x80i8; // no error
pub const B: i8 = -(0x80i8); // no error, arguably a bug in rustc
pub const C: i8 = -{ 0x80i8 }; // error
so, if -MyType(x)
isn't the same as MyType(-x)
the conversion from literal to MyType
can see that the literal is negated (because that's passed into the literal conversion rather than calling Neg
) and do the appropriate value transformation.
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
Exploration of other languages suggests that while refinement types like `NonZero` are common, they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in terms of user-defined literals there is quite a big prior art not mentioned here which is C++11 (of course C++ also has implicit constructors and exceptions so they could have non_zero<uint32_t> v = 123;
even without UDL.)
back to Rust, we could for instance introduce a const trait
pub trait /*const*/ FromIntLiteral: Sized {
const fn from_int_literal(input: u128) -> Self;
}
impl<T: FromIntLiteral + ZeroablePrimitive> FromIntLiteral for NonZero<T> {
const fn from_int_literal(input: u128) -> Self {
Self::new(T::from_int_literal(input)).expect("input must be non-zero")
}
}
and then using non-suffixed integer literal would be equivalent to using FromIntLiteral
:
assert!(item_fits_exactly_in_packaging(1000));
// equivalent to
assert!(item_fits_exactly_in_packaging(const { FromIntLiteral::from_int_literal(1000_u128) }));
const NEGATIVE_ONE: NonZero<i32> = -1;
// equivalent to (assuming `impl const Neg for NonZero<i32>`)
const NEGATIVE_ONE: NonZero<i32> = -const { FromIntLiteral::from_int_literal(1_u128) };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That trait could potentially be made more general by passing the constant as a &[u8]
in some specified endiannes (probably little?) so it could potentially represent numbers greater than 128 bits. It could also pass in whether it was negative with a separate argument.
TBH what I really want for this is a |
Languages with prior art for this (albeit none restricting it to compile time) include Swift and Haskell (and some of its derivatives, like Agda). Julia also has a precedent but by different means (user-defined conversion and promotion semantics). |
I think that would need the source text including the sign to be passed into the maybe have an API like: pub ~const trait FromIntLiteral<const SUFFIX: str>: Sized {
~const fn parse<E, F: ~const FnOnce(fmt::Arguments<'_>) -> E>(negative: bool, text: &str, parse_error: F) -> Result<Self, E>;
}
pub ~const trait FromFloatLiteral<const SUFFIX: str>: Sized {
~const fn parse<E, F: ~const FnOnce(fmt::Arguments<'_>) -> E>(negative: bool, text: &str, parse_error: F) -> Result<Self, E>;
}
// maybe have `FromStringLiteral` too? that way you can try to parse things at runtime without panicking on parse failure, and you can just pass in a function that panics at compile time. example: pub struct UInt<const N: usize>(...);
pub struct SInt<const N: usize>(...);
pub struct Expr<T>(...);
impl ~const FromIntLiteral<*"hdl_u12"> for Expr<UInt<12>> {
...
}
impl ~const FromIntLiteral<*"hdl_i5"> for Expr<SInt<5>> {
...
}
pub fn f() {
let a = 123_hdl_u12; // has type Expr<UInt<12>>
let b = -0x10_hdl_i5; // has type Expr<SInt<5>>
} |
Definitely like the idea of a fully generic solution, although it feels like we're still a ways off from being able to accomplish that. That said, I think that there's a clear issue with literal suffixes here, and that's that a lot of people (myself included) use them to remove type ambiguity, but this change would effectively break that. Now, for example, I'm not 100% sure if this is a particularly big issue (especially since it would probably default to the specified suffix type if it's ambiguous, rather than choosing Also: it's worth saying that since |
I love this proposal! I was surprised when I was learning rust and discovered this didn't exist.
This has been my exact thought process multiple times, I usually code my own |
On suffixes: I think we shouldn't add any more of those, but should instead find a way to make an ascription syntax that people can be happy with. While it's true that we had to get rid of (AKA the compiler only calls the hypothetical |
…d of talking about implementation
Based on initial comments received, I have adjusted the RFC text to remove coercion from constant variables (limiting proposed behavior to only literals), and to clarify some open topics the initial draft failed to address, defaulting to relatively conservative viewpoints to start from until/unless the discussion suggests more ambitious moves are necessary:
This still seems to satisfy the main mission of making tests and examples more readable, while avoiding complexity in corner cases and avoiding new syntax/suffixes. |
The "only literals, not expressions" logic has some interaction with rust-lang/compiler-team#835 in that if literals do not have signs, this coercion would be limited to positive values only, which would be imperfect (though still valuable). |
the API I proposed in a thread above basically treats literals as a signed literal where conversion from a literal to a user-defined type has separate inputs for the sign and for the literal's text. basically, if negation does something weird, the conversion function knows the input was negated (even if it's |
I agree with @clarfonthey wrt literal suffixes and explicitness - in the "alternatives" section about suffixes, I feel like the theoretical allowedness is a bit the wrong way round, and I'd rather see: takes_nz(1); // ok
takes_nz(1_u32); // error: u32 always means u32
takes_nz(1_u32_nz); // ok Having a new int literal coercion that doesn't have an explicit suffix associated seems like it could lead to some annoying inconsistencies, e.g. a macro that takes a |
I think the type ascription approach is really powerful, though I would argue that implicit coercion should perhaps still be on the table, regardless, as type ascription for every argument will still be quite verbose. That is, if we imagine a explicit(1:NonZero, 2:NonZero, 4:NonZero, 5:NonZero);
implicit(1, 2, 4, 5); The literals are drowned out in the first case! Hence, much like type suffixes are typically rare in code thanks to type inference, I think there's a strong argument for coercion here so long as type inference has figured out the argument types. Note: and yes, syntax wise, I do find type ascription preferable to suffixes too; using suffix literals in C++ is painful as the conversion operators need to be imported separately, and sometimes clash when you try and import a bundle of them... it's also always not clear which operator creates which type. I would guess we'd need the fully family, then: And in terms of signature, I think it could be interesting to support "raw" literals in them. That is, even for the Integer|Float case, passing the token string, rather than the pre-parsed integral:
|
+1 on the eventual |
Some way of using number literals to safely create some types (like NonZero, BigInts, and so on) could be good to have (as long as it's simple to use, to understand and it's safe). I even opened a ER that asks for a little macro that avoids the unwrap for compile-time NonZero values. But some people are working at introducing pattern types in Rust, like "u8 is 1 ..", that probably will never really deprecate the NonZero std lib types, but I think the design of everything in this issue should be seen in a wider contest of adding pattern type literals too. |
I was skeptical of the special casing of the RFC as of writing, even though I totally run into this everywhere (not just tests as the RFC repeatedly emphasizes, if it were just tests I would just use macros and short functions like my |
I like the RFC and would find benefit from it. I don't like most of the taking it further discussions. Consider:
Versus:
Now to be clear I recognize that everyone is only discussing literals right now. But the refactor is something one would expect to work. I would argue against Rust having custom implicit conversions. But if it did have custom implicit conversions they should really work everywhere and the obvious refactors shouldn't change the meaning of things. I don't believe this problem comes up often enough but if someone convinced me it did I would instead propose raising or re-raising custom string suffixes e.g. "123"mytype or somesuch (which isn't an argument against this RFC, It's fine to have the occasional special thing in stdlib imo). Such an approach at least makes it clear that there is magic, and it is easy to explain that it desugars to e.g. mytype::new("value") or some callout to a macro. I see other problems with a generalized approach, but they're off topic. |
well, imo its something one would expect not to work, if you have
I agree custom implicit conversions are generally bad. 1 The way I see it, literals can be one of many types, and the actual type they have is deduced from how they're used, so Footnotes
|
I think the consensus is both are not going to work. In the first case you need explicit conversion make it clear whether you want // assuming `MyWeirdInt: const Add`.
const WEIRD: MyWeirdInt = (123 as MyWeirdInt) + (456 as MyWeirdInt); |
The
std::num::NonZero<T>
type allows non-zero integer semantics to be clearly expressed. Yet thistype is only seamlessly usable if all APIs with non-zero semantics use this type, due to required
to/from conversion at any API boundary that differs in its use of
NonZero
.The burden of these conversions is especially heavy in tests and examples. This RFC proposes new
coercions to facilitate implicit conversion to
NonZero
from integral constants, simplifying usagein tests and examples, where succinctness and readability are paramount.
Rendered