Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guarantee slice representation #3775
base: master
Are you sure you want to change the base?
Guarantee slice representation #3775
Changes from all commits
5e0e852
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this specific order? Why not length before pointer? Are there any plausible reasons to prefer one over the other, e.g., based on target architecture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that our current layout algorithm is best able to exploit any niche in
len
iflen
is first in this ordering. And we currently cannot use a "double niche" on two different fields very effectively. I can work on improving this, but I cannot promise a specific result yet.Due to various quirks of our existing APIs, the
len
niche would not apply to*mut [T]
and*const [T]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@workingjubile Why would
len
have a niche? I would expectdata
to be the field with a niche, as it is non-null for&[T]
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, clever! Thanks for describing that-- I'll mention it in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cramertj It's really important that the length have a niche in
&[T]
, as has been FCPed by opsem in rust-lang/unsafe-code-guidelines#510, because that allows us to add range metadata to the lengths in every function that takes a&[T]
, and correspondingly lets LLVM know that, for example,(i + j)/2
on in-bounds indexes can't overflow (well, for non-ZSTT
s, which TBH are the only ones that matter for loop-over-slice optimizations).Sadly
*const [T]
doesn't have the niche, though, because https://doc.rust-lang.org/std/ptr/fn.slice_from_raw_parts.html was stabilized before this was realized and because casting*const [()]
to*const [i32]
works and doesn't change the metadata.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, also, we want to ensure to leave space for size-and-alignment based niches in references too.
I guess for a slice there's always the "zero size" case, so that simplifies to just alignment niches, but that still means
(-align, align)
are all impossible. (Which simplifies to just "it's not null" for something with align-1.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, we could have been even stricter, additionally requiring that ptr + length not overflow the address space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You appear to have deleted the reference-level explanation section from the template.
I would encourage you to split this back into two parts: the guide level informal explanation that talks about how it allows you to read things from C, but then also have a reference-level explanation of exactly what it's guaranteeing, without ever using the word "layout", because "layout" means too many different things to different people. (Some people think it means just
std::alloc::Layout
, some include field offsets, some include validity invariants, etc.)I'm absolutely in favour of doing this RFC, see this old Zulip thread, so long as it's clearly scoped to the parts that really are uncontroversial.
Notably, I don't think that any description that include
*const T
is correct in a reference-level section if it applies to&[_]
references, because it at least needs to beNonNull<T>
-- but it also needs to be aligned, and a bunch of other things.So I want to see a precise, positively-specified list of exactly what we're RFCing.
A quick stab at it:
{&,&mut,*const,*mut} [T]
withT: Sized
sizeof(*const impl Thin)
==sizeof(usize)
.2 * sizeof(usize)
.alignof(*const impl Thin)
==alignof(usize)
.alignof(usize)
.0
, which points to the first elementsizeof(usize)
, and contains the length in units of elements (not of bytes)I don't know if we stop there, but if that's all we're committing to I think it's uncontroversial -- and I bet it's what a whole bunch of unsafe code in the ecosystem already assumes anyway, de facto, since it's been true since at least 1.1.0.
I don't know what, if anything, we want to promise or require about the validity invariant, especially since that's already defined to be different between
&[i32]
and*const [i32]
. Perhaps this RFC should just not say anything about it -- reading a rust slice from C doesn't need to know anything about it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limiting it to platforms where the size and alignment of usize and a pointer are the same seems unnecessarily restrictive. Is there a difficulty with differing size and alignment I'm not aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IIRC we still haven't decided if
usize
is an integer big enough to contain a pointer (C'suintptr_t
), or big enough for just the address (effectively C'ssize_t
) -- those are different on stuff like CHERI where a pointer also has another 64 bits (or more) of permissions information as well as the 64 bits of address.Last discussion on it (that I recall at the moment):
https://rust-lang.zulipchat.com/#narrow/channel/136281-t-opsem/topic/Pre-RFC.20discussion.3A.20usize.20semantics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. But even if it is decided that
usize
should be just enough for just the address (size_t
), then would there be a problem with a slice being defined as something like?
Another consideration is how the value would be passed by value as a function argument over FFI. I believe at least on x86_64, the
repr(C)
struct would probably be passed in two registers, but I'm not sure if that would be the case given a definition entirely based on offsets.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing by value through function arguments/return is specifically excluded by this RFC because that's much more complex and we may want to change it. In practice Rust passes a slice by value as separate pointer and length arguments, rather than as a struct. I've heard that you can't express what it does for return in C code on x86-64 (or x86? icr).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmccombs AFAIK every platform we currently support meets those requirements, so they're there more as a way to make writing out what's guaranteed easy, and because anywhere that doesn't meet those requirements there's a meaningful conversation to be had about what the layout should be.
For example, suppose there was a platform with (size: 16, align:8) pointers and (size: 8, align:8) slice-metadata-type. Should
&[_]
be size 32 or 24? I don't know, but I don't think we need to make that decision now, so I'm happy to leave it out of scope.That's intentional, yes. Does this need to define an ABI for it? I don't know. Maybe we can accomplish most of the goals by requiring that it be a field in a struct that's passed by pointer, or just passing it as
&&[T]
for now, so we don't need to make ABI decisions -- which is especially nice since ABIs are such a mess for by-value passing overall.Basically, I think there's simple and non-controversial set of things that we can approve easily that'll be useful, even if it's not everything that everyone might one day want. Let's land those parts first, then a later RFC that wants to, say, spend the time doing the ABI details survey to figure out what's practical can do that, but we can avoid worrying about it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wished to extend Rust to allow references to unsized types that encompass unsized types... for example,
&[[T]]
, or more interestingly,&[dyn Trait]
... then one of the likely choices we would make is to have "triple-wide" (or larger!) references, instead of this "double-wide" one. Thus we would only apply the corresponding layout this RFC describes (or any other such specified layout) in the case ofwhere T: Sized
.I feel this "multiple metadata" possibility should be considered and either explicitly reserved as a future possibility or explicitly dismissed. I feel accepting this RFC as-is could be interpreted as foreclosing it, but there might forever be grumbling about how "it doesn't say...!" Thus if we don't want to do that, we should be clear, and if we're still open to that possibility, we should also be clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but IMO it seems difficult to imagine how
&[[T]]
would work-- even with triple-wide references, the items in the collection could be heterogeneous.I suppose we could make it work if all of the elements were the same type, e.g. coercing
&[String]
into&[dyn Debug]
.Personally, I'd be happy to restrict this RFC to specify that this only applies to
&[T] where T: Sized
. Would that resolve your concern?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could make the RFC compatible with
&[[U]]
by simply relaxing the bound toT: ?Sized
:so I don't see how this definition of
Slice<T>
forecloses the multi-meta possibility.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could either require homogeneity like you suggest, or you could have the metadata be its own pointer to a separate region of memory, sharing the lifetime of the data pointer. (I wrote a very experimental crate that does something like the latter: https://github.com/Jules-Bertholet/unsized-vec)
But since current Rust doesn’t support this, and nothing in this RFC would interfere with someday adopting either option, it’s not a concern IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another valid extension is to just insert more fields for metadata between the pointer and length or change the length to be a struct-- these match how the pointer metadata APIs work more closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing the metadata (length) from
usize
to a struct would be rust-lang/libs-team#246if we still want to have zero-cost unsizing the type
&[[U]]
must only be able to represent rectangular matrices:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think so, that's just wrapping the metadata type in a struct whenever you're not using it as part of a pointer type. This is explicitly similar to
mem::Discriminant
which wraps an enum's discriminant in a struct, even though the discriminant might be a well-known type, e.g.u8
for arepr(u8)
fieldless enum.for
[T]
whereT: Sized
, the metadata type would still beusize
; the metadata could be a struct or some other type when using customextern
types -- that kind of customextern
type doesn't exist in Rust yet.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both homogeneous
&[dyn Trait]
and rectangular&[[T]]
sound useful, especially because the retangular case benefits from more optimizations.We could consider unsizing options for more general types too like
struct Foo<T>(T,T)
, so then a homogeneousFoo<dyn Trait>
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the RFC does describe that the non-null requirements are upheld, because
*const T where T: Sized
lacks a "niche", the way this is written could be misread to preclude our "niches", which obviously cannot be the case because we promise niche-based transformations in certain cases. There are two major niches we may wish to be able to exploit:data
. This one is a promised transformation.T
is non-zero-sized, the size of the slice reference is bounded byisize::MAX
, which means that we have a very large niche on thelen
as the top bit can never be set.This RFC should probably at least mention the guaranteed niche transformations exist and their implications, just to make clear it is not contradicting or overruling them. A weaker version was described in RFC 3391 but we strengthened that decision to generalize to similarly-shaped enums, not just Result or Option, post-hoc. I documented them in this test, I don't know where to look it up in the reference: https://github.com/rust-lang/rust/blob/ed49386d3aa3a445a9889707fd405df01723eced/tests/ui/rfcs/rfc-3391-result-ffi-guarantees.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another niche that isn't currently exploited but may be in the future is alignment, e.g. the pointer in
&[u32]
is not only non-null but addresses 1, 2, and 3 are also invalid (assumingalign_of::<u32>() == 4
). This is a smaller niche than the length being at mostisize::MAX / size_of::<T>()
, but it also applies for ZSTs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, perhaps more generally we should be clear that "the representation is stable" still allows for transformations to happen "around it", particularly when it comes to ADT tag layout. This means that
unsafe
code that interacts with a blob of bytes that happens to beEnumType<&[T]>
should be cautious. This was always the case but will only "get worse" over time.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I don't think is well-described here, or at least I feel that based on some other comments it might be at risk of being glossed over when some people engage with this RFC, is that calling convention (AKA parameter and return passing) and in-memory layout are both very different things.
Often, "ABI" is used to describe both, because they are both technically part of the literal "application binary interface". Due to arguments and returns sometimes passing through the stack, and almost always when the number of them is large enough, in-memory layout becomes very relevant for almost every calling convention. But they're not the same.
In particular, every calling convention is a beautiful and unique snowflake. Thus, we could define the calling convention for
&[T]
specifically when passed overextern "C"
andextern "C-unwind"
without defining it in other cases, and especially without defining it forextern "Rust"
(i.e. the "default" ABI).BUT I would advise caution in promising compatibility in a very specific way like "just for
extern "C"
" like that, as people will misread, misinterpret, and fail to consider edge cases. We cannot solve such entirely, but we should be very careful about tentative almost-promises. It might only be appropriate to mention as a future possibility.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to amend the RFC to clarify this point! The sentence you highlighted seems like it covers this to me, but perhaps I need to make it bolder/brighter/a headline.
Do you have other suggestions for how I could make this more straightforward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply be more boring and avoid vague terms like ABI, and specify exactly what you specify: "This guarantees the in-memory layout only, it does not concern how it is passed or returned through functions".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, we could opt not to guarantee this since no one seems to have much of a use case for this right now anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep getting tripped up reading this section by thinking it says we're committing to something that diverges from C++. I would phrase it as something like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be worth a perf run to see if there's any measurable impact of doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crABI provides a way to pass slices to/from C without locking in our representation #3470, that is probably worth mentioning somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a related way, maybe we could do something like guarantee the layout only in
repr(C)
types or inextern "C"
function signatures? Which retains the flexibility to change ordering in most cases.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should have warned against this and made everyone use
*const u8
instead. Seems impossible to make a change like this now though.