Guarantee slice representation #3775

Open
205 changes: 205 additions & 0 deletions text/0000-guaranteed-slice-repr.md
@@ -0,0 +1,205 @@
- Feature Name: guaranteed_slice_repr
- Start Date: 2025-02-18
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC guarantees the in-memory representation of slice and str references.
Specifically, `&[T]` and `&mut [T]` are guaranteed to have the same layout as:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,


Why this specific order? Why not length before pointer? Are there any plausible reasons to prefer one over the other, e.g., based on target architecture?

Member

Note that our current layout algorithm is best able to exploit any niche in len if len is first in this ordering. And we currently cannot use a "double niche" on two different fields very effectively. I can work on improving this, but I cannot promise a specific result yet.

Due to various quirks of our existing APIs, the len niche would not apply to

  1. zero-sized types
  2. the "raw" version of this: *mut [T] and *const [T]

Member Author

@workingjubilee Why would len have a niche? I would expect data to be the field with a niche, as it is non-null for &[T].

@hanna-kruppe hanna-kruppe Feb 19, 2025

> The total size len * mem::size_of::<T>() of the slice must be no larger than isize::MAX, and adding that size to data must not “wrap around” the address space. See the safety documentation of pointer::offset.

https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html

Member Author

Oh, clever! Thanks for describing that-- I'll mention it in the RFC.

Member

@scottmcm scottmcm Feb 23, 2025

@cramertj It's really important that the length have a niche in &[T], as has been FCPed by opsem in rust-lang/unsafe-code-guidelines#510, because that allows us to add range metadata to the lengths in every function that takes a &[T], and correspondingly lets LLVM know that, for example, (i + j)/2 on in-bounds indexes can't overflow (well, for non-ZST Ts, which TBH are the only ones that matter for loop-over-slice optimizations).

Sadly *const [T] doesn't have the niche, though, because https://doc.rust-lang.org/std/ptr/fn.slice_from_raw_parts.html was stabilized before this was realized and because casting *const [()] to *const [i32] works and doesn't change the metadata.

Member

Oh, also, we want to ensure to leave space for size-and-alignment based niches in references too.

I guess for a slice there's always the "zero size" case, so that simplifies to just alignment niches, but that still means (-align, align) are all impossible. (Which simplifies to just "it's not null" for something with align-1.)

Contributor

> as has been FCPed by opsem in rust-lang/unsafe-code-guidelines#510

Theoretically, we could have been even stricter, additionally requiring that ptr + length not overflow the address space.

}
```

The layout of `&str` is the same as that of `&[u8]`, and the layout of
`&mut str` is the same as that of `&mut [u8]`.
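
As a minimal sketch of what the guarantee would let code assume, the following compares `&[T]` against a mirror struct and reads the two fields back out. The names `SliceRepr`, `same_size_and_align`, and `decompose` are hypothetical and not part of the RFC. The `transmute_copy` relies on exactly the field placement this RFC proposes to guarantee, which current compilers happen to use but do not yet promise.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical mirror of the struct shown in the summary; not a std type.
#[repr(C)]
struct SliceRepr<T> {
    data: *const T,
    len: usize,
}

/// Returns true if `&[T]` has the same size and alignment as `SliceRepr<T>`.
fn same_size_and_align<T>() -> bool {
    size_of::<&[T]>() == size_of::<SliceRepr<T>>()
        && align_of::<&[T]>() == align_of::<SliceRepr<T>>()
}

/// Reads the pointer and length of a slice reference through the mirror struct.
fn decompose<T>(s: &[T]) -> (*const T, usize) {
    // SAFETY (assumed): `&[T]` and `SliceRepr<T>` place the pointer at offset 0
    // and the length right after it, which is what the RFC would promise.
    let repr: SliceRepr<T> = unsafe { std::mem::transmute_copy(&s) };
    (repr.data, repr.len)
}

fn main() {
    assert!(same_size_and_align::<u8>());
    assert!(same_size_and_align::<u64>());

    let values = [10u32, 20, 30];
    let (data, len) = decompose(&values[..]);
    assert_eq!(data, values.as_ptr());
    assert_eq!(len, values.len());
}
```

Outside of illustrations like this one, `as_ptr()` and `len()` remain the portable way to obtain the two components from Rust code.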

# Motivation
[motivation]: #motivation

This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing
slices and to declare slice fields or locals.

For example, guaranteeing the representation of slice references allows
non-Rust code to read from the `data` or `len` fields of `string` in the type
below without intermediate FFI calls into Rust:

```rust
#[repr(C)]
struct HasString {
    string: &'static str,
}
```

Note: prior to this RFC, the type above is not even properly `repr(C)` since the
size and alignment of slices were not guaranteed. However, the Rust compiler
accepts the `repr(C)` declaration above without warning.
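
To make the motivating use concrete, here is a rough sketch in which a Rust function stands in for the non-Rust reader: it treats a `HasString` as raw memory and loads the pointer at offset 0 and the length at offset `size_of::<usize>()`. The helper name `read_string_parts` is hypothetical, and the offsets assume the layout proposed by this RFC on a target where pointers and `usize` have the same size.

```rust
use std::mem::size_of;

#[repr(C)]
struct HasString {
    string: &'static str,
}

/// Stand-in for foreign code: reads `data` and `len` by offset instead of
/// calling back into Rust. Assumes the pointer-then-length layout.
fn read_string_parts(h: &HasString) -> (*const u8, usize) {
    let base = (h as *const HasString).cast::<u8>();
    // SAFETY (assumed layout): the first word is the data pointer and the
    // second word is the length; both reads stay inside `*h` and are aligned.
    unsafe {
        let data = base.cast::<*const u8>().read();
        let len = base.add(size_of::<usize>()).cast::<usize>().read();
        (data, len)
    }
}

fn main() {
    let h = HasString { string: "hello" };
    let (data, len) = read_string_parts(&h);
    // Rebuild the bytes from the two loaded words to confirm they round-trip.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    assert_eq!(bytes, b"hello");
}
```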

# Guide-level explanation
Member

@scottmcm scottmcm Feb 23, 2025

You appear to have deleted the reference-level explanation section from the template.

I would encourage you to split this back into two parts: the guide level informal explanation that talks about how it allows you to read things from C, but then also have a reference-level explanation of exactly what it's guaranteeing, without ever using the word "layout", because "layout" means too many different things to different people. (Some people think it means just std::alloc::Layout, some include field offsets, some include validity invariants, etc.)

I'm absolutely in favour of doing this RFC, see this old Zulip thread, so long as it's clearly scoped to the parts that really are uncontroversial.

Notably, I don't think that any description that includes *const T is correct in a reference-level section if it applies to &[_] references, because it at least needs to be NonNull<T> -- but it also needs to be aligned, and a bunch of other things.

So I want to see a precise, positively-specified list of exactly what we're RFCing.

A quick stab at it:

  • we're only talking about {&,&mut,*const,*mut} [T] with T: Sized
  • we're only talking about platforms where sizeof(*const impl Thin) == sizeof(usize).
  • the size is 2 * sizeof(usize).
  • we're only talking about platforms where alignof(*const impl Thin) == alignof(usize).
  • the align is alignof(usize).
  • the pointer component is at offset 0, which points to the first element
  • the length component is at offset sizeof(usize), and contains the length in units of elements (not of bytes)

I don't know if we stop there, but if that's all we're committing to I think it's uncontroversial -- and I bet it's what a whole bunch of unsafe code in the ecosystem already assumes anyway, de facto, since it's been true since at least 1.1.0.

I don't know what, if anything, we want to promise or require about the validity invariant, especially since that's already defined to be different between &[i32] and *const [i32]. Perhaps this RFC should just not say anything about it -- reading a rust slice from C doesn't need to know anything about it.


Limiting it to platforms where the size and alignment of usize and a pointer are the same seems unnecessarily restrictive. Is there a difficulty with differing size and alignment I'm not aware of?

Member

> Limiting it to platforms where the size and alignment of usize and a pointer are the same seems unnecessarily restrictive. Is there a difficulty with differing size and alignment I'm not aware of?

Yes, IIRC we still haven't decided if usize is an integer big enough to contain a pointer (C's uintptr_t), or big enough for just the address (effectively C's size_t) -- those are different on stuff like CHERI where a pointer also has another 64 bits (or more) of permissions information as well as the 64 bits of address.

Last discussion on it (that I recall at the moment):
https://rust-lang.zulipchat.com/#narrow/channel/136281-t-opsem/topic/Pre-RFC.20discussion.3A.20usize.20semantics


Sure. But even if it is decided that usize should be just enough for just the address (size_t), then would there be a problem with a slice being defined as something like

#[repr(C)]
struct Slice<T> {
    data: NonNull<T>,
    len: usize,
}

?

Another consideration is how the value would be passed by value as a function argument over FFI. I believe at least on x86_64, the repr(C) struct would probably be passed in two registers, but I'm not sure if that would be the case given a definition entirely based on offsets.

Member

> Another consideration is how the value would be passed by value as a function argument over FFI. I believe at least on x86_64, the repr(C) struct would probably be passed in two registers, but I'm not sure if that would be the case given a definition entirely based on offsets.

passing by value through function arguments/return is specifically excluded by this RFC because that's much more complex and we may want to change it. In practice Rust passes a slice by value as separate pointer and length arguments, rather than as a struct. I've heard that you can't express what it does for return in C code on x86-64 (or x86? icr).

Member

@tmccombs AFAIK every platform we currently support meets those requirements, so they're there more as a way to make writing out what's guaranteed easy, and because anywhere that doesn't meet those requirements there's a meaningful conversation to be had about what the layout should be.

For example, suppose there was a platform with (size: 16, align:8) pointers and (size: 8, align:8) slice-metadata-type. Should &[_] be size 32 or 24? I don't know, but I don't think we need to make that decision now, so I'm happy to leave it out of scope.

> but I'm not sure if that would be the case given a definition entirely based on offsets.

That's intentional, yes. Does this need to define an ABI for it? I don't know. Maybe we can accomplish most of the goals by requiring that it be a field in a struct that's passed by pointer, or just passing it as &&[T] for now, so we don't need to make ABI decisions -- which is especially nice since ABIs are such a mess for by-value passing overall.


Basically, I think there's a simple and non-controversial set of things that we can approve easily that'll be useful, even if it's not everything that everyone might one day want. Let's land those parts first, then a later RFC that wants to, say, spend the time doing the ABI details survey to figure out what's practical can do that, but we can avoid worrying about it for now.

[guide-level-explanation]: #guide-level-explanation

Slice references are represented with a pointer and length pair. Their in-memory
layout is the same as a `#[repr(C)]` struct like the following:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
Comment on lines +47 to +54
Member

@workingjubilee workingjubilee Feb 19, 2025

If we wished to extend Rust to allow references to unsized types that encompass unsized types... for example, &[[T]], or more interestingly, &[dyn Trait]... then one of the likely choices we would make is to have "triple-wide" (or larger!) references, instead of this "double-wide" one. Thus we would only apply the corresponding layout this RFC describes (or any other such specified layout) in the case of where T: Sized.

I feel this "multiple metadata" possibility should be considered and either explicitly reserved as a future possibility or explicitly dismissed. I feel accepting this RFC as-is could be interpreted as foreclosing it, but there might forever be grumbling about how "it doesn't say...!" Thus if we don't want to do that, we should be clear, and if we're still open to that possibility, we should also be clear.

Member Author

I considered this, but IMO it seems difficult to imagine how &[[T]] would work-- even with triple-wide references, the items in the collection could be heterogeneous.

I suppose we could make it work if all of the elements were the same type, e.g. coercing &[String] into &[dyn Debug].

Personally, I'd be happy to restrict this RFC to specify that this only applies to &[T] where T: Sized. Would that resolve your concern?

Member

we could make the RFC compatible with &[[U]] by simply relaxing the bound to T: ?Sized:

&[[U]]
->
{ data: *const [U], len: usize }
->
{ data: { data: *const U, len: usize }, len: usize }

so I don't see how this definition of Slice<T> forecloses the multi-meta possibility.

Contributor

> I considered this, but IMO it seems difficult to imagine how &[[T]] would work-- even with triple-wide references, the items in the collection could be heterogeneous.

You could either require homogeneity like you suggest, or you could have the metadata be its own pointer to a separate region of memory, sharing the lifetime of the data pointer. (I wrote a very experimental crate that does something like the latter: https://github.com/Jules-Bertholet/unsized-vec)

But since current Rust doesn’t support this, and nothing in this RFC would interfere with someday adopting either option, it’s not a concern IMO.

Member

> we could make the RFC compatible with &[[U]] by simply relaxing the bound to T: ?Sized

another valid extension is to just insert more fields for metadata between the pointer and length or change the length to be a struct-- these match how the pointer metadata APIs work more closely.

Member

> another valid extension is to just insert more fields for metadata between the pointer and length or change the length to be a struct-- these match how the pointer metadata APIs work more closely.

changing the metadata (length) from usize to a struct would be rust-lang/libs-team#246


> You could either require homogeneity like you suggest, or you could have the metadata be its own pointer to a separate region of memory, sharing the lifetime of the data pointer.

if we still want to have zero-cost unsizing the type &[[U]] must only be able to represent rectangular matrices:

// unsizing from &[i32; 5] to &[i32] is free
let y: &[i32] = &[1, 2, 3, 4, 5];

// unsizing a `&[[f64; 3]; 2]` to `&[[f64]]` should also not involve any allocation.
let x: &[[f64]] = &[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]];

Member

@programmerjake programmerjake Feb 20, 2025

> changing the metadata (length) from usize to a struct would be rust-lang/libs-team#246

I don't really think so, that's just wrapping the metadata type in a struct whenever you're not using it as part of a pointer type. This is explicitly similar to mem::Discriminant which wraps an enum's discriminant in a struct, even though the discriminant might be a well-known type, e.g. u8 for a repr(u8) fieldless enum.

for [T] where T: Sized, the metadata type would still be usize; the metadata could be a struct or some other type when using custom extern types -- that kind of custom extern type doesn't exist in Rust yet.


I think both homogeneous &[dyn Trait] and rectangular &[[T]] sound useful, especially because the rectangular case benefits from more optimizations.

We could consider unsizing options for more general types too like struct Foo<T>(T,T), so then a homogeneous Foo<dyn Trait>.

}
```
Comment on lines +50 to +56
Member

@workingjubilee workingjubilee Feb 19, 2025

While the RFC does describe that the non-null requirements are upheld, because *const T where T: Sized lacks a "niche", the way this is written could be misread to preclude our "niches", which obviously cannot be the case because we promise niche-based transformations in certain cases. There are two major niches we may wish to be able to exploit:

  1. The well-known "null pointer" niche for data. This one is a promised transformation.
  2. A less well-known case is that if T is non-zero-sized, the size of the slice reference is bounded by isize::MAX, which means that we have a very large niche on the len as the top bit can never be set.

This RFC should probably at least mention the guaranteed niche transformations exist and their implications, just to make clear it is not contradicting or overruling them. A weaker version was described in RFC 3391 but we strengthened that decision to generalize to similarly-shaped enums, not just Result or Option, post-hoc. I documented them in this test, I don't know where to look it up in the reference: https://github.com/rust-lang/rust/blob/ed49386d3aa3a445a9889707fd405df01723eced/tests/ui/rfcs/rfc-3391-result-ffi-guarantees.rs


Another niche that isn't currently exploited but may be in the future is alignment, e.g. the pointer in &[u32] is not only non-null but addresses 1, 2, and 3 are also invalid (assuming align_of::<u32>() == 4). This is a smaller niche than the length being at most isize::MAX / size_of::<T>(), but it also applies for ZSTs.

Member

@workingjubilee workingjubilee Feb 19, 2025

Yes, perhaps more generally we should be clear that "the representation is stable" still allows for transformations to happen "around it", particularly when it comes to ADT tag layout. This means that unsafe code that interacts with a blob of bytes that happens to be EnumType<&[T]> should be cautious. This was always the case but will only "get worse" over time.


The precise ABI of slice references is not guaranteed, so `&[T]` may not be
passed by-value or returned by-value from an `extern "C" fn`.
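
Since the calling convention is left unspecified, one common pattern (sketched here, not prescribed by the RFC) is to pass the two components as separate scalar arguments and rebuild the slice on the Rust side. The function name `sum_u32` is hypothetical.

```rust
/// Sums `len` values starting at `data`. The slice crosses the C calling
/// convention as two scalar arguments rather than as a `&[u32]`.
///
/// # Safety
/// `data` and `len` must satisfy the requirements of
/// `std::slice::from_raw_parts`.
pub unsafe extern "C" fn sum_u32(data: *const u32, len: usize) -> u64 {
    // SAFETY: forwarded from this function's own contract.
    let values = unsafe { std::slice::from_raw_parts(data, len) };
    values.iter().map(|&v| u64::from(v)).sum()
}

fn main() {
    let values = [1u32, 2, 3, 4];
    let total = unsafe { sum_u32(values.as_ptr(), values.len()) };
    assert_eq!(total, 10);
}
```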
Comment on lines +58 to +59
Member

@workingjubilee workingjubilee Feb 19, 2025

One thing I don't think is well-described here, or at least I feel that based on some other comments it might be at risk of being glossed over when some people engage with this RFC, is that calling convention (AKA parameter and return passing) and in-memory layout are both very different things.

Often, "ABI" is used to describe both, because they are both technically part of the literal "application binary interface". Due to arguments and returns sometimes passing through the stack, and almost always when the number of them is large enough, in-memory layout becomes very relevant for almost every calling convention. But they're not the same.

In particular, every calling convention is a beautiful and unique snowflake. Thus, we could define the calling convention for &[T] specifically when passed over extern "C" and extern "C-unwind" without defining it in other cases, and especially without defining it for extern "Rust" (i.e. the "default" ABI).

BUT I would advise caution in promising compatibility in a very specific way like "just for extern "C"" like that, as people will misread, misinterpret, and fail to consider edge cases. We cannot solve such entirely, but we should be very careful about tentative almost-promises. It might only be appropriate to mention as a future possibility.

Member Author

I'm happy to amend the RFC to clarify this point! The sentence you highlighted seems like it covers this to me, but perhaps I need to make it bolder/brighter/a headline.

Do you have other suggestions for how I could make this more straightforward?

Member

@workingjubilee workingjubilee Mar 7, 2025

Simply be more boring and avoid vague terms like ABI, and specify exactly what you specify: "This guarantees the in-memory layout only, it does not concern how it is passed or returned through functions".


The validity requirements for the in-memory representation of slice references
are the same as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html) for shared slice references, and
[those documented on `std::slice::from_raw_parts_mut`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts_mut.html)
for mutable slice references.

Namely:

* `data` must be non-null, valid for reads (for shared references) or for both
  reads and writes (for mutable references) for `len * mem::size_of::<T>()` many
  bytes, and it must be properly aligned. This means in particular:

    * The entire memory range of this slice must be contained within a single allocated object!
      Slices can never span across multiple allocated objects.
    * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One
      reason for this is that enum layout optimizations may rely on references
      (including slices of any length) being aligned and non-null to distinguish
      them from other data. You can obtain a pointer that is usable as `data`
      for zero-length slices using [`NonNull::dangling()`].

* `data` must point to `len` consecutive properly initialized values of type `T`.

* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`,
  and adding that size to `data` must not "wrap around" the address space.
  See the safety documentation of [`pointer::offset`].
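
The subset of these requirements that can be checked at run time might be sketched as follows. The helpers `parts_look_valid` and `empty_slice` are hypothetical. Whether the memory is really allocated, initialized, and inside a single object cannot be checked this way, so passing the check is necessary but not sufficient.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// Checks the conditions on `data` and `len` that are observable at run time.
fn parts_look_valid<T>(data: *const T, len: usize) -> bool {
    let non_null = !data.is_null();
    let aligned = (data as usize) % align_of::<T>() == 0;
    // The total size in bytes must not exceed `isize::MAX`.
    let size_ok = match len.checked_mul(size_of::<T>()) {
        Some(bytes) => bytes <= isize::MAX as usize,
        None => false,
    };
    // Adding the size to the address must not wrap around the address space.
    let no_wrap = size_ok && (data as usize).checked_add(len * size_of::<T>()).is_some();
    non_null && aligned && size_ok && no_wrap
}

/// Builds a zero-length slice from a dangling but aligned, non-null pointer.
fn empty_slice<'a, T>() -> &'a [T] {
    // SAFETY: `dangling()` is non-null and aligned, and zero elements are read.
    unsafe { std::slice::from_raw_parts(NonNull::<T>::dangling().as_ptr(), 0) }
}

fn main() {
    let values = [1u64, 2, 3];
    assert!(parts_look_valid(values.as_ptr(), values.len()));
    assert!(!parts_look_valid(std::ptr::null::<u64>(), 0));
    assert!(!parts_look_valid(values.as_ptr(), usize::MAX));

    let empty: &[u64] = empty_slice();
    assert!(empty.is_empty());
}
```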

## `str`

The layout of `&str` is the same as that of `&[u8]`, and the layout of
`&mut str` is the same as that of `&mut [u8]`. More generally, `str` behaves like
`#[repr(transparent)] struct str([u8]);`. Safe Rust functions may assume that
`str` holds valid UTF-8, but [it is not immediate undefined behavior to store
non-UTF-8 data in `str`](https://doc.rust-lang.org/std/primitive.str.html#invariant).
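
As an illustration, a `&str` and its byte view share the same pointer and length, and bytes arriving from foreign code can be validated instead of trusted. The helper name `str_from_parts` is hypothetical.

```rust
use std::str::Utf8Error;

/// Rebuilds a `&str` from a pointer and length supplied by foreign code,
/// checking UTF-8 validity rather than assuming it.
///
/// # Safety
/// `data` and `len` must satisfy the requirements of
/// `std::slice::from_raw_parts`.
unsafe fn str_from_parts<'a>(data: *const u8, len: usize) -> Result<&'a str, Utf8Error> {
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    std::str::from_utf8(bytes)
}

fn main() {
    let s = "héllo";
    // The string and its byte slice expose the same two components.
    assert_eq!(s.as_ptr(), s.as_bytes().as_ptr());
    assert_eq!(s.len(), s.as_bytes().len());
    assert_eq!(std::mem::size_of::<&str>(), std::mem::size_of::<&[u8]>());

    let ok = unsafe { str_from_parts(s.as_ptr(), s.len()) };
    assert_eq!(ok, Ok("héllo"));

    let bad = [0xffu8, 0xfe];
    assert!(unsafe { str_from_parts(bad.as_ptr(), bad.len()) }.is_err());
}
```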

## Pointers

Raw pointers to slices such as `*const [T]` or `*mut str` use the same layout
as slice references, but do not necessarily point to anything.
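
A short sketch of the difference: a raw slice pointer may carry a null data pointer together with a non-zero length, as long as it is never dereferenced. The helper `null_raw_slice` is hypothetical, and the example assumes a toolchain where `len()` on raw slice pointers is stable (Rust 1.79 or later).

```rust
use std::mem::size_of;
use std::ptr;

/// Builds a raw slice pointer with a null data pointer and the given length.
/// This is fine for raw pointers but would be invalid for `&[u16]`.
fn null_raw_slice(len: usize) -> *const [u16] {
    ptr::slice_from_raw_parts(ptr::null::<u16>(), len)
}

fn main() {
    // Raw slice pointers occupy the same space as slice references.
    assert_eq!(size_of::<*const [u16]>(), size_of::<&[u16]>());
    assert_eq!(size_of::<*mut str>(), size_of::<&mut str>());

    let raw = null_raw_slice(3);
    assert!(raw.is_null());
    assert_eq!(raw.len(), 3);
    // Casting to a thin pointer keeps only the data component.
    assert!((raw as *const u16).is_null());
}
```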

# Drawbacks
[drawbacks]: #drawbacks

## Zero-sized types

One could imagine representing `&[T]` as only `len` for zero-sized `T`.
This proposal would preclude that choice in favor of a standard representation
for slices regardless of the underlying type.

Alternatively, we could choose to guarantee that the data pointer is present if
and only if `size_of::<T>() != 0`. This has the possibility of breaking existing
code which smuggles pointers through the `data` value in `from_raw_parts` /
`into_raw_parts`.

## Uninhabited types

Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]`
types into a ZST since the slice can only ever be length zero.

If we want to maintain the pointer field, we could also make `&[!]` *just* a
pointer since we know the length can only be zero.

Either option may offer modest performance benefits for highly generic code
which happens to create empty slices of uninhabited types, but this is unlikely
to be worth the cost of maintaining a special case.
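
For reference, current compilers already use the two-word representation for slices of zero-sized and uninhabited element types, which can be observed as below. The helper `slice_ref_words` is hypothetical, and `std::convert::Infallible` is used as a stable stand-in for `!`.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Size of `&[T]` measured in machine words.
fn slice_ref_words<T>() -> usize {
    size_of::<&[T]>() / size_of::<usize>()
}

fn main() {
    assert_eq!(slice_ref_words::<u8>(), 2);
    // Zero-sized element type: pointer and length are both still stored.
    assert_eq!(slice_ref_words::<()>(), 2);
    // Uninhabited element type: the slice can only be empty, yet it is two words.
    assert_eq!(slice_ref_words::<Infallible>(), 2);
}
```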
Comment on lines +121 to +123
Member

On the other hand, we could opt not to guarantee this since no one seems to have much of a use case for this right now anyway.


## Compatibility with C++ `std::span`

The largest drawback of this layout and set of validity requirements is that it
may preclude `&[T]` from being representationally equivalent to C++'s
Member

I keep getting tripped up reading this section by thinking it says we're committing to something that diverges from C++. I would phrase it as something like

Because C++ does not guarantee a layout for std::span, stabilizing one for Rust now means C++ could choose to stabilize a different layout and prevent the two from ever being compatible. However, this is unlikely to ever happen because MSVC uses the same representation proposed in this RFC and has stringent stability requirements.

`std::span<T, std::dynamic_extent>`.

* `std::span` does not currently guarantee its layout. In practice, pointer + length
  is the common representation. This is even observable using `is_layout_compatible`
  [on MSVC](https://godbolt.org/z/Y8ardrshY), though not
  [on GCC](https://godbolt.org/z/s4v4xehnG) nor
  [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a
  different layout in the C++ standard (unlikely due to MSVC ABI stability
  requirements) could preclude matching the layout with `&[T]`.

* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One
  possible workaround for this would be to guarantee that `Option<&[T]>` uses
  `data: std::ptr::null(), len: 0` to represent the `None` case, making
  `std::span<T>` equivalent to `Option<&[T]>` for non-zero-sized types.
Member

Maybe it would be worth a perf run to see if there's any measurable impact of doing this.


  Note that this is not currently the case. The compiler currently represents
  `None::<&[u8]>` as `data: std::ptr::null(), len: uninit` (though this is
  not guaranteed).

* Rust uses a dangling pointer in the representation of zero-length slices.
It's unclear whether C++ guarantees that a dangling pointer will remain
unchanged when passed through `std::span`. However, it does support
dangling pointers during regular construction via the use of
[`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span)
in the iterator constructors.

Note that C++ also does not support zero-sized types, so there is no naive way
to represent types like `std::span<SomeZeroSizedRustType>`.
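
Until such a guarantee exists, the null-pointer mapping can be done explicitly at the boundary. The sketch below uses hypothetical helpers, `to_span_parts` and `from_span_parts`, that translate between `Option<&[T]>` and a span-like pointer and length pair, mapping `None` to a null pointer with length zero.

```rust
/// Converts an optional slice into span-style parts; `None` becomes (null, 0).
fn to_span_parts<T>(opt: Option<&[T]>) -> (*const T, usize) {
    match opt {
        Some(s) => (s.as_ptr(), s.len()),
        None => (std::ptr::null(), 0),
    }
}

/// Converts span-style parts back; a null pointer becomes `None`.
///
/// # Safety
/// If `data` is non-null, `data` and `len` must satisfy the requirements of
/// `std::slice::from_raw_parts`.
unsafe fn from_span_parts<'a, T>(data: *const T, len: usize) -> Option<&'a [T]> {
    if data.is_null() {
        None
    } else {
        Some(unsafe { std::slice::from_raw_parts(data, len) })
    }
}

fn main() {
    // On current compilers the `None` case costs no extra space.
    assert_eq!(
        std::mem::size_of::<Option<&[u8]>>(),
        std::mem::size_of::<&[u8]>()
    );

    let values = [7u8, 8, 9];
    let (data, len) = to_span_parts(Some(&values[..]));
    assert_eq!(unsafe { from_span_parts(data, len) }, Some(&values[..]));

    let (data, len) = to_span_parts::<u8>(None);
    assert!(data.is_null());
    assert_eq!(len, 0);
    assert_eq!(unsafe { from_span_parts(data, len) }, None);
}
```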

## Flexibility

Additionally, guaranteeing layout of Rust-native types limits the compiler's and
standard library's ability to change and take advantage of new optimization
opportunities.

# Rationale and alternatives
Contributor

crABI provides a way to pass slices to/from C without locking in our representation #3470, that is probably worth mentioning somewhere.

Contributor

In a related way, maybe we could do something like guarantee the layout only in repr(C) types or in extern "C" function signatures? Which retains the flexibility to change ordering in most cases.

[rationale-and-alternatives]: #rationale-and-alternatives

* We could avoid committing to a particular representation for slices.

* We could try to guarantee layout compatibility with a particular target's
`std::span` representation, though without standardization this may be
impossible. Multiple different C++ stdlib implementations may be used on
the same platform and could potentially have different span representations.
In practice, current span representations also use ptr+len pairs.

* We could avoid storing a data pointer for zero-sized types. This would result
  in a more compact representation but would mean that the representation of
  `&[T]` is dependent on the type of `T`. Additionally, this would break
  existing code which depends on storing data in the pointer of ZST slices.

  This would break popular crates such as [bitvec](https://docs.rs/crate/bitvec/1.0.1/source/doc/ptr/BitSpan.md)
  (55 million downloads) and would result in strange behavior such as
  `std::ptr::slice_from_raw_parts(ptr, len).as_ptr()` returning a different
  pointer from the one that was passed in.

  Types like `*const ()` / `&()` are widely used to pass around pointers today.
  We cannot make them zero-sized, and it would be surprising to make a
  different choice for `&[()]`.
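
The kind of code that depends on the data pointer of a ZST slice can be sketched as follows. This is only an illustration of the pattern, not bitvec's actual implementation, and the helper name `smuggle` is hypothetical.

```rust
/// Stores an arbitrary non-null address and an arbitrary count in a `&[()]`.
fn smuggle<'a>(ptr: *const u8, len: usize) -> &'a [()] {
    assert!(!ptr.is_null());
    // SAFETY: `()` is zero-sized with alignment 1, so any non-null pointer is
    // aligned and the slice covers zero bytes regardless of `len`.
    unsafe { std::slice::from_raw_parts(ptr.cast::<()>(), len) }
}

fn main() {
    let byte = 42u8;
    let address: *const u8 = &byte;

    let s = smuggle(address, 12345);
    // Both components come back unchanged, which dropping the data pointer
    // for zero-sized element types would break.
    assert_eq!(s.as_ptr().cast::<u8>(), address);
    assert_eq!(s.len(), 12345);
}
```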
Comment on lines +185 to +187
Member

Maybe we should have warned against this and made everyone use *const u8 instead. Seems impossible to make a change like this now though.



# Prior art
[prior-art]: #prior-art

The layout in this RFC is already documented in
[the Unsafe Code Guidelines Reference.](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html)

# Future possibilities
[future-possibilities]: #future-possibilities

* Consider defining a separate Rust type which is repr-equivalent to the platform's
native `std::span<T, std::dynamic_extent>` to allow for easier
interoperability with C++ APIs. Unfortunately, the C++ standard does not
guarantee the layout of `std::span` (though the representation may be known
and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC).
Zero-sized types would also not be supported with a naive implementation of
such a type.