Commit 09a64ea

Guarantee slice representation
1 parent b0e56db commit 09a64ea

1 file changed

+196
-0
lines changed

text/0000-guaranteed-slice-repr.md

@@ -0,0 +1,196 @@
- Feature Name: guaranteed_slice_repr
- Start Date: 2025-02-18
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC guarantees the in-memory representation of slice and str references.
Specifically, `&[T]` and `&mut [T]` are guaranteed to have the same layout as:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}
```

The layout of `&str` is the same as that of `&[u8]`, and the layout of
`&mut str` is the same as that of `&mut [u8]`.

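As a rough illustration of the guarantee, the sketch below compares the size and alignment of `&[T]` with a mirror of the `Slice<T>` struct above. The helper names `to_parts` and `same_size_and_align` are hypothetical and exist only for this example; the checks are expected to hold on current compilers, but that is an observation rather than something the language promises before this RFC.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical mirror of the layout described in the summary; not a std type.
#[repr(C)]
pub struct Slice<T> {
    pub data: *const T,
    pub len: usize,
}

/// Splits a slice reference into its two components using only safe, stable APIs.
pub fn to_parts<T>(s: &[T]) -> Slice<T> {
    Slice {
        data: s.as_ptr(),
        len: s.len(),
    }
}

/// Returns true when `&[T]` and `Slice<T>` agree on size and alignment.
pub fn same_size_and_align<T>() -> bool {
    size_of::<&[T]>() == size_of::<Slice<T>>()
        && align_of::<&[T]>() == align_of::<Slice<T>>()
}
```

Size and alignment alone do not prove that the field order matches; that part is what the RFC text itself would guarantee.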
# Motivation
[motivation]: #motivation

This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing
slices and to declare slice fields or locals.

For example, guaranteeing the representation of slice references allows
non-Rust code to read from the `data` or `len` fields of `string` in the type
below without intermediate FFI calls into Rust:

```rust
#[repr(C)]
struct HasString {
    string: &'static str,
}
```

Note: prior to this RFC, the type above is not even properly `repr(C)`, since the
size and alignment of slices are not guaranteed. However, the Rust compiler
accepts the `repr(C)` declaration above without warning.

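To make the motivation concrete, here is a minimal sketch that simulates, from within Rust, how foreign code might read the fields of `HasString`. The types `RawStr` and `HasStringRaw` and the function `read_like_c` are hypothetical names introduced for this example, and the pointer cast is only sound under the layout this RFC proposes to guarantee.

```rust
/// Hypothetical foreign-side view of a `&str` field: pointer plus length.
#[repr(C)]
pub struct RawStr {
    pub data: *const u8,
    pub len: usize,
}

#[repr(C)]
pub struct HasString {
    pub string: &'static str,
}

/// Hypothetical declaration of `HasString` as C or C++ code would see it.
#[repr(C)]
pub struct HasStringRaw {
    pub string: RawStr,
}

/// Reads `data` and `len` directly from memory, without calling any `str` method.
/// ASSUMPTION: relies on `&str` being laid out as `{ data, len }`.
pub fn read_like_c(value: &HasString) -> (*const u8, usize) {
    let raw = value as *const HasString as *const HasStringRaw;
    // SAFETY: sound only if `HasString` and `HasStringRaw` share a layout,
    // which is the guarantee proposed by this RFC.
    let view = unsafe { &*raw };
    (view.string.data, view.string.len)
}
```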
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Slice references are represented with a pointer and length pair. Their in-memory
layout is the same as a `#[repr(C)]` struct like the following:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}
```

The precise ABI of slice references is not guaranteed, so `&[T]` should not be
passed by value to, or returned by value from, an `extern "C" fn`.

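Because only the in-memory layout is guaranteed and not the calling convention, one common pattern is to pass the pointer and length as separate scalar arguments. The sketch below shows that pattern; `sum_bytes` and `call_sum` are hypothetical names used only for illustration.

```rust
use std::slice;

/// Hypothetical FFI entry point: the slice crosses the boundary as two scalars
/// instead of as a `&[u8]` passed by value.
///
/// # Safety
/// `data` and `len` must satisfy the requirements of `slice::from_raw_parts`.
pub unsafe extern "C" fn sum_bytes(data: *const u8, len: usize) -> u64 {
    // SAFETY: the caller upholds the `from_raw_parts` contract.
    let bytes: &[u8] = unsafe { slice::from_raw_parts(data, len) };
    bytes.iter().map(|&b| u64::from(b)).sum()
}

/// Safe wrapper that decomposes a slice before calling across the boundary.
pub fn call_sum(bytes: &[u8]) -> u64 {
    // SAFETY: the pointer and length come from a live slice reference.
    unsafe { sum_bytes(bytes.as_ptr(), bytes.len()) }
}
```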
The validity requirements for the in-memory slice representation are the same
as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html).
Namely:

* `data` must be non-null, valid for reads for `len * mem::size_of::<T>()` many bytes,
  and it must be properly aligned. This means in particular:

    * The entire memory range of this slice must be contained within a single allocated object!
      Slices can never span across multiple allocated objects.
    * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One
      reason for this is that enum layout optimizations may rely on references
      (including slices of any length) being aligned and non-null to distinguish
      them from other data. You can obtain a pointer that is usable as `data`
      for zero-length slices using [`NonNull::dangling()`](https://doc.rust-lang.org/std/ptr/struct.NonNull.html#method.dangling).

* `data` must point to `len` consecutive properly initialized values of type `T`.

* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`,
  and adding that size to `data` must not "wrap around" the address space.
  See the safety documentation of [`pointer::offset`](https://doc.rust-lang.org/std/primitive.pointer.html#method.offset).

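Two of the requirements above lend themselves to a short sketch: building an empty slice from a dangling but non-null, aligned pointer, and checking the `isize::MAX` size bound before constructing a slice. The function names `empty_slice` and `len_fits` are hypothetical; `len_fits` checks only the size bound, not pointer validity.

```rust
use std::mem::size_of;
use std::ptr::NonNull;
use std::slice;

/// Builds an empty `&[T]` whose `data` field is a dangling, well-aligned pointer.
pub fn empty_slice<T: 'static>() -> &'static [T] {
    let data: *const T = NonNull::<T>::dangling().as_ptr();
    // SAFETY: `len` is 0 and `data` is non-null and aligned for `T`.
    unsafe { slice::from_raw_parts(data, 0) }
}

/// Returns true if `len` elements of `T` occupy at most `isize::MAX` bytes.
pub fn len_fits<T>(len: usize) -> bool {
    match len.checked_mul(size_of::<T>()) {
        Some(bytes) => bytes <= isize::MAX as usize,
        None => false, // the multiplication itself overflowed
    }
}
```

Note that for zero-sized `T` every length passes the size check, since the total size is always zero bytes.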
## `str`

The layout of `&str` is the same as that of `&[u8]`, and the layout of
`&mut str` is the same as that of `&mut [u8]`. More generally, `str` behaves like
`#[repr(transparent)] struct str([u8]);`. Safe Rust functions may assume that
`str` holds valid UTF-8, but [it is not immediate undefined behavior to store
non-UTF-8 data in `str`](https://doc.rust-lang.org/std/primitive.str.html#invariant).

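Since safe code may assume that a `str` holds valid UTF-8, bytes arriving from foreign code are usually validated before being exposed as `&str`. The following sketch shows one way to do that; `str_from_parts` and `str_and_bytes_same_size` are hypothetical helper names.

```rust
use std::mem::size_of;
use std::slice;
use std::str;

/// Rebuilds a `&str` from a pointer and length, returning `None` on invalid UTF-8.
///
/// # Safety
/// `data` and `len` must satisfy the `slice::from_raw_parts` requirements for
/// `u8`, and the memory must stay valid for the lifetime `'a`.
pub unsafe fn str_from_parts<'a>(data: *const u8, len: usize) -> Option<&'a str> {
    // SAFETY: the caller upholds the `from_raw_parts` contract.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    str::from_utf8(bytes).ok()
}

/// Returns true when `&str` and `&[u8]` have the same size.
pub fn str_and_bytes_same_size() -> bool {
    size_of::<&str>() == size_of::<&[u8]>()
}
```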
# Drawbacks
[drawbacks]: #drawbacks

## Zero-sized types

One could imagine representing `&[T]` as only `len` for zero-sized `T`.
This proposal would preclude that choice in favor of a standard representation
for slices regardless of the underlying type.

Alternatively, we could choose to guarantee that the data pointer is present if
and only if `size_of::<T>() != 0`. This could break existing
code that smuggles pointers through the `data` value in `from_raw_parts` /
`into_raw_parts`.

## Uninhabited types

Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]`
types into a ZST since the slice can only ever be length zero.

If we want to maintain the pointer field, we could also make `&[!]` *just* a
pointer since we know the length can only be zero.

Either option may offer modest performance benefits for highly generic code
which happens to create empty slices of uninhabited types, but this is unlikely
to be worth the cost of maintaining a special case.

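For reference, the sketch below observes how a slice of an uninhabited type behaves today: it can only be empty, yet its reference still occupies two pointer-sized words. `Void` is a hypothetical stand-in for `!`, and the size check reflects current compiler behavior on common targets rather than an existing guarantee.

```rust
use std::mem::size_of;

/// Hypothetical uninhabited type standing in for `!`.
pub enum Void {}

/// The only possible value of `&[Void]` is an empty slice.
pub fn empty_void_slice() -> &'static [Void] {
    &[]
}

/// Returns true when `&[Void]` is still stored as a pointer plus a length.
pub fn void_slice_ref_is_two_words() -> bool {
    size_of::<&[Void]>() == 2 * size_of::<usize>()
}
```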
## Compatibility with C++ `std::span`

The largest drawback of this layout and set of validity requirements is that it
may preclude `&[T]` from being representationally equivalent to C++'s
`std::span<T, std::dynamic_extent>`.

* `std::span` does not currently guarantee its layout. In practice, pointer + length
  is the common representation. This is even observable using `is_layout_compatible`
  [on MSVC](https://godbolt.org/z/Y8ardrshY), though not
  [on GCC](https://godbolt.org/z/s4v4xehnG) nor
  [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a
  different layout in the C++ standard (unlikely due to MSVC ABI stability
  requirements) could preclude matching the layout with `&[T]`.

* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One
  possible workaround for this would be to guarantee that `Option<&[T]>` uses
  `data: std::ptr::null(), len: 0` to represent the `None` case, making
  `std::span<T>` equivalent to `Option<&[T]>` for non-zero-sized types.

  Note that this is not currently the case. The compiler currently represents
  `None::<&[u8]>` as `data: std::ptr::null(), len: uninit` (though this is
  not guaranteed).

* Rust uses a dangling pointer in the representation of zero-length slices.
  It's unclear whether C++ guarantees that a dangling pointer will remain
  unchanged when passed through `std::span`. However, it does support
  dangling pointers during regular construction via the use of
  [`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span)
  in the iterator constructors.

Note that C++ also does not support zero-sized types, so there is no naive way
to represent types like `std::span<SomeZeroSizedRustType>`.

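Given the `nullptr` mismatch described above, code receiving a span-like pair from C++ would typically check for null explicitly instead of reinterpreting the memory as `Option<&[T]>`. The sketch below illustrates this; `RawSpan`, `span_to_slice`, and `option_slice_is_niche_packed` are hypothetical names, and the size check describes current compiler behavior only.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical mirror of a pointer + length span received from foreign code.
/// Unlike `&[T]`, its `data` field is allowed to be null.
#[repr(C)]
pub struct RawSpan<T> {
    pub data: *const T,
    pub len: usize,
}

/// Converts a possibly-null span into `Option<&[T]>` with an explicit null check.
///
/// # Safety
/// If `span.data` is non-null, `data` and `len` must satisfy the requirements
/// of `slice::from_raw_parts` for the lifetime `'a`.
pub unsafe fn span_to_slice<'a, T>(span: &RawSpan<T>) -> Option<&'a [T]> {
    if span.data.is_null() {
        None
    } else {
        // SAFETY: non-null case; the caller upholds the remaining requirements.
        Some(unsafe { slice::from_raw_parts(span.data, span.len) })
    }
}

/// Returns true when `Option<&[u8]>` needs no extra space for its discriminant.
pub fn option_slice_is_niche_packed() -> bool {
    size_of::<Option<&[u8]>>() == size_of::<&[u8]>()
}
```

The explicit check avoids depending on what `len` holds in the `None` case, which the text above notes is not currently specified.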
## Flexibility

Additionally, guaranteeing the layout of Rust-native types limits the compiler's and
standard library's ability to change and take advantage of new optimization
opportunities.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

* We could avoid committing to a particular representation for slices.

* We could try to guarantee layout compatibility with a particular target's
  `std::span` representation, though without standardization this may be
  impossible. Multiple different C++ stdlib implementations may be used on
  the same platform and could potentially have different span representations.
  In practice, current span representations also use ptr+len pairs.

* We could avoid storing a data pointer for zero-sized types. This would result
  in a more compact representation but would mean that the representation of
  `&[T]` is dependent on the type of `T`. Additionally, this would break
  existing code which depends on storing data in the pointer of ZST slices.

  This would break popular crates such as [bitvec](https://docs.rs/crate/bitvec/1.0.1/source/doc/ptr/BitSpan.md)
  (55 million downloads) and would result in strange behavior such as
  `std::ptr::slice_from_raw_parts(ptr, len).as_ptr()` returning a different
  pointer from the one that was passed in.

  Types like `*const ()` / `&()` are widely used to pass around pointers today.
  We cannot make them zero-sized, and it would be surprising to make a
  different choice for `&[()]`.

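The pointer-preservation behavior that the last alternative would break can be sketched as follows. The function `zst_slice_roundtrip` is a hypothetical helper: it builds a `&[()]` from an arbitrary non-null address and reads the pointer and length back out.

```rust
use std::ptr;

/// Round-trips an address and a length through a slice of the zero-sized type `()`.
pub fn zst_slice_roundtrip(addr_source: &u64, len: usize) -> (*const (), usize) {
    let ptr_in = addr_source as *const u64 as *const ();
    let raw: *const [()] = ptr::slice_from_raw_parts(ptr_in, len);
    // SAFETY: `()` is zero-sized, so the slice covers zero bytes; the pointer
    // is non-null and trivially aligned because `()` has alignment 1.
    let slice: &[()] = unsafe { &*raw };
    (slice.as_ptr(), slice.len())
}
```

With today's representation both values come back unchanged, which is the property that crates storing data in the pointer of a ZST slice depend on.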
# Prior art
[prior-art]: #prior-art

The layout in this RFC is already documented in
[the Unsafe Code Guidelines Reference](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html).

# Future possibilities
[future-possibilities]: #future-possibilities

* Consider defining a separate Rust type which is repr-equivalent to the platform's
  native `std::span<T, std::dynamic_extent>` to allow for easier
  interoperability with C++ APIs. Unfortunately, the C++ standard does not
  guarantee the layout of `std::span` (though the representation may be known
  and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC).
  Zero-sized types would also not be supported with a naive implementation of
  such a type.
