Commit 73ad005

Guarantee slice representation
1 parent b0e56db commit 73ad005

1 file changed

+174
-0
lines changed

text/0000-guaranteed-slice-repr.md

+174
@@ -0,0 +1,174 @@
- Feature Name: guaranteed_slice_repr
- Start Date: 2025-02-18
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC guarantees the in-memory representation of slice and str references.
Specifically, `&[T]` is guaranteed to have the same layout as:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T, // pointer to the first element
    len: usize,     // number of elements, not bytes
}
```

The layout of `&str` is the same as that of `&[u8]`, and the layout of
`&mut str` is the same as that of `&mut [u8]`.
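
As a rough illustration of what this guarantee would permit, the sketch below reads the two fields of a slice reference through a mirror struct. The `slice_parts` helper is hypothetical and not part of any existing API, and the unsafe read is only justified under the layout proposed in this RFC.

```rust
use std::{mem, ptr};

/// Mirror of the layout this RFC proposes for `&[T]`.
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}

/// Hypothetical helper: reinterprets the bytes of a slice reference as the
/// mirror struct above.
fn slice_parts<T>(s: &[T]) -> Slice<T> {
    // Sanity checks that hold on current compilers but are only guaranteed
    // once this RFC is accepted.
    assert_eq!(mem::size_of::<&[T]>(), mem::size_of::<Slice<T>>());
    assert_eq!(mem::align_of::<&[T]>(), mem::align_of::<Slice<T>>());
    // SAFETY: relies on the layout guarantee proposed in this RFC; without
    // it, the field order of a slice reference is unspecified.
    unsafe { ptr::read(&s as *const &[T] as *const Slice<T>) }
}

fn main() {
    let values = [10u32, 20, 30];
    let parts = slice_parts(&values[..]);
    assert_eq!(parts.data, values.as_ptr());
    assert_eq!(parts.len, 3);

    // `&str` shares the layout of `&[u8]`, so `len` counts bytes.
    let text = "héllo";
    let bytes = slice_parts(text.as_bytes());
    assert_eq!(bytes.len, text.len());
}
```

Rust code should keep using `as_ptr()` and `len()`; the mirror struct only stands in for what a C or C++ declaration of the same two fields would observe.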

# Motivation
[motivation]: #motivation

This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing
slices and to declare slice fields or locals.

For example, guaranteeing the representation of slices allows non-Rust code to
read from the `data` or `len` fields of `string` in the type below without
intermediate FFI calls into Rust:

```rust
#[repr(C)]
struct HasString {
    string: &'static str,
}
```

Note: prior to this RFC, the type above is not even properly `repr(C)` since the
size and alignment of slices were not guaranteed. However, the Rust compiler
accepts the `repr(C)` declaration above without warning.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Slices are represented with a pointer and length pair. Their in-memory layout is
the same as a `#[repr(C)]` struct like the following:

```rust
#[repr(C)]
struct Slice<T> {
    data: *const T, // pointer to the first element
    len: usize,     // number of elements, not bytes
}
```

The precise ABI of slices is not guaranteed, so `&[T]` may not be passed by-value
or returned by-value from an `extern "C" fn`.
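
Since only the in-memory layout is covered, a function boundary still needs C-compatible parameter types. The sketch below shows two common ways to do that: passing the pointer and length as separate arguments, or passing a `#[repr(C)]` struct. The names `sum_u32`, `sum_raw`, and `RawSlice` are hypothetical and exist only for this illustration.

```rust
use std::slice;

/// Hypothetical `#[repr(C)]` carrier for a `u32` slice. Unlike `&[u32]`,
/// this type has a defined by-value ABI for `extern "C"` functions.
#[repr(C)]
pub struct RawSlice {
    pub data: *const u32,
    pub len: usize,
}

/// Takes the pointer and length as two separate scalar arguments.
///
/// # Safety
/// `data` and `len` must satisfy the requirements of
/// `std::slice::from_raw_parts`.
pub unsafe extern "C" fn sum_u32(data: *const u32, len: usize) -> u64 {
    let values = unsafe { slice::from_raw_parts(data, len) };
    values.iter().map(|&v| u64::from(v)).sum()
}

/// Takes the same information bundled in a `#[repr(C)]` struct.
///
/// # Safety
/// Same requirements as `sum_u32`.
pub unsafe extern "C" fn sum_raw(s: RawSlice) -> u64 {
    unsafe { sum_u32(s.data, s.len) }
}

fn main() {
    let values = [1u32, 2, 3, 4];
    let total = unsafe { sum_u32(values.as_ptr(), values.len()) };
    assert_eq!(total, 10);

    let raw = RawSlice { data: values.as_ptr(), len: values.len() };
    assert_eq!(unsafe { sum_raw(raw) }, 10);
}
```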

The validity requirements for the in-memory slice representation are the same
as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html).
Namely:

* `data` must be non-null, valid for reads for `len * mem::size_of::<T>()` many bytes,
  and it must be properly aligned. This means in particular:

    * The entire memory range of this slice must be contained within a single allocated object!
      Slices can never span across multiple allocated objects.
    * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One
      reason for this is that enum layout optimizations may rely on references
      (including slices of any length) being aligned and non-null to distinguish
      them from other data. You can obtain a pointer that is usable as `data`
      for zero-length slices using [`NonNull::dangling()`].

* `data` must point to `len` consecutive properly initialized values of type `T`.

* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`,
  and adding that size to `data` must not "wrap around" the address space.
  See the safety documentation of [`pointer::offset`].
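
Some of the requirements above can be checked from the address and length alone. The following is a minimal sketch of such a check; `parts_look_valid` is a hypothetical helper, and it deliberately cannot verify that the memory lies within a single allocated object or that the values are initialized.

```rust
use std::{mem, ptr::NonNull, slice};

/// Hypothetical helper: checks only the address-level requirements
/// (non-null, aligned, size bound, no wrap-around).
fn parts_look_valid<T>(data: *const T, len: usize) -> bool {
    let addr = data as usize;
    if data.is_null() || addr % mem::align_of::<T>() != 0 {
        return false;
    }
    // The total size in bytes must not overflow and must fit in `isize`.
    let Some(bytes) = len.checked_mul(mem::size_of::<T>()) else {
        return false;
    };
    // Adding the size to the address must not wrap around.
    bytes <= isize::MAX as usize && addr.checked_add(bytes).is_some()
}

fn main() {
    let values = [1u16, 2, 3];
    assert!(parts_look_valid(values.as_ptr(), values.len()));

    // A null pointer is rejected even for a zero-length slice.
    assert!(!parts_look_valid(std::ptr::null::<u16>(), 0));

    // An odd address is misaligned for `u16`.
    let misaligned = (values.as_ptr() as usize + 1) as *const u16;
    assert!(!parts_look_valid(misaligned, 1));

    // A length whose byte size overflows is rejected.
    assert!(!parts_look_valid(values.as_ptr(), usize::MAX));

    // `NonNull::dangling()` yields a usable `data` pointer for empty slices.
    let dangling = NonNull::<u64>::dangling().as_ptr() as *const u64;
    assert!(parts_look_valid(dangling, 0));
    let empty: &[u64] = unsafe { slice::from_raw_parts(dangling, 0) };
    assert!(empty.is_empty());
}
```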

# Drawbacks
[drawbacks]: #drawbacks

## Zero-sized types

One could imagine representing `&[T]` as only `len` for zero-sized `T`.
This proposal would preclude that choice in favor of a standard representation
for slices regardless of the underlying type.

Alternatively, we could choose to guarantee that the data pointer is present if
and only if `size_of::<T>() != 0`. This could break existing
code which smuggles pointers through the `data` value in `from_raw_parts` /
`into_raw_parts`.
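
To make the smuggling pattern concrete, here is a small sketch, assuming a slice of the zero-sized unit type `()`. The helper name `roundtrip_through_unit_slice` is hypothetical, and the example uses the stable `as_ptr` and `len` accessors rather than `into_raw_parts`. It only compares addresses and never reads through the recovered pointer.

```rust
use std::slice;

/// Hypothetical helper: builds a `&[()]` from an arbitrary non-null pointer
/// and returns the pointer and length the slice reports afterwards.
fn roundtrip_through_unit_slice(p: *const (), len: usize) -> (*const (), usize) {
    assert!(!p.is_null(), "slice data pointers must be non-null");
    // SAFETY: `()` is zero-sized with alignment 1, so any non-null pointer is
    // aligned and the total size of the slice is 0 bytes for every `len`.
    let units: &[()] = unsafe { slice::from_raw_parts(p, len) };
    (units.as_ptr(), units.len())
}

fn main() {
    let token = 0xABCD_u64;
    // Reuse the address of a real object as the `data` of a ZST slice.
    let smuggled = &token as *const u64 as *const ();

    let (data, len) = roundtrip_through_unit_slice(smuggled, 3);
    // With a data pointer always present, the address survives the round trip.
    assert_eq!(data, smuggled);
    assert_eq!(len, 3);
}
```

If `&[T]` stored only `len` for zero-sized `T`, the address passed in here could not be recovered from the slice.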

## Uninhabited types

Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]`
types into a ZST since the slice can only ever be length zero. This may offer
modest performance benefits for highly generic code which happens to create
empty slices of uninhabited types, but this is unlikely to be worth the
cost of maintaining a special case.

## Compatibility with C++ `std::span`

The largest drawback of this layout and set of validity requirements is that it
may preclude `&[T]` from being representationally equivalent to C++'s
`std::span<T, std::dynamic_extent>`.

* `std::span` does not currently guarantee its layout. In practice, pointer + length
  is the common representation. This is even observable using `is_layout_compatible`
  [on MSVC](https://godbolt.org/z/Y8ardrshY), though not
  [on GCC](https://godbolt.org/z/s4v4xehnG) nor
  [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a
  different layout in the C++ standard (unlikely due to MSVC ABI stability
  requirements) could preclude matching the layout with `&[T]`.

* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One
  possible workaround for this would be to guarantee that `Option<&[T]>` uses
  `data: std::ptr::null()` to represent the `None` case, making `std::span<T>`
  equivalent to `Option<&[T]>` for non-zero-sized types.

* Rust uses a dangling pointer in the representation of zero-length slices.
  It's unclear whether C++ guarantees that a dangling pointer will remain
  unchanged when passed through `std::span`. However, it does support
  dangling pointers during regular construction via the use of
  [`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span)
  in the iterator constructors.

Note that C++ also does not support zero-sized types, so there is no naive way
to represent types like `std::span<SomeZeroSizedRustType>`.
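
Independently of any layout guarantee for `Option<&[T]>`, the null-pointer mismatch can be handled with an explicit conversion at the boundary. The sketch below assumes a pointer + length span representation; `SpanParts` and `span_to_slice` are hypothetical names, and mapping a null span to `None` is one possible convention rather than something this RFC specifies.

```rust
use std::slice;

/// Hypothetical mirror of a pointer + length span; the actual layout of
/// `std::span` is not guaranteed by the C++ standard.
#[repr(C)]
struct SpanParts<T> {
    data: *const T,
    len: usize,
}

/// Converts span-like parts into an optional Rust slice.
///
/// # Safety
/// If `data` is non-null and `len > 0`, the parts must satisfy the
/// requirements of `std::slice::from_raw_parts` for the lifetime `'a`.
unsafe fn span_to_slice<'a, T>(span: SpanParts<T>) -> Option<&'a [T]> {
    if span.data.is_null() {
        // A null span has no Rust slice equivalent, so map it to `None`.
        None
    } else if span.len == 0 {
        // Do not rely on the pointer of an empty span being aligned;
        // substitute Rust's own empty slice instead.
        Some(&[])
    } else {
        Some(unsafe { slice::from_raw_parts(span.data, span.len) })
    }
}

fn main() {
    let values = [1i32, 2, 3];
    let some = unsafe { span_to_slice(SpanParts { data: values.as_ptr(), len: 3 }) };
    assert_eq!(some, Some(&values[..]));

    let none = unsafe { span_to_slice(SpanParts::<i32> { data: std::ptr::null(), len: 0 }) };
    assert_eq!(none, None);

    // Non-null but misaligned pointer with zero length: still a valid empty slice.
    let empty = unsafe { span_to_slice(SpanParts { data: 1 as *const i32, len: 0 }) };
    assert_eq!(empty.map(|s| s.len()), Some(0));
}
```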
132+
133+
## Flexibility
134+
135+
Additionally, guaranteeing layout of Rust-native types limits the compiler's and
136+
standard library's ability to change and take advantage of new optimization
137+
opportunities.
138+
139+
# Rationale and alternatives
140+
[rationale-and-alternatives]: #rationale-and-alternatives
141+
142+
* We could avoid committing to a particular representation for slices.
143+
144+
* We could try to guarantee layout compatibility with a particular target's
145+
`std::span` representation, though without standardization this may be
146+
impossible. Multiple different C++ stdlib implementations may be used on
147+
the same platform and could potentially have different span representations.
148+
In practice, current span representations also use ptr+len pairs.
149+
150+
* We could avoid storing a data pointer for zero-sized types. This would result
151+
in a more compact representation but would mean that the representation of
152+
`&[T]` is dependent on the type of `T`.
153+
154+
# Prior art
155+
[prior-art]: #prior-art
156+
157+
The layout in this RFC is already documented in
158+
[the Unsafe Code Guildelines Reference.](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html)
159+
160+
# Unresolved questions
161+
[unresolved-questions]: #unresolved-questions
162+
163+
* Should `&[T]` include a pointer when `T` is zero-sized?
164+
165+
# Future possibilities
166+
[future-possibilities]: #future-possibilities
167+
168+
* Consider defining a separate Rust type which is repr-equivalent to the platform's
169+
native `std::span<T, std::dynamic_extent>` to allow for easier
170+
interoperability with C++ APIs. Unfortunately, the C++ standard does not
171+
guarantee the layout of `std::span` (though the representation may be known
172+
and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC).
173+
Zero-sized types would also not be supported with a naiive implementation of
174+
such a type.
