|
| 1 | +- Feature Name: guaranteed_slice_repr |
| 2 | +- Start Date: 2025-02-18 |
| 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) |
| 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +This RFC guarantees the in-memory representation of slice and str references. |
| 10 | +Specifically, `&[T]` is guaranteed to have the same layout as: |
| 11 | + |
| 12 | +```rust |
| 13 | +#[repr(C)] |
| 14 | +struct Slice<T> { |
| 15 | + data: *const T, |
| 16 | + len: usize, |
| 17 | +} |
| 18 | +``` |
| 19 | + |
| 20 | +The layout of `&str` is the same as that of `&[u8]`, and the layout of |
| 21 | +`&mut str` is the same as that of `&mut [u8]`. |
| 22 | + |
| 23 | +# Motivation |
| 24 | +[motivation]: #motivation |
| 25 | + |
| 26 | +This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing |
| 27 | +slices and to declare slice fields or locals. |
| 28 | + |
| 29 | +For example, guaranteeing the representation of slices allows non-Rust code to |
| 30 | +read from the `data` or `len` fields of `string` in the type below without |
| 31 | +intermediate FFI calls into Rust: |
| 32 | + |
| 33 | +```rust |
| 34 | +#[repr(C)] |
| 35 | +struct HasString { |
| 36 | + string: &'static str, |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +Note: prior to this RFC, the type above is not even properly `repr(C)` since the |
| 41 | +size and alignment of slices were not guaranteed. However, the Rust compiler |
| 42 | +accepts the `repr(C)` declaration above without warning. |
| 43 | + |
| 44 | +# Guide-level explanation |
| 45 | +[guide-level-explanation]: #guide-level-explanation |
| 46 | + |
| 47 | +Slices are represented with a pointer and length pair. Their in-memory layout is |
| 48 | +the same as a `#[repr(C)]` struct like the following: |
| 49 | + |
| 50 | +```rust |
| 51 | +#[repr(C)] |
| 52 | +struct Slice<T> { |
| 53 | + data: *const T, |
| 54 | + len: usize, |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +The precise ABI of slices is not guaranteed, so `&[T]` should not be passed by value |
| 59 | +to, or returned by value from, an `extern "C" fn`. |
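
The split between guaranteed layout and unguaranteed ABI can be sketched as follows. This is a minimal illustration, not part of the RFC text: `from_slice`, `layout_matches`, and `sum_bytes` are hypothetical names, and `sum_bytes` is a Rust stand-in for a function that would normally be written in C. The `#[repr(C)]` mirror struct, unlike `&[T]` itself, has a defined C ABI and can cross the `extern "C"` boundary by value.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Mirror of the layout described above (same fields as the RFC's `Slice<T>`).
#[repr(C)]
pub struct Slice<T> {
    pub data: *const T,
    pub len: usize,
}

impl<T> Slice<T> {
    /// Hypothetical helper: copies the pointer and length out of a slice.
    pub fn from_slice(s: &[T]) -> Self {
        Slice { data: s.as_ptr(), len: s.len() }
    }
}

/// Compares size and alignment of `&[T]` with the mirror struct.
/// This does not check field order, which is what the RFC would guarantee.
pub fn layout_matches<T>() -> bool {
    size_of::<&[T]>() == size_of::<Slice<T>>()
        && align_of::<&[T]>() == align_of::<Slice<T>>()
}

/// Stand-in for a foreign function that receives the mirror struct by value.
///
/// # Safety
/// `s` must describe a live, initialized byte range, as for
/// `std::slice::from_raw_parts`.
pub unsafe extern "C" fn sum_bytes(s: Slice<u8>) -> u64 {
    // SAFETY: guaranteed by the caller, see above.
    let bytes = unsafe { slice::from_raw_parts(s.data, s.len) };
    bytes.iter().map(|&b| u64::from(b)).sum()
}
```

Passing `Slice<u8>` rather than `&[u8]` keeps the call within what `extern "C"` defines, while the layout guarantee covers slices stored in memory, such as struct fields.
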
| 60 | + |
| 61 | +The validity requirements for the in-memory slice representation are the same |
| 62 | +as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html). |
| 63 | +Namely: |
| 64 | + |
| 65 | +* `data` must be non-null, valid for reads for `len * mem::size_of::<T>()` many bytes, |
| 66 | + and it must be properly aligned. This means in particular: |
| 67 | + |
| 68 | + * The entire memory range of this slice must be contained within a single allocated object! |
| 69 | + Slices can never span across multiple allocated objects. |
| 70 | + * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One |
| 71 | + reason for this is that enum layout optimizations may rely on references |
| 72 | + (including slices of any length) being aligned and non-null to distinguish |
| 73 | + them from other data. You can obtain a pointer that is usable as `data` |
| 74 | + for zero-length slices using [`NonNull::dangling()`]. |
| 75 | + |
| 76 | +* `data` must point to `len` consecutive properly initialized values of type `T`. |
| 77 | + |
| 78 | +* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`, |
| 79 | + and adding that size to `data` must not "wrap around" the address space. |
| 80 | + See the safety documentation of [`pointer::offset`]. |
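
As an illustration of the requirements above, here is a hedged sketch of how Rust code might turn a pointer and length received from non-Rust code into a slice. The function `slice_from_parts` is hypothetical, and treating a null pointer with length zero as an empty slice is an assumption about what a C or C++ caller might send, not something this RFC specifies. Only the non-null, alignment, and size requirements can be checked at run time; validity for reads, initialization, and containment within a single allocated object remain the caller's obligation.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;
use std::slice;

/// Hypothetical helper: builds a slice from foreign `(data, len)` parts,
/// rejecting inputs that violate the checkable requirements.
///
/// # Safety
/// If `len > 0`, `data` must point to `len` initialized values of `T` inside
/// one allocated object, and they must stay valid and unmodified for `'a`.
pub unsafe fn slice_from_parts<'a, T>(data: *const T, len: usize) -> Option<&'a [T]> {
    if data.is_null() {
        if len != 0 {
            return None;
        }
        // Assumption: accept (null, 0) from foreign code and substitute a
        // dangling but non-null, aligned pointer.
        let dangling = NonNull::<T>::dangling().as_ptr();
        // SAFETY: a zero-length slice only needs a non-null, aligned pointer.
        return Some(unsafe { slice::from_raw_parts(dangling, 0) });
    }
    // `data` must be aligned even when `len == 0` or `T` is zero-sized.
    if (data as usize) % align_of::<T>() != 0 {
        return None;
    }
    // The total size in bytes must not exceed `isize::MAX`.
    match len.checked_mul(size_of::<T>()) {
        Some(bytes) if bytes <= isize::MAX as usize => {}
        _ => return None,
    }
    // SAFETY: non-null, alignment, and size were checked above; the
    // remaining requirements are the caller's responsibility.
    Some(unsafe { slice::from_raw_parts(data, len) })
}
```
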
| 81 | + |
| 82 | +# Drawbacks |
| 83 | +[drawbacks]: #drawbacks |
| 84 | + |
| 85 | +## Zero-sized types |
| 86 | + |
| 87 | +One could imagine representing `&[T]` as only `len` for zero-sized `T`. |
| 88 | +This proposal would preclude that choice in favor of a standard representation |
| 89 | +for slices regardless of the underlying type. |
| 90 | + |
| 91 | +Alternatively, we could choose to guarantee that the data pointer is present if |
| 92 | +and only if `size_of::<T>() != 0`. This has the possibility of breaking existing |
| 93 | +code which smuggles pointers through the `data` value in `from_raw_parts` / |
| 94 | +`into_raw_parts`. |
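
To make the pointer-smuggling concern concrete, the sketch below stores an address in the `data` field of a slice of zero-sized elements and reads it back. The function name `smuggle` and the length `3` are illustrative choices only. The round trip reflects how slices of zero-sized types behave today; it would stop working if the data pointer were omitted for such types.

```rust
use std::slice;

/// Hypothetical example: carries the address of `addr` through a `&[()]`.
pub fn smuggle(addr: &u8) -> *const u8 {
    let p = (addr as *const u8).cast::<()>();
    // SAFETY: `()` is zero-sized with alignment 1, so any non-null pointer
    // is acceptable as `data`, and the total size of the slice is 0 bytes.
    let units: &[()] = unsafe { slice::from_raw_parts(p, 3) };
    // The slice still reports the pointer it was built from.
    units.as_ptr().cast::<u8>()
}
```
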
| 95 | + |
| 96 | +## Uninhabited types |
| 97 | + |
| 98 | +Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]` |
| 99 | +types into a ZST since the slice can only ever be length zero. This may offer |
| 100 | +modest performance benefits for highly generic code which happens to create |
| 101 | +empty slices of uninhabited types, but this is unlikely to be worth the |
| 102 | +cost of maintaining a special case. |
| 103 | + |
| 104 | +## Compatibility with C++ `std::span` |
| 105 | + |
| 106 | +The largest drawback of this layout and set of validity requirements is that it |
| 107 | +may preclude `&[T]` from being representationally equivalent to C++'s |
| 108 | +`std::span<T, std::dynamic_extent>`. |
| 109 | + |
| 110 | +* `std::span` does not currently guarantee its layout. In practice, pointer + length |
| 111 | + is the common representation. This is even observable using `is_layout_compatible` |
| 112 | + [on MSVC](https://godbolt.org/z/Y8ardrshY), though not |
| 113 | + [on GCC](https://godbolt.org/z/s4v4xehnG) nor |
| 114 | + [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a |
| 115 | + different layout in the C++ standard (unlikely due to MSVC ABI stability |
| 116 | + requirements) could preclude matching the layout with `&[T]`. |
| 117 | + |
| 118 | +* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One |
| 119 | + possible workaround for this would be to guarantee that `Option<&[T]>` uses |
| 120 | + `data: std::ptr::null()` to represent the `None` case, making `std::span<T>` |
| 121 | + equivalent to `Option<&[T]>` for non-zero-sized types. |
| 122 | + |
| 123 | +* Rust uses a dangling pointer in the representation of zero-length slices. |
| 124 | + It's unclear whether C++ guarantees that a dangling pointer will remain |
| 125 | + unchanged when passed through `std::span`. However, it does support |
| 126 | + dangling pointers during regular construction via the use of |
| 127 | + [`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span) |
| 128 | + in the iterator constructors. |
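
For the `Option<&[T]>` workaround mentioned in the list above, the following sketch checks that wrapping a slice reference in `Option` adds no space, which is what one would expect if `None` is stored in the otherwise invalid null `data` value. The function `option_slice_is_free` is a hypothetical name. The check reflects what current `rustc` does in practice; it does not show which bit pattern encodes `None`, and neither property is guaranteed by this RFC.

```rust
use std::mem::size_of;

/// Returns true when `Option<&[T]>` occupies the same number of bytes as
/// `&[T]`, i.e. when `None` is encoded without an extra discriminant field.
pub fn option_slice_is_free<T>() -> bool {
    size_of::<Option<&[T]>>() == size_of::<&[T]>()
}
```
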
| 129 | + |
| 130 | +Note that C++ also does not support zero-sized types, so there is no naive way |
| 131 | +to represent types like `std::span<SomeZeroSizedRustType>`. |
| 132 | + |
| 133 | +## Flexibility |
| 134 | + |
| 135 | +Additionally, guaranteeing layout of Rust-native types limits the compiler's and |
| 136 | +standard library's ability to change and take advantage of new optimization |
| 137 | +opportunities. |
| 138 | + |
| 139 | +# Rationale and alternatives |
| 140 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 141 | + |
| 142 | +* We could avoid committing to a particular representation for slices. |
| 143 | + |
| 144 | +* We could try to guarantee layout compatibility with a particular target's |
| 145 | + `std::span` representation, though without standardization this may be |
| 146 | + impossible. Multiple different C++ stdlib implementations may be used on |
| 147 | + the same platform and could potentially have different span representations. |
| 148 | + In practice, current span representations also use ptr+len pairs. |
| 149 | + |
| 150 | +* We could avoid storing a data pointer for zero-sized types. This would result |
| 151 | + in a more compact representation but would mean that the representation of |
| 152 | + `&[T]` is dependent on the type of `T`. |
| 153 | + |
| 154 | +# Prior art |
| 155 | +[prior-art]: #prior-art |
| 156 | + |
| 157 | +The layout in this RFC is already documented in |
| 158 | +[the Unsafe Code Guidelines Reference.](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html) |
| 159 | + |
| 160 | +# Unresolved questions |
| 161 | +[unresolved-questions]: #unresolved-questions |
| 162 | + |
| 163 | +* Should `&[T]` include a pointer when `T` is zero-sized? |
| 164 | + |
| 165 | +# Future possibilities |
| 166 | +[future-possibilities]: #future-possibilities |
| 167 | + |
| 168 | +* Consider defining a separate Rust type which is repr-equivalent to the platform's |
| 169 | + native `std::span<T, std::dynamic_extent>` to allow for easier |
| 170 | + interoperability with C++ APIs. Unfortunately, the C++ standard does not |
| 171 | + guarantee the layout of `std::span` (though the representation may be known |
| 172 | + and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC). |
| 173 | + Zero-sized types would also not be supported with a naive implementation of |
| 174 | + such a type. |