| 1 | +- Feature Name: guaranteed_slice_repr |
| 2 | +- Start Date: 2025-02-18 |
| 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) |
| 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +This RFC guarantees the in-memory representation of slice and `str` references. |
| 10 | +Specifically, `&[T]` and `&mut [T]` are guaranteed to have the same layout as: |
| 11 | + |
| 12 | +```rust |
| 13 | +#[repr(C)] |
| 14 | +struct Slice<T> { |
| 15 | + data: *const T, |
| 16 | + len: usize, |
| 17 | +} |
| 18 | +``` |
| 19 | + |
| 20 | +The layout of `&str` is the same as that of `&[u8]`, and the layout of |
| 21 | +`&mut str` is the same as that of `&mut [u8]`. |
| 22 | + |
| 23 | +# Motivation |
| 24 | +[motivation]: #motivation |
| 25 | + |
| 26 | +This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing |
| 27 | +slices and to declare slice fields or locals. |
| 28 | + |
| 29 | +For example, guaranteeing the representation of slice references allows |
| 30 | +non-Rust code to read from the `data` or `len` fields of `string` in the type |
| 31 | +below without intermediate FFI calls into Rust: |
| 32 | + |
| 33 | +```rust |
| 34 | +#[repr(C)] |
| 35 | +struct HasString { |
| 36 | + string: &'static str, |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +Note: prior to this RFC, the type above is not even properly `repr(C)` since the |
| 41 | +size and alignment of slices were not guaranteed. However, the Rust compiler |
| 42 | +accepts the `repr(C)` declaration above without warning. |
| 43 | + |
| 44 | +# Guide-level explanation |
| 45 | +[guide-level-explanation]: #guide-level-explanation |
| 46 | + |
| 47 | +Slice references are represented with a pointer and length pair. Their in-memory |
| 48 | +layout is the same as a `#[repr(C)]` struct like the following: |
| 49 | + |
| 50 | +```rust |
| 51 | +#[repr(C)] |
| 52 | +struct Slice<T> { |
| 53 | + data: *const T, |
| 54 | + len: usize, |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +The calling-convention ABI of slice references is not guaranteed, so `&[T]` should |
| 59 | +not be passed to or returned from an `extern "C" fn` by value. |
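As a rough illustration of the layout statement above, the sketch below compares the size and alignment of `&[T]` with a `#[repr(C)]` mirror struct, and passes the mirror struct (rather than `&[T]` itself) through an `extern "C" fn`. The names `to_parts`, `sum_bytes`, and `layouts_match` are hypothetical helpers invented for this example; they are not part of the RFC or the standard library.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical mirror of the layout described above; not a std type.
#[repr(C)]
pub struct Slice<T> {
    pub data: *const T,
    pub len: usize,
}

/// Splits a slice reference into pointer and length using stable APIs.
pub fn to_parts<T>(s: &[T]) -> Slice<T> {
    Slice { data: s.as_ptr(), len: s.len() }
}

/// Takes the `#[repr(C)]` struct by value instead of `&[u8]`, since only
/// the in-memory layout (not the calling convention) is guaranteed.
///
/// # Safety
/// `s` must describe a live, valid `[u8]` (for example, the result of `to_parts`).
pub unsafe extern "C" fn sum_bytes(s: Slice<u8>) -> u64 {
    // SAFETY: guaranteed by the caller, see the function-level contract.
    let bytes = unsafe { std::slice::from_raw_parts(s.data, s.len) };
    bytes.iter().map(|&b| u64::from(b)).sum()
}

/// Returns `true` if `&[T]` and `Slice<T>` agree on size and alignment.
pub fn layouts_match<T>() -> bool {
    size_of::<&[T]>() == size_of::<Slice<T>>()
        && align_of::<&[T]>() == align_of::<Slice<T>>()
}
```

A caller would keep the backing storage alive for the duration of the call, for example `let bytes = [1u8, 2, 3]; unsafe { sum_bytes(to_parts(&bytes)) }`.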
| 60 | + |
| 61 | +The validity requirements for the in-memory slice representation are the same |
| 62 | +as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html). |
| 63 | +Namely: |
| 64 | + |
| 65 | +* `data` must be non-null, valid for reads for `len * mem::size_of::<T>()` many bytes, |
| 66 | + and it must be properly aligned. This means in particular: |
| 67 | + |
| 68 | + * The entire memory range of this slice must be contained within a single allocated object! |
| 69 | + Slices can never span across multiple allocated objects. |
| 70 | + * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One |
| 71 | + reason for this is that enum layout optimizations may rely on references |
| 72 | + (including slices of any length) being aligned and non-null to distinguish |
| 73 | + them from other data. You can obtain a pointer that is usable as `data` |
| 74 | + for zero-length slices using [`NonNull::dangling()`]. |
| 75 | + |
| 76 | +* `data` must point to `len` consecutive properly initialized values of type `T`. |
| 77 | + |
| 78 | +* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`, |
| 79 | + and adding that size to `data` must not "wrap around" the address space. |
| 80 | + See the safety documentation of [`pointer::offset`]. |
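Some of the requirements listed above can be checked from the pointer value and length alone. The following is a minimal sketch of such a check; `parts_look_valid` is a hypothetical helper, not a standard library function. It cannot verify that the memory is initialized or that the range lies within a single allocated object, so passing it is necessary but not sufficient for calling `from_raw_parts`.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: checks only the address-level requirements.
/// It does NOT check initialization or allocation bounds.
pub fn parts_look_valid<T>(data: *const T, len: usize) -> bool {
    // `data` must be non-null, even for zero-length slices.
    if data.is_null() {
        return false;
    }
    // `data` must be properly aligned for `T`.
    if (data as usize) % align_of::<T>() != 0 {
        return false;
    }
    // The total size in bytes must not exceed `isize::MAX`.
    let Some(bytes) = len.checked_mul(size_of::<T>()) else {
        return false;
    };
    if bytes > isize::MAX as usize {
        return false;
    }
    // Adding the size to `data` must not wrap around the address space.
    (data as usize).checked_add(bytes).is_some()
}
```

Non-Rust code that fills in `data` and `len` directly would need to uphold the same conditions, since no Rust-side constructor runs to check them.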
| 81 | + |
| 82 | +## `str` |
| 83 | + |
| 84 | +The layout of `&str` is the same as that of `&[u8]`, and the layout of |
| 85 | +`&mut str` is the same as that of `&mut [u8]`. More generally, `str` behaves like |
| 86 | +`#[repr(transparent)] struct str([u8]);`. Safe Rust functions may assume that |
| 87 | +`str` holds valid UTF-8, but [it is not immediate undefined behavior to store |
| 88 | +non-UTF-8 data in `str`](https://doc.rust-lang.org/std/primitive.str.html#invariant). |
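The relationship between `&str` and `&[u8]` can be observed with stable APIs. In the sketch below, `str_and_bytes_agree` and `str_from_bytes_checked` are hypothetical names chosen for illustration. The checked conversion is shown because bytes written by non-Rust code are not known to be UTF-8 until validated.

```rust
/// Returns `true` if `s` and its byte view share the same pointer and length.
pub fn str_and_bytes_agree(s: &str) -> bool {
    let bytes: &[u8] = s.as_bytes();
    bytes.as_ptr() == s.as_ptr() && bytes.len() == s.len()
}

/// Validates bytes (e.g. ones filled in by non-Rust code) before viewing
/// them as `&str`, so safe code can keep relying on the UTF-8 invariant.
pub fn str_from_bytes_checked(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```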
| 89 | + |
| 90 | +# Drawbacks |
| 91 | +[drawbacks]: #drawbacks |
| 92 | + |
| 93 | +## Zero-sized types |
| 94 | + |
| 95 | +One could imagine representing `&[T]` as only `len` for zero-sized `T`. |
| 96 | +This proposal would preclude that choice in favor of a standard representation |
| 97 | +for slices regardless of the underlying type. |
| 98 | + |
| 99 | +Alternatively, we could choose to guarantee that the data pointer is present if |
| 100 | +and only if `size_of::<T>() != 0`. This has the possibility of breaking existing |
| 101 | +code which smuggles pointers through the `data` value in `from_raw_parts` / |
| 102 | +`into_raw_parts`. |
| 103 | + |
| 104 | +## Uninhabited types |
| 105 | + |
| 106 | +Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]` |
| 107 | +types into a ZST since the slice can only ever be length zero. |
| 108 | + |
| 109 | +If we want to maintain the pointer field, we could also make `&[!]` *just* a |
| 110 | +pointer since we know the length can only be zero. |
| 111 | + |
| 112 | +Either option may offer modest performance benefits for highly generic code |
| 113 | +which happens to create empty slices of uninhabited types, but this is unlikely |
| 114 | +to be worth the cost of maintaining a special case. |
| 115 | + |
| 116 | +## Compatibility with C++ `std::span` |
| 117 | + |
| 118 | +The largest drawback of this layout and set of validity requirements is that it |
| 119 | +may preclude `&[T]` from being representationally equivalent to C++'s |
| 120 | +`std::span<T, std::dynamic_extent>`. |
| 121 | + |
| 122 | +* `std::span` does not currently guarantee its layout. In practice, pointer + length |
| 123 | + is the common representation. This is even observable using `is_layout_compatible` |
| 124 | + [on MSVC](https://godbolt.org/z/Y8ardrshY), though not |
| 125 | + [on GCC](https://godbolt.org/z/s4v4xehnG) nor |
| 126 | + [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a |
| 127 | + different layout in the C++ standard (unlikely due to MSVC ABI stability |
| 128 | + requirements) could preclude matching the layout with `&[T]`. |
| 129 | + |
| 130 | +* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One |
| 131 | + possible workaround for this would be to guarantee that `Option<&[T]>` uses |
| 132 | + `data: std::ptr::null(), len: 0` to represent the `None` case, making |
| 133 | + `std::span<T>` equivalent to `Option<&[T]>` for non-zero-sized types. |
| 134 | + |
| 135 | + Note that this is not currently the case. The compiler currently represents |
| 136 | + `None::<&[u8]>` as `data: std::ptr::null(), len: uninit` (though this is |
| 137 | + not guaranteed). |
| 138 | + |
| 139 | +* Rust uses a dangling pointer in the representation of zero-length slices. |
| 140 | + It's unclear whether C++ guarantees that a dangling pointer will remain |
| 141 | + unchanged when passed through `std::span`. However, it does support |
| 142 | + dangling pointers during regular construction via the use of |
| 143 | + [`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span) |
| 144 | + in the iterator constructors. |
| 145 | + |
| 146 | +Note that C++ also does not support zero-sized types, so there is no naive way |
| 147 | +to represent types like `std::span<SomeZeroSizedRustType>`. |
| 148 | + |
| 149 | +## Flexibility |
| 150 | + |
| 151 | +Additionally, guaranteeing layout of Rust-native types limits the compiler's and |
| 152 | +standard library's ability to change and take advantage of new optimization |
| 153 | +opportunities. |
| 154 | + |
| 155 | +# Rationale and alternatives |
| 156 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 157 | + |
| 158 | +* We could avoid committing to a particular representation for slices. |
| 159 | + |
| 160 | +* We could try to guarantee layout compatibility with a particular target's |
| 161 | + `std::span` representation, though without standardization this may be |
| 162 | + impossible. Multiple different C++ stdlib implementations may be used on |
| 163 | + the same platform and could potentially have different span representations. |
| 164 | + In practice, current span representations also use ptr+len pairs. |
| 165 | + |
| 166 | +* We could avoid storing a data pointer for zero-sized types. This would result |
| 167 | + in a more compact representation but would mean that the representation of |
| 168 | + `&[T]` is dependent on the type of `T`. Additionally, this would break |
| 169 | + existing code which depends on storing data in the pointer of ZST slices. |
| 170 | + |
| 171 | + This would break popular crates such as [bitvec](https://docs.rs/crate/bitvec/1.0.1/source/doc/ptr/BitSpan.md) |
| 172 | + (55 million downloads) and would result in strange behavior such as |
| 173 | + `std::ptr::slice_from_raw_parts(ptr, len).as_ptr()` returning a different |
| 174 | + pointer from the one that was passed in. |
| 175 | + |
| 176 | + Types like `*const ()` / `&()` are widely used to pass around pointers today. |
| 177 | + We cannot make them zero-sized, and it would be surprising to make a |
| 178 | + different choice for `&[()]`. |
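The pointer-preservation behavior that the last alternative would break can be sketched as follows. `roundtrip_zst_pointer` is a hypothetical helper for illustration; it uses the stable `cast` method on the raw slice pointer to recover the data pointer, and the length `4` is arbitrary.

```rust
use std::ptr;

/// Builds a raw slice of zero-sized elements from `addr` and returns the
/// data pointer recovered from it. No memory is read or written.
pub fn roundtrip_zst_pointer(addr: *const ()) -> *const () {
    // A raw slice pointer over a ZST element type still stores `addr`.
    let raw: *const [()] = ptr::slice_from_raw_parts(addr, 4);
    // Casting the wide pointer to a thin pointer yields its data pointer.
    raw.cast::<()>()
}
```

With the layout proposed in this RFC, the returned pointer equals the one passed in, which is what code that stores information in the pointer of a ZST slice depends on.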
| 179 | + |
| 180 | + |
| 181 | +# Prior art |
| 182 | +[prior-art]: #prior-art |
| 183 | + |
| 184 | +The layout in this RFC is already documented in |
| 185 | +[the Unsafe Code Guidelines Reference](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html). |
| 186 | + |
| 187 | +# Future possibilities |
| 188 | +[future-possibilities]: #future-possibilities |
| 189 | + |
| 190 | +* Consider defining a separate Rust type which is repr-equivalent to the platform's |
| 191 | + native `std::span<T, std::dynamic_extent>` to allow for easier |
| 192 | + interoperability with C++ APIs. Unfortunately, the C++ standard does not |
| 193 | + guarantee the layout of `std::span` (though the representation may be known |
| 194 | + and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC). |
| 195 | + Zero-sized types would also not be supported with a naive implementation of |
| 196 | + such a type. |