Guarantee soundness of pointer-to-int transmutes #1752

Open

Contributor

@joshlf joshlf commented Mar 6, 2025

@rustbot rustbot added the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label Mar 6, 2025
Despite pointers and references being similar to `usize`s in the machine code emitted on most platforms,
the semantics of transmuting a reference or pointer type to a non-pointer type is currently undecided.
Thus, it may not be valid to transmute a pointer or reference type, `P`, to a `[u8; size_of::<P>()]`.
A pointer or reference type, `P`, is guaranteed to have all of its bytes initialized. Thus, it is always
Member

@RalfJung RalfJung Mar 11, 2025

The phrasing is odd here -- it should say that a ptr or reference requires all its bytes to be initialized, shouldn't it? Bit validity primarily defines "which sequences of bytes can I transmute to this type T" (and which values do they then represent). This here is like saying "a bool value is guaranteed to be represented as 0x00 or 0x01", which is true but not what we usually say when we discuss validity of bool.

Furthermore, it is the representation relation of integers that decides whether bytes with provenance are permitted there, so it is odd to discuss that here.
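
A minimal sketch of the `bool` analogy above, using only the standard library. The helper names `bool_to_byte` and `byte_to_bool` are hypothetical. The first direction shows what a `bool` value is guaranteed to look like as a byte; the second shows the validity question as it is usually posed, namely which bytes may be turned into a `bool`.

```rust
use std::mem::transmute;

/// Returns the single byte that represents `b`.
fn bool_to_byte(b: bool) -> u8 {
    // Sound: every valid `bool` is the byte 0x00 or 0x01, and every
    // initialized byte is a valid `u8`.
    unsafe { transmute::<bool, u8>(b) }
}

/// Converts a byte to `bool` only when the byte is valid for `bool`.
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        // Sound: 0x00 and 0x01 are exactly the byte values valid for `bool`.
        0x00 | 0x01 => Some(unsafe { transmute::<u8, bool>(byte) }),
        // Any other byte is invalid for `bool`; transmuting it would be UB.
        _ => None,
    }
}
```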

Contributor Author

> The phrasing is odd here -- it should say that a ptr or reference requires all its bytes to be initialized, shouldn't it? Bit validity primarily defines "which sequences of bytes can I transmute to this type T" (and which values do they then represent). This here is like saying "a bool value is guaranteed to be represented as 0x00 or 0x01", which is true but not what we usually say when we discuss validity of bool.

My goal here is to document what guarantees I can rely on if I currently have a pointer and want to look at its bytes. This prose amounts to a guarantee that we won't loosen the validity of pointers to permit uninit bytes. I would be concerned that the opposite wording would read like a sufficient-but-not-necessary condition that we could theoretically loosen in the future.

This is not just hypothetical: there's an argument that the current state of affairs is that int-to-ptr transmutes are well-defined, but that is not currently taken to imply that ptr-to-int transmutes are well-defined.
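
As a hedged sketch of "looking at a pointer's bytes" without relying on the undecided question: the code below views a pointer as `[MaybeUninit<u8>; N]` rather than `[u8; N]`. To my understanding this sidesteps both concerns raised in this thread, because `MaybeUninit<u8>` places no validity requirement on its byte (neither initialization nor absence of provenance). The names `PTR_SIZE`, `ptr_to_opaque_bytes`, and `opaque_bytes_to_ptr` are hypothetical.

```rust
use std::mem::{size_of, transmute, MaybeUninit};

/// Size of a thin raw pointer on the current target.
const PTR_SIZE: usize = size_of::<*const u8>();

/// Reinterprets a raw pointer as opaque, possibly-uninitialized bytes.
fn ptr_to_opaque_bytes(p: *const u8) -> [MaybeUninit<u8>; PTR_SIZE] {
    // `MaybeUninit<u8>` accepts any byte, so this does not depend on whether
    // pointer bytes are valid for `u8`.
    unsafe { transmute::<*const u8, [MaybeUninit<u8>; PTR_SIZE]>(p) }
}

/// Rebuilds the pointer from bytes produced by `ptr_to_opaque_bytes`.
///
/// # Safety
/// `bytes` must be the unmodified result of `ptr_to_opaque_bytes`.
unsafe fn opaque_bytes_to_ptr(bytes: [MaybeUninit<u8>; PTR_SIZE]) -> *const u8 {
    // The bytes were copied from a pointer and never inspected as integers.
    unsafe { transmute::<[MaybeUninit<u8>; PTR_SIZE], *const u8>(bytes) }
}
```

The cost of this approach is that the bytes cannot be read as integers; doing so is exactly the pointer-to-int transmute whose semantics the PR text calls undecided.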

> Furthermore, it is the representation relation of integers that decides whether bytes with provenance are permitted there, so it is odd to discuss that here.

I just wanted to give the looser warning that ptr-to-int-to-ptr might not result in an identical ptr from the AM's perspective. Do you have a suggestion of how to word this better?
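
For contrast, a minimal sketch of a ptr-to-int-to-ptr round trip written with `as` casts instead of `transmute`. The function name `read_via_int_round_trip` is hypothetical, and the soundness comment assumes the exposed-provenance reading of `as` casts. Note that `==` on raw pointers compares addresses only, so it cannot show whether the two pointers are identical from the Abstract Machine's perspective.

```rust
/// Reads `*x` through a pointer that was rebuilt from an integer.
fn read_via_int_round_trip(x: &u32) -> u32 {
    let p: *const u32 = x;
    // Pointer-to-integer cast: yields the address of `*x`.
    let addr = p as usize;
    // Integer-to-pointer cast: rebuilds a pointer with the same address.
    let q = addr as *const u32;
    // Address equality holds, but says nothing about provenance.
    debug_assert!(q == p);
    // Assumed sound because the cast above exposed `p`'s provenance and `x`
    // is still borrowed for the duration of this read.
    unsafe { *q }
}
```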

Member

@RalfJung RalfJung Mar 11, 2025

> I just wanted to give the looser warning that ptr-to-int-to-ptr might not result in an identical ptr from the AM's perspective. Do you have a suggestion of how to word this better?

I am objecting mostly to the placement, not the wording. Conceptually, what we eventually need is for every type a description of which byte sequences are valid for this type, and which value is represented by each valid byte sequence. This is, however, complicated by the fact that we're trying to give partial guarantees while some details are still being worked out... I don't have a coherent plan for how to deal with this transitional phase.


@ia0 ia0 Mar 11, 2025

> what we eventually need is for every type a description of which byte sequences are valid for this type

+1 and we also need a similar description for the library invariant of such language types

> My goal here is to document what guarantees I can rely on if I currently have a pointer and want to look at its bytes.

Just mentioning a possible trap here. The programmer is not allowed to assume the language invariant by name (at least until that invariant is advertised as stable). This is an implicit rule. For example (see footnote 1), you cannot write the following robust function:

/// Robustness: `ptr` only needs to be valid, it doesn't need to be safe.
pub fn foo(ptr: *const u8);

Instead, one has to explicitly list all properties from the library invariant that are not relied on (of course, you can't go beyond the language invariant):

/// Robustness: `ptr` may be uninitialized.
pub fn foo(ptr: *const u8);

Said otherwise, as a programmer you can't assume that a value is valid (sounds weird yes), only the language implementation can do this. As a programmer you assume the property given to you (which is usually stronger than the language invariant) described as a diff from the library invariant.
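
One way to make such a relaxation explicit is to put it in the signature instead of the prose, sketched below under the same illustrative assumption as footnote 1. The function `stash` is hypothetical and is not part of the PR; it only moves the possibly-uninitialized pointer and never reads it.

```rust
use std::mem::MaybeUninit;

/// Stores `ptr` into `slot` without inspecting it.
///
/// The parameter type states that the pointer bytes may be uninitialized,
/// so the contract does not need to name any invariant.
pub fn stash(slot: &mut MaybeUninit<*const u8>, ptr: MaybeUninit<*const u8>) {
    // Moving a `MaybeUninit` value is allowed even when it is uninitialized.
    *slot = ptr;
}
```

A caller that later wants to use the stored pointer still has to justify `assume_init` on its own, which keeps the obligation with whoever actually relies on the stronger property.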

Another possible problem with the word "guarantee" is: who guarantees what to whom? There are at least two options:

  • A value producer guarantees a set of possible byte sequences to a value consumer (this cannot mention the language invariant by name unless it is stable).
  • The language guarantees to the programmer that the language invariant will only weaken (said otherwise, that the language won't introduce new undefined behaviors).

Footnotes

  1. For illustration purposes, those examples assume that the library invariant of pointers requires the bytes to be initialized and the language invariant doesn't. This doesn't reflect any consensus and is not meant to.

Contributor

FTR, by the stability promise, you can actually assume a given value is valid (assuming it is ever documented as such or otherwise permitted, such as by allowing its creation in safe code). What you cannot do is assume a given value is not valid.

Also, you can rely on the validity invariant by name (or any other invariant) if your reliance is for the fact of the matter alone. I.e., if your only requirement on the function parameter is "Call is valid under language rules/language invariant", then you can refer to the validity invariant by name. Of course, there are few useful functions with such an invariant. (`mem::forget()`, however, is a trivial example of a function whose definition can be such that the parameter only needs to be valid, and not necessarily safe. Note, however, that there is no library guarantee that `mem::forget` is, in fact, such a function.)

Copy link

> Also, you can rely on the validity invariant by name (or any other invariant) if your reliance is for the fact of the matter alone

Indeed, if your function is polymorphic over the invariant it supports, then you can overspecify by naming the validity invariant. But I believe this is almost always useless and inaccurate (you had better say that you are polymorphic over the invariant).

Development

Successfully merging this pull request may close these issues.

What about: Pointer-to-integer transmutes?