Decide on when MIR Discriminant() operation is UB #91095

Open
@RalfJung

We do not currently have a clear description of what the semantics of the Discriminant() MIR operation, and the corresponding intrinsic (exposed via mem::discriminant()), are -- specifically, what are the safety preconditions of this operation, and when is it UB?

Note that this operation works on all types, not just enums. For valid values of non-enum types it returns some valid integer value (currently, 0).

The implementation in Miri (to be restored with #91088) does the minimum amount of work necessary to determine the discriminant: if the type has no discriminant (since there are not at least 2 variants), the operation is always defined; otherwise it reads the tag (which encodes the discriminant) and causes UB if that is uninitialized or does not encode a valid discriminant. (There are some thorny questions here around what happens if the discriminant has provenance; I would like to keep that out of scope for this issue -- it should likely be treated like a ptr-to-int transmute, whatever we end up doing with that: rust-lang/unsafe-code-guidelines#286.)
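The "minimal UB" rule described above can be written down as a toy model. This is not Miri's code: DiscrResult and model_discriminant are hypothetical names, an uninitialized tag is modeled as None, and the sketch assumes a direct tag encoding in which the stored tag byte equals the discriminant (niche encodings and provenance are ignored). The value returned for types with fewer than two variants is likewise an assumption of the sketch.

```rust
/// Outcome of the modeled operation: a discriminant, or the reason it is UB.
#[derive(Debug, PartialEq)]
enum DiscrResult {
    Value(u8),
    UbUninitTag,
    UbInvalidTag(u8),
}

/// Toy model of "minimal UB".
/// `variants` lists the discriminants of the type's variants;
/// `tag` is the stored tag byte, with `None` standing for uninitialized memory.
fn model_discriminant(variants: &[u8], tag: Option<u8>) -> DiscrResult {
    // Fewer than 2 variants: no tag is read, so the operation is always
    // defined regardless of what the memory contains.
    if variants.len() < 2 {
        return DiscrResult::Value(variants.first().copied().unwrap_or(0));
    }
    match tag {
        // Reading an uninitialized tag is UB.
        None => DiscrResult::UbUninitTag,
        // The tag encodes one of the declared variants: defined.
        Some(t) if variants.contains(&t) => DiscrResult::Value(t),
        // The tag does not encode any valid discriminant: UB.
        Some(t) => DiscrResult::UbInvalidTag(t),
    }
}
```

Note that the model never looks at the payload of the value, which is exactly what distinguishes it from requiring full validity.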

The codegen backend adds some extra UB for the case where the type is uninhabited:

    /// Obtain the actual discriminant of a value.
    pub fn codegen_get_discr<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
        self,
        bx: &mut Bx,
        cast_to: Ty<'tcx>,
    ) -> V {
        let cast_to = bx.cx().immediate_backend_type(bx.cx().layout_of(cast_to));
        if self.layout.abi.is_uninhabited() {
            return bx.cx().const_undef(cast_to);
        }

We also have a related MIR optimization in https://github.com/rust-lang/rust/blob/93542a8240c5f926ac5f3f99cef99366082f9c2b/compiler/rustc_mir_transform/src/uninhabited_enum_branching.rs. I am not quite sure what this does, though. It seems to be more about assuming that if a particular enum variant is uninhabited then we will never see the discriminant for that variant, and can hence remove it from the SwitchInt?
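To illustrate the kind of source code this concerns, here is a sketch with made-up types Never and E. The variant E::B carries an uninhabited payload, so no valid value of E can have B's discriminant. Under the reading above, the optimization would drop the B target from the SwitchInt that this match lowers to; whether it fires on this exact code is an assumption, not something verified here.

```rust
// Hypothetical types for illustration only.
#[allow(dead_code)]
enum Never {}

#[allow(dead_code)]
enum E {
    A(u8),
    B(Never), // uninhabited: no valid value of `E` is this variant
    C,
}

/// Matching through a reference reads the discriminant and switches on it.
fn classify(e: &E) -> &'static str {
    match e {
        E::A(_) => "A",
        // Unreachable for any valid `E`; this is the branch such an
        // optimization would remove from the switch.
        E::B(_) => "B",
        E::C => "C",
    }
}
```

Removing the B branch is only justified if seeing B's discriminant is already UB, which is the question this issue is asking.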

An 'obvious' choice is to say that the value passed to the Discriminant operator must fully satisfy its validity invariant -- that would certainly justify both the MIR optimization and what the codegen backend does. However, this also has problems: as summarized below, it is incompatible with code emitted for dropping, and with aliasing assumptions.

These observations make me doubtful that requiring full validity is the right thing. Making the fewest assumptions is appealing IMO, but not compatible with our codegen backend nor with the MIR optimizations -- the optimization seems to kick in even for operations of the form Discriminant(*ptr), so the validity invariant of ptr itself does not help either. It could be possible to strike some middle ground, but that feels like a rather ad-hoc adjustment to the current set of optimizations.
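A sketch of source code that would plausibly produce an operation of the form Discriminant(*ptr): matching on the place behind a raw pointer without first creating a reference. The function name is_some_via_ptr is made up, and the claim about the resulting MIR is an expectation rather than something checked against compiler output.

```rust
/// Inspects which variant `*ptr` holds without creating a reference to it.
///
/// # Safety
/// `ptr` must be non-null, aligned, and point to an initialized
/// `Option<u32>` that is valid for reads.
unsafe fn is_some_via_ptr(ptr: *const Option<u32>) -> bool {
    // `*ptr` is used as a place expression; the `_` patterns bind nothing,
    // so only the discriminant of the pointee needs to be read.
    unsafe { matches!(*ptr, Some(_)) }
}
```

Because ptr is a raw pointer, its own validity invariant says nothing about the pointee, so any assumption the optimizer makes here has to come from the Discriminant operation itself.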

To summarize:

  • Miri implements "minimal UB", requiring just enough to actually be able to compute the discriminant. This is incompatible with what MIR optimizations and the codegen backend do.
  • A principled alternative would be requiring full validity of the value, but that is incompatible with code emitted for dropping, and with aliasing assumptions.
  • Some middle ground is probably possible but seems entirely ad-hoc.

Cc @wesleywiser @tmandry @rust-lang/wg-mir-opt

Labels: A-MIR, A-codegen, A-intrinsics, A-mir-opt, C-discussion, T-compiler