Should values be well-typed? #561
Pretty hard "no" for this from me. "Erroneous behavior" should be defined in the operational semantics and checkable with sanitizers like Miri. "Does this type satisfy its library/safety invariant" is in many cases not decidable (e.g. for …)
This is type system EB, not opsem EB. This is similar to the situation we have with library UB (not checkable) and opsem UB (checkable). (I'm using "opsem" instead of "language" to be clear that I mean the calculus.) And regardless of this "not opsem" consideration, once we have contracts support, and some people start using them, and some of the contracts are checkable, then we would be able to check both library UB and type system EB (to the extent that they are checkable and that people use contracts).
I don't understand what "type system EB" should be or mean. You pointed at #512, which is strictly an opsem concept. If I make a guess, it sounds like it would require e.g. methods on …
The status quo is that violating the library invariant is library UB. So I don't know what you are trying to imply here about the status quo. Saying that this is EB means providing more guarantees to users than we currently do -- guarantees we do not want to provide. Or maybe you're using the terms EB and "ill-typed values" in completely different ways; I can't tell.
I might have generalized EB the same way UB was generalized:
That's not what I'm suggesting for 2 reasons:
Declaring ill-typed values EB means that if in your program a value is ill-typed (not necessarily when calling a library function), you are operating outside recommendations. The behavior is the same as today.
This is only true when calling a function (that is not robust). Saying that ill-typed values are EB covers the remaining cases where you have …
Thinking about it, a much simpler way to formulate it is to draw the parallel with language UB:
EDIT: I've updated the OP with the feedback received so far.
First off: violating API contracts is erroneous behavior. If the violation does not require using …

Aside: I believe the main disconnect between how you're conceptualizing things and how Ralf and I are conceptualizing the same concepts is that you're trying to extend concepts which we apply exclusively to soundness w.r.t. automated compiler optimizations (more often called memory safety) in order to also reason about arbitrary program correctness properties. You do this primarily for the purpose of formally reasoning that arbitrary program correctness feeding back into memory safety still maintains program soundness, but this overgeneralization makes it much more difficult to emphasize the most important parts of direct soundness. Soundness can rely on correctness. It isn't necessary to have one unified system which can formally reason about them as one thing.
TL;DR: your generalization of EB is incorrect as I understand the tool of EB to be defined such that it may be useful. What you are describing is instead almost exactly what #539 proposes to call "language UB" and "library UB" as they are handled by tooling for arbitrary user code.

Reframing library UB as erroneous behavior is definitely interesting: from the language and compiler view, library UB "doesn't exist" and still has defined behavior, so EB may actually be a better description of what is occurring; any user-defined invariants cannot add new cases of UB to the language, only new erroneous states. However, a strong part of the reason that EB is useful as a tool is that it is not UB. If we say that "using …"

The allowed outcomes of EB as discussed in #512 are either a diagnosis (as defined by the sanitizing implementation; it may be a program crash or halt, a …
It isn't "fine" to violate a library invariant. If you have a side-channel which provides you with a weaker library invariant to work with (usually called "relying on implementation details"), then it can seem "fine" to violate what the library invariant is for other people, but you aren't ever actually violating invariants, you're simply working with a different set of invariants. The difference is only one thing: the full set of library invariants is extensible, and thus tooling cannot diagnose library invariants unless you tell it how to do so. It may seem like side-channel guarantees allow you to violate the library invariants, but not any language invariants. But this isn't the case; we have explicitly reserved the right to make cases of library UB into defined behavior in the future (your correct program must not rely on absence of language UB to prove its own correctness), and there are multiple cases of Rust-level UB which we do not currently exploit nor tell LLVM about, where you could rely on the side-channel of knowing implementation details to write a program that works, as long as it is only compiled under a toolchain that uses the same implementation details.
Generally? No. If some sort of future contracts functionality is applied to the type? In that case, likely yes. But this also allows the user to in turn distinguish between the different levels that the invariants may exist at.
I updated OP to account for your feedback. The main changes are:
This issue is not about violating API contracts, but about producing a value that violates the safety invariant. This doesn't only happen at API boundaries. I consider this part of the type system's contract with the programmer because it's about types (the safety invariant). This is why I called it "type system EB". What you are referring to is "library UB". And I guess you are saying that library UB is (or could be) language EB. But that would be a different issue.
It seems to me you're confusing soundness and language UB. There is no such thing as "direct soundness". It's simply language UB. Soundness is the type-system property that well-typed programs don't have language UB. Rust also uses the word soundness for programs, which could either mean that the program doesn't have language UB, or a proof of that fact. The distinction you want to make is in the context of an operation at a specific program point and between the following:
The fact that one part of a program relies on guarantees of another part to ensure that the whole program does not have language UB is not correctness. Not all guarantees are about correctness. In particular, those guarantees (used to prove the absence of language UB) are usually weaker than those used to prove that the program is correct.
No, the generalization is perfectly parallel with UB:
My bad, that wording was meant in opposition to the (immediate) UB of the first question. I changed it to "maybe later UB" to make that clear.
Yes, I believe …
I suspect a possible source of confusion is that the unsafe code guidelines repository has been almost exclusively used for opsem rules, so much so that it has become the default assumption that issues are about opsem rules. This issue is not about opsem rules; it's about unsafe code guidelines, namely how to write unsafe code. It's not about which unsafe code has language UB (which is a narrower sense of how to write unsafe code). It was my mistake to mention erroneous behavior, since its current narrow definition is about opsem exclusively. This mention is now removed, so hopefully this source of confusion has been removed. But I feel that clarifying the reasoning may help too.
The UCG currently requires that produced values satisfy their validity invariant. This issue asks whether the UCG should also recommend that produced values satisfy their safety invariant. Here are examples of values that would follow the requirement but not the recommendation:
- `&str` that are not UTF-8 (somehow this seems a common example).
- `&mut T` that are pinned (return value of `Pin::get_unchecked_mut()`).
- `&mut [u8]` that point to uninitialized bytes (see #346: Document current justification for not requiring recursive reference validity, in particular `&mut uninit` not being immediate UB).
- `&'static T` that are not `'static` (`Box::leak()` eventually followed by `Box::from_raw()`).

Those values can be considered "unsafe" because they cannot be passed to arbitrary safe code (aka safe code outside the scope of unsafe) at their current type without possibly breaking soundness.
The decision is not obvious, because there are many violations of this recommendation at this time (and more getting added). Besides this legacy issue, there is a non-trivial trade-off between making the language simpler to explain on one side, and making the language harder to use on the other (most probably why there are so many violations at this time).
Difference with library UB
This issue is not about the UCG recommending programs to not have library UB. This would be a different issue.
Library UB occurs when the safety contract of a library API is violated. This issue is about producing a value that does not satisfy its safety invariant. It is independent of the notion of library APIs and their safety contracts. It is a notion of the type system, and in particular of types (possibly library types) and their safety invariants.
Language simpler to explain
The rules for unsafe Rust are simpler for users who follow recommendations, because they don't need to know about the validity invariant (since you can't produce a value violating the validity invariant without also violating the safety invariant). This is valuable for a few reasons:
The language is also simpler to make sense of. There is no need to account for ill-typed values (no need for an update type when a refinement type would suffice, see also this thread).
This would also be a way to provide partial guarantees about validity invariants in the form of new and weaker types whose safety invariant matches the partial guarantees we want to provide for that validity invariant. Note that multiple types (of the type system) can have the same underlying calculus type (the opsem concept that gives the notion of validity invariant). This is for example the case with pointers. This gives programmers access to a hierarchy of types with different safety invariants for the same underlying calculus type, to choose the most appropriate type for the needed operation (see next section).
Language harder to use
The type of an ill-typed value needs to be changed (weakened in the hierarchy of types) until the value becomes well-typed at that new type. In the initial examples, that would be:

- `&[u8]` instead of `&str`.
- `Pin<&mut T>` or `NonNull<T>` instead of `&mut T`.
- `&mut [MaybeUninit<u8>]` instead of `&mut [u8]`.
- `NonNull<T>` or `&'a T` (with a correct and possibly too strict `'a`) instead of `&'static T`
.

Besides the inconvenience of propagating those changes (when starting from an ill-typed program), one needs to find a suitable weakened type (even when starting a program from scratch). To maximize the chances of finding a suitable type, the hierarchy of types needs to be as fine-grained as possible, and go up (or down, depending how you look at it) to the most weakened types, namely the calculus types. The situation has improved in that regard, for example with the introduction of `MaybeUninit`, and will probably continue to, for example with `UnsafeAlias`. One could hope that it keeps improving, so that a suitable type can always be found without having to weaken too much, say to `*mut T` where `NonNull<T>` would suffice but does not exist.

Values are usually only temporarily ill-typed, since they should eventually reach safe code. When values cross a safe API, their type may need to be adjusted such that they can be used at their maximum capability without unsafe. For example, if unsafe code took a `&mut [MaybeUninit<u8>]` as argument, it could return a `&mut [u8]` to the initialized part as a convenience to its user, maximizing what a safe user could do compared to an unsafe user.

Changing the type of a value crossing a safe API is only possible in some situations. If the type is under a type constructor, one needs layout guarantees like those of slices. If unsafe code receives a `Foo<Dirty>`, cleans it, and returns a `Foo<Clean>` (where `Clean` is a semantic subtype of `Dirty`), it can't simply transmute the value if `Foo`
doesn't provide any layout guarantees. It needs to deconstruct and reconstruct the value.

Those difficulties seem quite significant for multiple reasons:
The main question (at least for me) is whether we believe those difficulties will be lifted over time and it is already time to start practicing type hygiene, or whether we have to wait and keep wearing our cowboy hats.
Terminology
The term EB (#512) could be generalized the same way the term UB has been. We would get:
The UCG rules mentioned at the beginning of the issue would read: