Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__CPROVER_forall fails when dealing with flattening of nested structure #8570

Closed
hanno-becker opened this issue Jan 16, 2025 · 28 comments · Fixed by #8578
Closed

__CPROVER_forall fails when dealing with flattening of nested structure #8570

hanno-becker opened this issue Jan 16, 2025 · 28 comments · Fixed by #8578

Comments

@hanno-becker
Copy link

hanno-becker commented Jan 16, 2025

I'm observing unexpected behaviour when trying to use __CPROVER_forall to access the fields of a nested array structure when cast as a flat array of cells. cc @tautschnig @remi-delmas-3000

Minimal example:

/* instructions

foo:
	goto-cc harness.c --function foo_harness -o a.out
	goto-instrument --dfcc foo_harness --enforce-contract foo a.out b.out
	cbmc b.out --bitwuzla
*/

#include <stdint.h>

typedef struct __attribute__((packed)) {
    int data[2];
} arr;

typedef struct __attribute__((packed)) {
    arr vec[2];
} arrvec;

void foo(arrvec *x)
  __CPROVER_requires(__CPROVER_is_fresh(x, sizeof(arrvec)))
  __CPROVER_requires(x->vec[1].data[0] < 42)
{
    // OK:
    __CPROVER_assert(((int*)x)[2] < 42, "");
    // NOT OK:
    __CPROVER_assert(__CPROVER_forall {unsigned k; k == 2 ==> ((int*)x)[k] < 42}, "");
    // OK:
    __CPROVER_assert(__CPROVER_forall {unsigned k; k == 2 ==> ((int (*)[2])x)[k/2][k % 2] < 42}, "");
}

void foo_harness(void) {
    arr *x;
    foo(x);
}
@tautschnig
Copy link
Collaborator

Taking a look. Here is a simplified version that equally fails and neither uses contracts nor requires the use of an SMT solver - cbmc harness.c will do:

typedef struct __attribute__((packed)) {
    int data[2];
} arr;

typedef struct __attribute__((packed)) {
    arr vec[2];
} arrvec;

int main() {
    arrvec A;
    arrvec *x = &A;
    __CPROVER_assume(x->vec[1].data[0] < 42);
    // OK:
    __CPROVER_assert(((int*)x)[2] < 42, "");
    // NOT OK:
    __CPROVER_assert(__CPROVER_forall {unsigned k; k == 2 ==> ((int*)x)[k] < 42}, "");
    // OK:
    __CPROVER_assert(__CPROVER_forall {unsigned k; k == 2 ==> ((int (*)[2])x)[k/2][k % 2] < 42}, "");
}

@tautschnig
Copy link
Collaborator

Turns out this isn't a problem related to quantifiers. The following further simplification (still equivalent to the original one) also fails:

typedef struct __attribute__((packed)) {
    int data[2];
} arr;

typedef struct __attribute__((packed)) {
    arr vec[2];
} arrvec;

int main() {
    arrvec A;
    arrvec *x = &A;
    __CPROVER_assume(x->vec[1].data[0] < 42);
    unsigned k;
    __CPROVER_assert(k != 2 || ((int*)x)[k] < 42, "");
}

It seems the problem is with field sensitivity during symbolic execution, where we appear to (spuriously) turn ((int*)x)k effectively into A.vec[0].data[k].

hanno-becker added a commit to pq-code-package/mlkem-native that referenced this issue Jan 16, 2025
Previously, runtime assertions via debug_assert_xxx and CBMC
assertions via cassert(...) were separate.

This commit modifies the implementation of the debug assertion
macros so that when CBMC is used, debug assertions are intepreted
as proof obligations.

This removes some redundancy and non-uniformity in the code,
and also reduces the likelihood that debug assertions and CBMC
contracts get out of sync. In some case, this actually happened,
and the commit fixes this. The commit also adds further bounds
assertions in alignment with pre/post conditions.

A slight nuisance is that the debug assertions cannot flatten
nested structures like polyvec for the bounds check, running
into issue diffblue/cbmc#8570. We
work around this by introducing a new `xxx_2d` (for 2-dimensional)
macro which takes two dimensions and uses a two-step array
access, circumventing the above CBMC issue.

Signed-off-by: Hanno Becker <[email protected]>
hanno-becker added a commit to pq-code-package/mlkem-native that referenced this issue Jan 16, 2025
Previously, runtime assertions via debug_assert_xxx and CBMC
assertions via cassert(...) were separate.

This commit modifies the implementation of the debug assertion
macros so that when CBMC is used, debug assertions are intepreted
as proof obligations.

This removes some redundancy and non-uniformity in the code,
and also reduces the likelihood that debug assertions and CBMC
contracts get out of sync. In some case, this actually happened,
and the commit fixes this. The commit also adds further bounds
assertions in alignment with pre/post conditions.

A slight nuisance is that the debug assertions cannot flatten
nested structures like polyvec for the bounds check, running
into issue diffblue/cbmc#8570. We
work around this by introducing a new `xxx_2d` (for 2-dimensional)
macro which takes two dimensions and uses a two-step array
access, circumventing the above CBMC issue.

Signed-off-by: Hanno Becker <[email protected]>
hanno-becker added a commit to pq-code-package/mlkem-native that referenced this issue Jan 17, 2025
Previously, runtime assertions via debug_assert_xxx and CBMC
assertions via cassert(...) were separate.

This commit modifies the implementation of the debug assertion
macros so that when CBMC is used, debug assertions are intepreted
as proof obligations.

This removes some redundancy and non-uniformity in the code,
and also reduces the likelihood that debug assertions and CBMC
contracts get out of sync. In some case, this actually happened,
and the commit fixes this. The commit also adds further bounds
assertions in alignment with pre/post conditions.

A slight nuisance is that the debug assertions cannot flatten
nested structures like polyvec for the bounds check, running
into issue diffblue/cbmc#8570. We
work around this by introducing a new `xxx_2d` (for 2-dimensional)
macro which takes two dimensions and uses a two-step array
access, circumventing the above CBMC issue.

Signed-off-by: Hanno Becker <[email protected]>
hanno-becker added a commit to pq-code-package/mlkem-native that referenced this issue Jan 17, 2025
Previously, runtime assertions via debug_assert_xxx and CBMC
assertions via cassert(...) were separate.

This commit modifies the implementation of the debug assertion
macros so that when CBMC is used, debug assertions are intepreted
as proof obligations.

This removes some redundancy and non-uniformity in the code,
and also reduces the likelihood that debug assertions and CBMC
contracts get out of sync. In some case, this actually happened,
and the commit fixes this. The commit also adds further bounds
assertions in alignment with pre/post conditions.

A slight nuisance is that the debug assertions cannot flatten
nested structures like polyvec for the bounds check, running
into issue diffblue/cbmc#8570. We
work around this by introducing a new `xxx_2d` (for 2-dimensional)
macro which takes two dimensions and uses a two-step array
access, circumventing the above CBMC issue.

Signed-off-by: Hanno Becker <[email protected]>
hanno-becker added a commit to pq-code-package/mlkem-native that referenced this issue Jan 17, 2025
Previously, runtime assertions via debug_assert_xxx and CBMC
assertions via cassert(...) were separate.

This commit modifies the implementation of the debug assertion
macros so that when CBMC is used, debug assertions are intepreted
as proof obligations.

This removes some redundancy and non-uniformity in the code,
and also reduces the likelihood that debug assertions and CBMC
contracts get out of sync. In some case, this actually happened,
and the commit fixes this. The commit also adds further bounds
assertions in alignment with pre/post conditions.

A slight nuisance is that the debug assertions cannot flatten
nested structures like polyvec for the bounds check, running
into issue diffblue/cbmc#8570. We
work around this by introducing a new `xxx_2d` (for 2-dimensional)
macro which takes two dimensions and uses a two-step array
access, circumventing the above CBMC issue.

Signed-off-by: Hanno Becker <[email protected]>
hanno-becker added a commit to pq-code-package/mlkem-native that referenced this issue Jan 17, 2025
Previously, runtime assertions via debug_assert_xxx and CBMC
assertions via cassert(...) were separate.

This commit modifies the implementation of the debug assertion
macros so that when CBMC is used, debug assertions are intepreted
as proof obligations.

This removes some redundancy and non-uniformity in the code,
and also reduces the likelihood that debug assertions and CBMC
contracts get out of sync. In some case, this actually happened,
and the commit fixes this. The commit also adds further bounds
assertions in alignment with pre/post conditions.

A slight nuisance is that the debug assertions cannot flatten
nested structures like polyvec for the bounds check, running
into issue diffblue/cbmc#8570. We
work around this by introducing a new `xxx_2d` (for 2-dimensional)
macro which takes two dimensions and uses a two-step array
access, circumventing the above CBMC issue.

Signed-off-by: Hanno Becker <[email protected]>
@hanno-becker
Copy link
Author

@tautschnig @remi-delmas-3000 Any fix in sight?

@rod-chapman
Copy link
Collaborator

What behaviour were you expecting? In Michael's final example, k is uninitialized, so any reference to k looks like UB to me.

Flattened out, x is pointing at 4 integers, so k could be anything in 0 .. 3. Let's try that:

int main()
{
  arrvec A;
  arrvec *x = &A;
  __CPROVER_assume(x->vec[1].data[0] < 42);
  unsigned k;
  k = 0;
  __CPROVER_assert(k != 2 || ((int *)x)[k] < 42, "");
  k = 1;
  __CPROVER_assert(k != 2 || ((int *)x)[k] < 42, "");
  k = 2;
  __CPROVER_assert(k != 2 || ((int *)x)[k] < 42, "");
  k = 3;
  __CPROVER_assert(k != 2 || ((int *)x)[k] < 42, "");
}

then with CBMC 6.4.1

% cbmc main.c       
CBMC version 6.4.1 (cbmc-6.4.1) 64-bit arm64 macos
Type-checking main
Generating GOTO Program
Adding CPROVER library (arm64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Starting Bounded Model Checking
Passing problem to propositional reduction
converting SSA
Running propositional reduction
SAT checker inconsistent: instance is UNSATISFIABLE

** Results:
main.c function main
[main.array_bounds.1] line 15 array.vec dynamic object upper bound in x->vec[(signed long int)1]: SUCCESS
[main.array_bounds.2] line 15 array.vec[].data dynamic object upper bound in x->vec[(signed long int)1].data[(signed long int)0]: SUCCESS
[main.pointer_dereference.1] line 15 dereference failure: dead object in x->vec: SUCCESS
[main.pointer_dereference.2] line 15 dereference failure: pointer outside object bounds in x->vec: SUCCESS
[main.assertion.1] line 18 assertion: SUCCESS
[main.pointer_dereference.3] line 18 dereference failure: dead object in ((signed int *)x)[(signed long int)k]: SUCCESS
[main.pointer_dereference.4] line 18 dereference failure: pointer outside object bounds in ((signed int *)x)[(signed long int)k]: SUCCESS
[main.assertion.2] line 20 assertion: SUCCESS
[main.pointer_dereference.5] line 20 dereference failure: dead object in ((signed int *)x)[(signed long int)k]: SUCCESS
[main.pointer_dereference.6] line 20 dereference failure: pointer outside object bounds in ((signed int *)x)[(signed long int)k]: SUCCESS
[main.assertion.3] line 22 assertion: SUCCESS
[main.pointer_dereference.7] line 22 dereference failure: dead object in ((signed int *)x)[(signed long int)k]: SUCCESS
[main.pointer_dereference.8] line 22 dereference failure: pointer outside object bounds in ((signed int *)x)[(signed long int)k]: SUCCESS
[main.assertion.4] line 24 assertion: SUCCESS
[main.pointer_dereference.9] line 24 dereference failure: dead object in ((signed int *)x)[(signed long int)k]: SUCCESS
[main.pointer_dereference.10] line 24 dereference failure: pointer outside object bounds in ((signed int *)x)[(signed long int)k]: SUCCESS

** 0 of 16 failed (1 iterations)
VERIFICATION SUCCESSFUL

which doesn't look right to me, since we've only assumed that one element is < 42. What did I miss?

@hanno-becker
Copy link
Author

@rod-chapman I don't follow, in 3 of 4 asserts, the k != 2 clause holds.

@tautschnig
Copy link
Collaborator

When evaluating ((int *)x)[k] < 42 the only possible value of k is 2. But I am not entirely convinced that ((int *)x)[<whatever>] is behaviour covered by the C standard. It would be ok for char * instead of int *.

That said: I know which piece in the CBMC code base is causing us to go wrong (?) here, but I am wondering whether we are mis-optimising there or the input code is the culprit.

@rod-chapman
Copy link
Collaborator

Oh... so we're inferring that "(k is uninitialized && not (k != 2)) infers k == 2".
Not sure that's wise.

@rod-chapman
Copy link
Collaborator

If your code evaluated "k != 2 || ((int *)x)[k] < 42" at runtime with k uninitialized, then it's UB, because the left hand side of the (short circuit) || operator has to be evaluated first, right?

@hanno-becker
Copy link
Author

hanno-becker commented Jan 30, 2025

But I am not entirely convinced that ((int *)x)[] is behaviour covered by the C standard. It would be ok for char * instead of int *.

@tautschnig I would assume the original example still fails with uint8_t -- could you check?

@hanno-becker
Copy link
Author

If your code evaluated "k != 2 || ((int *)x)[k] < 42" at runtime with k uninitialized, then it's UB, because the left hand side of the (short circuit) || operator has to be evaluated first, right?

I don't understand what you mean -- k is initialized.

@rod-chapman
Copy link
Collaborator

In Michael's example:

    unsigned k;
    __CPROVER_assert(k != 2 || ((int*)x)[k] < 42, "");

k looks uninitialized to me...

@hanno-becker
Copy link
Author

Ah, I was looking at your example. I think this isn't really related to the issue at hand, though -- you could add k = some_uninterpreted_function() and the question is unchanged.

@rod-chapman
Copy link
Collaborator

Disagree I think.

k = some_uninterpreted_function()

means that k is initialized to a legal value of type "unsigned", so evaluating its R-value is well-defined.

"K uninitialized" is UB. Remember there are computers out there with scalar types where there are arbitrary bit patterns that are not a valid value of whatever type you think it should be. Boolean is the worst offender, because the number of bits stored (typically 8) exceeds the number of bits of information in the type (1).

@tautschnig
Copy link
Collaborator

But I am not entirely convinced that ((int *)x)[] is behaviour covered by the C standard. It would be ok for char * instead of int *.

@tautschnig I would assume the original example still fails with uint8_t -- could you check?

The following variant passes just fine, but note that 11 is a value that depends on endianness:

typedef struct __attribute__((packed)) {
    int data[2];
} arr;

typedef struct __attribute__((packed)) {
    arr vec[2];
} arrvec;

int main() {
    arrvec A;
    arrvec *x = &A;
    __CPROVER_assume(x->vec[1].data[0] < 42);
    unsigned k;
    __CPROVER_assert(k != 11 || ((char*)x)[k] < 42, "");
}

@hanno-becker
Copy link
Author

@rod-chapman I'm not disagreeing that k is uninitialized in Michael's example. I mean that it's orthogonal to the issue at hand.

@hanno-becker
Copy link
Author

@tautschnig Thank you, this is interesting. We should look into the C standard, then, to figure out if the original example is valid to begin with.

@rod-chapman
Copy link
Collaborator

If a local variable like k is uninitialized, then you should reject the program before generating GOTO at all and just stop.

@hanno-becker
Copy link
Author

Yes but this has nothing to do with the question at hand.

@kroening
Copy link
Member

Continuing the side conversation on uninitialised local variables: a check that enforces initialisation is coming, with #8545

@kroening
Copy link
Member

To get back to the original question, the C standard does not guarantee the behaviour in the original example; this clearly relies on implementation-defined behaviour, and hence, may break depending on architecture and compiler.

@hanno-becker
Copy link
Author

hanno-becker commented Jan 30, 2025

@kroening Thanks for chiming in! For my education, what exactly is platform/compiler-specific in the original example? Is the layout of the structure not fully specified by the C standard (except for the layout of int itself, but that should not matter since we only access at int-granularity), at least with the given __packed__ attributes?

@rod-chapman
Copy link
Collaborator

I'm not sure. Is "packed" advisory or mandatory in C? (It's an "Advisory" pragma in Ada, so cannot actually be relied upon to do anything, so you should never use it!).

I don't see any strictly "implementation defined" behaviour in Michael's example. That has a very specific meaning in the RM, which is not the same as "unspecified" or "undefined" (which are far worse.)

@kroening
Copy link
Member

@kroening Thanks for chiming in! For my education, what exactly is platform/compiler-specific in the original example? Is the layout of the structure not fully specified by the C standard (except for the layout of int itself, but that should not matter since we only access at int-granularity), at least with the given __packed__ attributes?

The C standard does not say anything at all about packing structs (no, there is no "advisory" either). So yes, a compiler may well insert arbitrary padding between those fields.

The C standard does guarantee that there is no padding within multi-dimensional arrays, however. This could be a standard-endorsed way to achieve what you are trying to do.

@tautschnig
Copy link
Collaborator

The C standard doesn't provide for __packed__ (or any other way to enforce packing). So we're anyway outside C standard territory I guess.

Patch forthcoming, just trying to make sure it doesn't break any existing tests.

@hanno-becker
Copy link
Author

hanno-becker commented Jan 30, 2025

This specific issue cannot be about padding, though, because if padding was the issue, CBMC should reject it regardless of whether the access is through char or int; but in @tautschnig's example above it makes a difference.

typedef struct __attribute__((packed)) {
    int data[2];
} arr;

typedef struct __attribute__((packed)) {
    arr vec[2];
} arrvec;

int main() {
    arrvec A;
    arrvec *x = &A;
    __CPROVER_assume(x->vec[1].data[0] < 42);
    unsigned k;
   // OK
    __CPROVER_assert(k != 11 || ((char*)x)[k] < 42, "");
  // NOT OK
    __CPROVER_assert(k != 2 || ((int*)x)[k] < 42, "");
}

(@tautschnig you want 8 instead of 11 here on little-endian, right?)

So it is still not clear to me what it is that causes CBMC to reject the int-version, but not the char-one.

tautschnig added a commit to tautschnig/cbmc that referenced this issue Jan 30, 2025
Value-set based dereferencing must not take an access path through an
object that precludes a subsequent index expression from accessing a
different part of the object. Such a situation can arise when the value
set has a known (constant) offset for the pointer that would identify
one particular element in an array (within that object). The code using
value-set based dereferencing, however, may be trying to resolve a
subexpression of an index expression, where said index expression would
lead to a different element that may itself be part of a different array
within the same overall object.

Fixes: diffblue#8570
@tautschnig
Copy link
Collaborator

Proposed fix to be found in #8578.

@kroening
Copy link
Member

This specific issue cannot be about padding, though, because if padding was the issue, CBMC should reject it regardless of whether the access is through char or int; but in @tautschnig's example above it makes a difference.

CBMC won't catch cases where padding would have made a difference.

@hanno-becker
Copy link
Author

Ok, then CBMC should not reject the original example. Thanks @tautschnig for fixing this already!

tautschnig added a commit to tautschnig/cbmc that referenced this issue Jan 31, 2025
Value-set based dereferencing must not take an access path through an
object that precludes a subsequent index expression from accessing a
different part of the object. Such a situation can arise when the value
set has a known (constant) offset for the pointer that would identify
one particular element in an array (within that object). The code using
value-set based dereferencing, however, may be trying to resolve a
subexpression of an index expression, where said index expression would
lead to a different element that may itself be part of a different array
within the same overall object.

Fixes: diffblue#8570
tautschnig added a commit to tautschnig/cbmc that referenced this issue Jan 31, 2025
Value-set based dereferencing must not take an access path through an
object that precludes a subsequent index expression from accessing a
different part of the object. Such a situation can arise when the value
set has a known (constant) offset for the pointer that would identify
one particular element in an array (within that object). The code using
value-set based dereferencing, however, may be trying to resolve a
subexpression of an index expression, where said index expression would
lead to a different element that may itself be part of a different array
within the same overall object.

Fixes: diffblue#8570
tautschnig added a commit to tautschnig/cbmc that referenced this issue Feb 9, 2025
Value-set based dereferencing must not take an access path through an
object that precludes a subsequent index expression from accessing a
different part of the object. Such a situation can arise when the value
set has a known (constant) offset for the pointer that would identify
one particular element in an array (within that object). The code using
value-set based dereferencing, however, may be trying to resolve a
subexpression of an index expression, where said index expression would
lead to a different element that may itself be part of a different array
within the same overall object.

Fixes: diffblue#8570
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants