Skip to content

Conversation

@bettinaheim
Copy link
Collaborator

@bettinaheim bettinaheim commented Dec 11, 2025

  • I haven't properly merged the mlir test changes in Python bridge revision #3537 with the changes on the feature branch
  • I have removed test_assignment (added in Python bridge revision #3537) from this PR (currently segfaults, not yet sure why)
  • I undid the flattening of the symbol table in the feature branch for now (to be done as a separate change)
  • There are a couple of FIXMEs related to code clean up in the ast_bridge.py and utils.py. I'll go and clean them up (before or after pulling this onto the feature branch it shouldn't matter)
  • One of the new tests in test_kernel_features.py is still commented out; it gives a comprehensive error on main but should work with the changes on the feature branch, but currently segfaults

Tests I ran locally:
I ran all tests in python/tests/, excluding backends, domain and remote;

  • I have failures for tests that return lists from kernels; these tests run on the main branch, but there is still a (comprehensive) error given that this is not supported on the feature branch. Indeed, if I comment that check out, I see that something is segfaulting. Are we maybe still missing some changes from main that yet need to be ported?
  • The only other failure is for test_internal_library_kernels (unknown function call); I haven't check yet if this one is passing on the feature branch or not
  • All other tests (that I ran) pass

@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

"""
parts = y.split('.')
for i in range(len(parts), 0, -1):
for i in range(len(parts)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last part in the path should not be a python module. (It is because of a misspelling in our tests.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fixing an empty modName issue in https://github.com/NVIDIA/cuda-quantum/pull/3682/files

def simple_list_bool(n: int, t: list[bool]) -> list[bool]:
qubits = cudaq.qvector(n)
return t
return t.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of the copy() method changes here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes it explicit what the behavior is under the hood. We ("have to") make a couple of copies where python does not and passes by reference instead. To ensure that we don't have unexpected behaviors for users (code not behaving as python should), we force the copy that we have to make to be explicitly visible in the source code. More details in the description of #3537

Copy link
Collaborator

@schweitzpgi schweitzpgi Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a user facing CUDA-Q design change that should have been gone through the architectural review process, I believe. @boschmitt @amccaskey

We should be converting local python bindings to independent variables. Making copies will just exaggerate the use of stack space and complicate compiler analyses needlessly.

In Python a local symbol is maintained in a dictionary as a reference to a value. If the value is a reference object, then this amounts to a double indirection. One cannot bind the same symbol to more than one value in the local scope. One can bind more than one symbol to the same reference object.

   a = [1, 2]
   b = a
   a[0] = 4
   assert b[0] == 4  # True!

This functionality can be mimicked as classic variables without resorting to copies. Still, I'm not clear where the original Python front end proposal landed on this issue.

How to not use copies:

For value types, symbol binding and variables have a tight correspondence.

  a = 4
  b = a

can be lowered as

  %0 = alloca integer
  %1 = alloca integer
  store 4, %0
  // symbol 'a' is %0
  %2 = load %0
  store %2, %1
  // symbol 'b' is %1

and the observable semantics are the same.

For reference types

  a = [0, 1]
  b = a

this can be lowered as follows

  %0 - alloca list<2 x integer>
  // symbol a -> %0
  store 0, %0[0]
  store 1, %0[1]
  // symbol b -> %0

I believe this was the way CUDA-Q was operating.

This is a problem when it comes to runtime control flow issues.

  a = [0, 1]
  if b:
    c = a
  else:
    c = [3, 4]
  # c is not statically known to be [0,1] or [3, 4] here

We can still solve this without making copies, and having natural looking python. This can be achieved by noting what was stated earlier: for a reference binding, the interpreter binds the symbol to a reference meaning we have a double indirection.

  %0 - alloca list<2 x integer>
  store 0, %0[0]
  store 1, %0[1]
  // %0 is the list object [0, 1]
  %1 = alloca ptr<list<...>>
  store %0, %1
  // symbol 'a' is %1
  %2 = alloca ptr<list<...>>
  // symbol 'b' is %2 (and undefined)
  %3 - alloca list<2 x integer>
  store 3, %3[0]
  store 4, %3[1]
  // %3 is the list object [3, 4]
  structured.if (%88) {
    store %0, %2
  } else {
    store %3, %2
  }
  // %2 is correctly either a reference to [0, 1] or [3, 4] here

The point being that all this can be done correctly by

  • knowing the difference between value and reference types
  • using a reference indirection correctly
  • preserving the correspondence between symbol bindings in python to variables in the IR
  • not changing or exposing properties of the implementation to the user-facing interface

Copy link
Collaborator

@schweitzpgi schweitzpgi Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll explicitly call out here the deal with "scope" wrt Python as well.

We are not concerned with any scope inside the kernel other than symbols at local scope. Symbols referenced at any other scope are lambda lifted and passed by value (deep copy).

By the nature of Python's symbol binding process, all values are "free" on the heap (and reference counted, etc.) There is no lexical scope and no defined lifetime. At some point, the interpreter will just garbage collect unreferenced objects on the heap.

In our device execution domain, we don't have a heap and certainly do not have garbage collection. We instead have variables and place values into those variables (stack slots).

By virtue of this scopeless, vague lifetime execution model in our host language, Python, we can simply promote local symbols to variables that dominate everything in the kernel (place them in the entry block) and we can also promote objects of reference type in the same way (construct them as early as possible, which isn't the same as always the entry block, but it may often be the case).

If we follow the same rules as the C++ limitations, the storage for the reference objects must be known at compile-time, so it is possible to allocate that storage in the entry block, even if the initialization of its element values is done along various control-flow paths.

def test_return_dataclass(n: int, t: MyClass) -> MyClass:
qubits = cudaq.qvector(n)
return t
return t.copy(deep=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a corresponding change in the bridge somewhere that necessitates these copies and deep copies?

By virtue of the fact this is a kernel running on a device, everything on the kernel side is logically a deep copy already. All device code is run in a different memory address space, logically speaking. So it's sort of weird to have device code making extra copies of its copies to return via copies to the Python interpreter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See response above.

@schweitzpgi
Copy link
Collaborator

schweitzpgi commented Dec 12, 2025

I don't understand the addition of .copy() method calls all over the tests. Is it just annotation? Or is it some sort of fix?

To get to the nuts and bolts of how this must work in a Python kernel:

  • Every local variable in a kernel must be strongly typed. It cannot have one type at one point and another type at another point in the kernel. If any such code is seen, it is a semantics error and a diagnostic must be raised.
@cudaq.kernel
def bug():
  i = 4
  i = ['now', 'a', 'list']   # ERROR!
  • There are no nested scopes. All local variables are global to the entire kernel.
  • All local variables must therefore be promoted to the entry block.
  • Each symbol name is a unique variable. In Python, this is achieved by garbage collecting unreferenced chafe.
  • In our kernel world, we accomplish the same thing by allowing a symbol (now a variable) to bind to exactly 1 strongly typed value.
  • All values are "garbage collected" when the kernel returns to the caller.
  • We do not simulate or use reference counting as it is not useful for our limited inventory of types and the restrictions of the device.
  • Assignments of value types should work exactly like Python.
  i = 4
  j = i

means the symbol i and the symbol j both are bound to the value 4 by copy (value semantics).

In the more nebulous domain:

  • Assignments of reference types should not work like Python to be consistent with C++. That is,
  i = [4, 5, 6]
  i = [8, 9]    # ERROR!

This case is binding the symbol i to both a "vector" of length 3 and a "vector" of length 2, which we don't support in C++ and treat as distinct types.

It's worth having a conversation on this case... Do we have any code examples that do this?

@schweitzpgi
Copy link
Collaborator

I'm going to cut the line and merge this so we can keep making progress. Just noting that there may be some things to revisit here as well.

@schweitzpgi schweitzpgi merged commit 4598524 into NVIDIA:features/python.redesign.0 Dec 12, 2025
7 of 8 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 12, 2025
@bettinaheim
Copy link
Collaborator Author

bettinaheim commented Dec 12, 2025

I don't understand the addition of .copy() method calls all over the tests. Is it just annotation? Or is it some sort of fix?

It's kind of an annotation for the users sake indicating what is going on under the hood.

To get to the nuts and bolts of how this must work in a Python kernel:

  • Every local variable in a kernel must be strongly typed. It cannot have one type at one point and another type at another point in the kernel. If any such code is seen, it is a semantics error and a diagnostic must be raised.
@cudaq.kernel
def bug():
  i = 4
  i = ['now', 'a', 'list']   # ERROR!
  • There are no nested scopes. All local variables are global to the entire kernel.
  • All local variables must therefore be promoted to the entry block.

I need to come back to this with the change to flatten the symbol table. (Nothing "bad" happens as it is - the IR we generate is always valid and matches Python behavior - but we are not respecting Python scoping as written). We can be a bit more permissive than what you outline also when respecting Python scoping - we merely have to be strict when a variable is redefined in an inner scope (wrt MLIR scoping). I'll give it some more thought.

  • Each symbol name is a unique variable. In Python, this is achieved by garbage collecting unreferenced chafe.
  • In our kernel world, we accomplish the same thing by allowing a symbol (now a variable) to bind to exactly 1 strongly typed value.
  • All values are "garbage collected" when the kernel returns to the caller.
  • We do not simulate or use reference counting as it is not useful for our limited inventory of types and the restrictions of the device.
  • Assignments of value types should work exactly like Python.
  i = 4
  j = i

means the symbol i and the symbol j both are bound to the value 4 by copy (value semantics).

Yes.

In the more nebulous domain:

  • Assignments of reference types should not work like Python to be consistent with C++. That is,

The goal that I am aiming for is that either it behaves like Python, or we give an explicit and comprehensive error that this is not allowed. I do want to fully get rid of cases that compile and run but don't behave as python would.

  i = [4, 5, 6]
  i = [8, 9]    # ERROR!

This case is binding the symbol i to both a "vector" of length 3 and a "vector" of length 2, which we don't support in C++ and treat as distinct types.

It's worth having a conversation on this case... Do we have any code examples that do this?

The new assignment tests that are still missing contain a good number of tests along similar lines, though I may not have added one specifically related to lengths of lists

@schweitzpgi
Copy link
Collaborator

I don't understand the addition of .copy() method calls all over the tests. Is it just annotation? Or is it some sort of fix?

It's kind of an annotation for the users sake indicating what is going on under the hood.

To get to the nuts and bolts of how this must work in a Python kernel:

  • Every local variable in a kernel must be strongly typed. It cannot have one type at one point and another type at another point in the kernel. If any such code is seen, it is a semantics error and a diagnostic must be raised.
@cudaq.kernel
def bug():
  i = 4
  i = ['now', 'a', 'list']   # ERROR!
  • There are no nested scopes. All local variables are global to the entire kernel.
  • All local variables must therefore be promoted to the entry block.

I need to come back to this with the change to flatten the symbol table. (Nothing "bad" happens as it is - the IR we generate is always valid and matches Python behavior - but we are not respecting Python scoping as written). We can be a bit more permissive than what you outline also when respecting Python scoping - we merely have to be strict when a variable is redefined in an inner scope (wrt MLIR scoping). I'll give it some more thought.

  • Each symbol name is a unique variable. In Python, this is achieved by garbage collecting unreferenced chafe.
  • In our kernel world, we accomplish the same thing by allowing a symbol (now a variable) to bind to exactly 1 strongly typed value.
  • All values are "garbage collected" when the kernel returns to the caller.
  • We do not simulate or use reference counting as it is not useful for our limited inventory of types and the restrictions of the device.
  • Assignments of value types should work exactly like Python.
  i = 4
  j = i

means the symbol i and the symbol j both are bound to the value 4 by copy (value semantics).

Yes.

In the more nebulous domain:

  • Assignments of reference types should not work like Python to be consistent with C++. That is,

The goal that I am aiming for is that either it behaves like Python, or we give an explicit and comprehensive error that this is not allowed. I do want to fully get rid of cases that compile and run but don't behave as python would.

  i = [4, 5, 6]
  i = [8, 9]    # ERROR!

This case is binding the symbol i to both a "vector" of length 3 and a "vector" of length 2, which we don't support in C++ and treat as distinct types.
It's worth having a conversation on this case... Do we have any code examples that do this?

The new assignment tests that are still missing contain a good number of tests along similar lines, though I may not have added one specifically related to lengths of lists

Thanks. It's good that these issues are getting thought about for sure!

See also #3681 (comment). I think more conversations on what this ought to look like, particularly as presented to the user, would be a good thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants