Pulling in changes from PR #3537 #3681

bettinaheim · 2025-12-11T18:50:08Z

I haven't properly merged the mlir test changes in Python bridge revision #3537 with the changes on the feature branch
I have removed test_assignment (added in Python bridge revision #3537) from this PR (currently segfaults, not yet sure why)
I undid the flattening of the symbol table in the feature branch for now (to be done as a separate change)
There are a couple of FIXMEs related to code cleanup in ast_bridge.py and utils.py. I'll clean them up; whether that happens before or after pulling this onto the feature branch shouldn't matter
One of the new tests in test_kernel_features.py is still commented out; it gives a comprehensive error on main and should work with the changes on the feature branch, but it currently segfaults

Tests I ran locally:
I ran all tests in python/tests/, excluding backends, domain and remote.

I have failures for tests that return lists from kernels; these tests run on the main branch, but on the feature branch there is still a (comprehensive) error stating that this is not supported. Indeed, if I comment that check out, I see that something is segfaulting. Are we maybe still missing some changes from main that still need to be ported?
The only other failure is for test_internal_library_kernels (unknown function call); I haven't checked yet whether this one is passing on the feature branch
All other tests (that I ran) pass

Signed-off-by: Bettina Heim <[email protected]>

copy-pr-bot · 2025-12-11T18:50:11Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

python/tests/kernel/test_kernel_features.py

Signed-off-by: Bettina Heim <[email protected]>

schweitzpgi · 2025-12-11T23:32:54Z

python/cudaq/kernel/utils.py

    """
    parts = y.split('.')
-    for i in range(len(parts), 0, -1):
+    for i in range(len(parts)):


The last part in the path should not be a python module. (It is because of a misspelling in our tests.)

I'm fixing an empty modName issue in https://github.com/NVIDIA/cuda-quantum/pull/3682/files
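
To make the effect of the loop change concrete, here is a minimal sketch that only enumerates what the two `range` expressions from the diff visit. It assumes, hypothetically, that the loop index is used to form a candidate module path as `parts[:i]`; the actual loop body in utils.py is not shown in this thread, and the helper names below are made up for illustration.

```python
def prefixes_old(dotted: str) -> list[str]:
    """Prefixes visited by range(len(parts), 0, -1): longest first."""
    parts = dotted.split('.')
    return ['.'.join(parts[:i]) for i in range(len(parts), 0, -1)]


def prefixes_new(dotted: str) -> list[str]:
    """Prefixes visited by range(len(parts)): shortest first."""
    parts = dotted.split('.')
    return ['.'.join(parts[:i]) for i in range(len(parts))]


# Assumed usage: parts[:i] is the candidate module path (not confirmed here).
old = prefixes_old("pkg.mod.kernel")
new = prefixes_new("pkg.mod.kernel")
print(old)
print(new)
```

Under that assumption, the new loop never treats the full dotted path as a module, but its first candidate is the empty string, which may be related to the empty modName issue mentioned above.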

schweitzpgi · 2025-12-11T23:39:43Z

python/tests/kernel/test_direct_call_return_kernel.py

    def simple_list_bool(n: int, t: list[bool]) -> list[bool]:
        qubits = cudaq.qvector(n)
-        return t
+        return t.copy()


What's the point of the copy() method changes here?

It makes it explicit what the behavior is under the hood. We ("have to") make a couple of copies where python does not and passes by reference instead. To ensure that we don't have unexpected behaviors for users (code not behaving as python should), we force the copy that we have to make to be explicitly visible in the source code. More details in the description of #3537
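
As a plain-Python illustration of the behavior being made explicit (no CUDA-Q involved; the function names are hypothetical): returning an argument in ordinary Python hands back the same object, so code that has to copy under the hood would silently differ from Python unless the copy is written out.

```python
def returns_alias(t: list[bool]) -> list[bool]:
    # Plain Python: the caller gets back the very same list object.
    return t


def returns_copy(t: list[bool]) -> list[bool]:
    # Explicit copy: the caller gets a new list with equal contents.
    return t.copy()


original = [True, False]
alias = returns_alias(original)
copied = returns_copy(original)

original[0] = False  # mutate the argument after the call

print(alias is original, alias[0])    # the alias observes the mutation
print(copied is original, copied[0])  # the copy does not
```

With `.copy()` in the source, an implementation that copies and the Python interpreter give the same observable result.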

This is a user-facing CUDA-Q design change that should have gone through the architectural review process, I believe. @boschmitt @amccaskey

We should be converting local python bindings to independent variables. Making copies will just exaggerate the use of stack space and complicate compiler analyses needlessly.

In Python a local symbol is maintained in a dictionary as a reference to a value. If the value is a reference object, then this amounts to a double indirection. One cannot bind the same symbol to more than one value in the local scope. One can bind more than one symbol to the same reference object.

    a = [1, 2]
    b = a
    a[0] = 4
    assert b[0] == 4  # True!

This functionality can be mimicked as classic variables without resorting to copies. Still, I'm not clear where the original Python front end proposal landed on this issue.

How to not use copies:

For value types, symbol binding and variables have a tight correspondence.

    a = 4
    b = a

can be lowered as

    %0 = alloca integer
    %1 = alloca integer
    store 4, %0        // symbol 'a' is %0
    %2 = load %0
    store %2, %1       // symbol 'b' is %1

and the observable semantics are the same.

For reference types

    a = [0, 1]
    b = a

this can be lowered as follows

    %0 = alloca list<2 x integer>   // symbol a -> %0
    store 0, %0[0]
    store 1, %0[1]
    // symbol b -> %0

I believe this was the way CUDA-Q was operating.

This is a problem when it comes to runtime control flow issues.

    a = [0, 1]
    if b:
        c = a
    else:
        c = [3, 4]
    # c is not statically known to be [0, 1] or [3, 4] here

We can still solve this without making copies, and having natural looking python. This can be achieved by noting what was stated earlier: for a reference binding, the interpreter binds the symbol to a reference meaning we have a double indirection.

    %0 = alloca list<2 x integer>
    store 0, %0[0]
    store 1, %0[1]                  // %0 is the list object [0, 1]
    %1 = alloca ptr<list<...>>
    store %0, %1                    // symbol 'a' is %1
    %2 = alloca ptr<list<...>>      // symbol 'c' is %2 (and undefined)
    %3 = alloca list<2 x integer>
    store 3, %3[0]
    store 4, %3[1]                  // %3 is the list object [3, 4]
    structured.if (%88) {
      store %0, %2
    } else {
      store %3, %2
    }
    // %2 is correctly either a reference to [0, 1] or [3, 4] here
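
The double indirection can be mimicked in plain Python as a rough sketch. The `Slot` class and `bind_in_branch` function are hypothetical stand-ins for a variable that holds a reference; they are not part of CUDA-Q or of the bridge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """Stand-in for `alloca ptr<list<...>>`: a variable holding a reference."""
    target: Optional[list] = None


def bind_in_branch(cond: bool) -> tuple[Slot, Slot]:
    obj_first = [0, 1]    # storage for the list object [0, 1]
    obj_second = [3, 4]   # storage for the list object [3, 4]
    a = Slot(obj_first)   # symbol 'a' refers to [0, 1]
    c = Slot()            # symbol assigned in the branch, initially unbound
    if cond:
        c.target = a.target    # rebinding stores a reference, no copy
    else:
        c.target = obj_second
    return a, c


a, c = bind_in_branch(True)
c.target[0] = 9
print(a.target)  # the write through 'c' is visible through 'a'
```

No list is copied on either path; only the reference held in the slot changes, which is the property the lowering above relies on.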

The point being that all this can be done correctly by

knowing the difference between value and reference types

using a reference indirection correctly

preserving the correspondence between symbol bindings in python to variables in the IR

not changing or exposing properties of the implementation to the user-facing interface

I'll explicitly call out here the deal with "scope" wrt Python as well.

We are not concerned with any scope inside the kernel other than symbols at local scope. Symbols referenced at any other scope are lambda lifted and passed by value (deep copy).

By the nature of Python's symbol binding process, all values are "free" on the heap (and reference counted, etc.) There is no lexical scope and no defined lifetime. At some point, the interpreter will just garbage collect unreferenced objects on the heap.

In our device execution domain, we don't have a heap and certainly do not have garbage collection. We instead have variables and place values into those variables (stack slots).

By virtue of this scopeless, vague lifetime execution model in our host language, Python, we can simply promote local symbols to variables that dominate everything in the kernel (place them in the entry block) and we can also promote objects of reference type in the same way (construct them as early as possible, which isn't the same as always the entry block, but it may often be the case).

If we follow the same rules as the C++ limitations, the storage for the reference objects must be known at compile-time, so it is possible to allocate that storage in the entry block, even if the initialization of its element values is done along various control-flow paths.
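
As a loose analogy in plain Python (the function name is hypothetical, and this says nothing about how the bridge actually emits IR): storage of statically known size is created once at the top, and only the element values are written along the different control-flow paths.

```python
def kernel_like(flag: bool) -> list[int]:
    # "Entry block": storage of a known, fixed size is created up front.
    storage = [0] * 2
    # Element values are initialized along different control-flow paths.
    if flag:
        storage[0], storage[1] = 0, 1
    else:
        storage[0], storage[1] = 3, 4
    return storage


print(kernel_like(True))
print(kernel_like(False))
```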

schweitzpgi · 2025-12-11T23:45:05Z

python/tests/kernel/test_direct_call_return_kernel.py

    def test_return_dataclass(n: int, t: MyClass) -> MyClass:
        qubits = cudaq.qvector(n)
-        return t
+        return t.copy(deep=True)


Is there a corresponding change in the bridge somewhere that necessitates these copies and deep copies?

By virtue of the fact this is a kernel running on a device, everything on the kernel side is logically a deep copy already. All device code is run in a different memory address space, logically speaking. So it's sort of weird to have device code making extra copies of its copies to return via copies to the Python interpreter.

See response above.

python/tests/kernel/test_kernel_features.py

python/tests/visualization/test_draw.py

schweitzpgi · 2025-12-12T00:10:46Z

I don't understand the addition of .copy() method calls all over the tests. Is it just annotation? Or is it some sort of fix?

To get to the nuts and bolts of how this must work in a Python kernel:

Every local variable in a kernel must be strongly typed. It cannot have one type at one point and another type at another point in the kernel. If any such code is seen, it is a semantics error and a diagnostic must be raised.

@cudaq.kernel
def bug():
  i = 4
  i = ['now', 'a', 'list']   # ERROR!

There are no nested scopes. All local variables are global to the entire kernel.
All local variables must therefore be promoted to the entry block.
Each symbol name is a unique variable. In Python, this is achieved by garbage collecting unreferenced chaff.
In our kernel world, we accomplish the same thing by allowing a symbol (now a variable) to bind to exactly 1 strongly typed value.
All values are "garbage collected" when the kernel returns to the caller.
We do not simulate or use reference counting as it is not useful for our limited inventory of types and the restrictions of the device.
Assignments of value types should work exactly like Python.

  i = 4
  j = i

means the symbol i and the symbol j both are bound to the value 4 by copy (value semantics).
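
The strong-typing rule above can be sketched with the standard `ast` module. This is a minimal illustration that only looks at literal right-hand sides and treats the whole source as one flat namespace (matching the "no nested scopes" rule); the helper names are hypothetical, and this is not how ast_bridge.py performs the check.

```python
import ast
from typing import Optional


def literal_kind(node: ast.expr) -> Optional[str]:
    """Return a coarse type name for literal right-hand sides, else None."""
    if isinstance(node, ast.Constant):
        return type(node.value).__name__
    if isinstance(node, ast.List):
        return "list"
    return None


def find_retyped_symbols(source: str) -> list[str]:
    """Names that are assigned literals of different kinds in `source`."""
    seen: dict[str, str] = {}
    errors: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        kind = literal_kind(node.value)
        if kind is None:
            continue  # non-literal right-hand sides are ignored in this sketch
        for target in node.targets:
            if isinstance(target, ast.Name):
                previous = seen.setdefault(target.id, kind)
                if previous != kind and target.id not in errors:
                    errors.append(target.id)
    return errors


SOURCE = """
def bug():
    i = 4
    i = ['now', 'a', 'list']
    j = 1
    j = 2
"""
print(find_retyped_symbols(SOURCE))
```

Here `i` is reported because it is bound first to an `int` and then to a list, while `j` is not, since both of its bindings have the same type.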

In the more nebulous domain:

Assignments of reference types should not work like Python to be consistent with C++. That is,

  i = [4, 5, 6]
  i = [8, 9]    # ERROR!

This case is binding the symbol i to both a "vector" of length 3 and a "vector" of length 2, which we don't support in C++ and treat as distinct types.

It's worth having a conversation on this case... Do we have any code examples that do this?

schweitzpgi · 2025-12-12T01:02:06Z

I'm going to cut the line and merge this so we can keep making progress. Just noting that there may be some things to revisit here as well.

bettinaheim · 2025-12-12T09:49:03Z

I don't understand the addition of .copy() method calls all over the tests. Is it just annotation? Or is it some sort of fix?

It's kind of an annotation for the user's sake, indicating what is going on under the hood.

To get to the nuts and bolts of how this must work in a Python kernel:

Every local variable in a kernel must be strongly typed. It cannot have one type at one point and another type at another point in the kernel. If any such code is seen, it is a semantics error and a diagnostic must be raised.
@cudaq.kernel
def bug():
  i = 4
  i = ['now', 'a', 'list']   # ERROR!
There are no nested scopes. All local variables are global to the entire kernel.

All local variables must therefore be promoted to the entry block.

I need to come back to this with the change to flatten the symbol table. (Nothing "bad" happens as it is - the IR we generate is always valid and matches Python behavior - but we are not respecting Python scoping as written). We can be a bit more permissive than what you outline also when respecting Python scoping - we merely have to be strict when a variable is redefined in an inner scope (wrt MLIR scoping). I'll give it some more thought.

Each symbol name is a unique variable. In Python, this is achieved by garbage collecting unreferenced chaff.

In our kernel world, we accomplish the same thing by allowing a symbol (now a variable) to bind to exactly 1 strongly typed value.

All values are "garbage collected" when the kernel returns to the caller.

We do not simulate or use reference counting as it is not useful for our limited inventory of types and the restrictions of the device.

Assignments of value types should work exactly like Python.
  i = 4
  j = i
means the symbol i and the symbol j both are bound to the value 4 by copy (value semantics).

Yes.

In the more nebulous domain:

Assignments of reference types should not work like Python to be consistent with C++. That is,

The goal that I am aiming for is that either it behaves like Python, or we give an explicit and comprehensive error that this is not allowed. I do want to fully get rid of cases that compile and run but don't behave as python would.

  i = [4, 5, 6]
  i = [8, 9]    # ERROR!
This case is binding the symbol i to both a "vector" of length 3 and a "vector" of length 2, which we don't support in C++ and treat as distinct types.

It's worth having a conversation on this case... Do we have any code examples that do this?

The new assignment tests that are still missing contain a good number of tests along similar lines, though I may not have added one specifically related to lengths of lists

schweitzpgi · 2025-12-12T17:16:50Z

Thanks. It's good that these issues are getting thought about for sure!

See also #3681 (comment). I think more conversations on what this ought to look like, particularly as presented to the user, would be a good thing.

bettinaheim added 7 commits December 8, 2025 14:38

wip - merging in bridge and utils changes

14dae51

Signed-off-by: Bettina Heim <[email protected]>

kernel features tests pass - one tests disabled due to crash

9c52235

Signed-off-by: Bettina Heim <[email protected]>

grabbing more from PR 3537

28755bd

Signed-off-by: Bettina Heim <[email protected]>

Merge branch 'features/python.redesign.0' into python_feature

bd22e2a

addressing fixme in visit_name

1ab0f43

Signed-off-by: Bettina Heim <[email protected]>

fixing uccsd issue

91dce08

Signed-off-by: Bettina Heim <[email protected]>

Merge branch 'features/python.redesign.0' into python_feature

86bf42c

Signed-off-by: Bettina Heim <[email protected]>

bettinaheim added 3 commits December 11, 2025 18:58

removing assignment tests for now

865a996

Signed-off-by: Bettina Heim <[email protected]>

Merge branch 'features/python.redesign.0' into python_feature

29d7481

Signed-off-by: Bettina Heim <[email protected]>

formatting

395a3e5

Signed-off-by: Bettina Heim <[email protected]>

schweitzpgi merged commit 4598524 into NVIDIA:features/python.redesign.0 Dec 12, 2025
7 of 8 checks passed

github-actions bot pushed a commit that referenced this pull request Dec 12, 2025

Cleaning up docs preview for PR #3681.

7e7dd1a
