Pulling in changes from PR #3537 #3681
Conversation
Signed-off-by: Bettina Heim <[email protected]>
| """ | ||
| parts = y.split('.') | ||
| for i in range(len(parts), 0, -1): | ||
| for i in range(len(parts)): |
The last part in the path should not be a Python module. (It currently is only because of a misspelling in our tests.)
I'm fixing an empty modName issue in https://github.com/NVIDIA/cuda-quantum/pull/3682/files
def simple_list_bool(n: int, t: list[bool]) -> list[bool]:
    qubits = cudaq.qvector(n)
-    return t
+    return t.copy()
What's the point of the copy() method changes here?
It makes explicit what the behavior is under the hood. We (have to) make a couple of copies where Python does not and instead passes by reference. To ensure users don't run into unexpected behavior (code not behaving as Python would), we force the copies we have to make to be explicitly visible in the source code. More details are in the description of #3537.
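To make the gap concrete, here is a minimal plain-Python sketch (the helper names are made up for illustration; this is ordinary Python, not kernel code) of the semantics that the explicit copy is meant to surface:

def return_by_reference(t: list[bool]) -> list[bool]:
    # Ordinary Python: the caller gets back the very same list object.
    return t

def return_a_copy(t: list[bool]) -> list[bool]:
    # With an explicit copy, later mutation of the argument no longer
    # shows up in the returned value, and vice versa.
    return t.copy()

data = [True, False]
ref = return_by_reference(data)
cpy = return_a_copy(data)
data[0] = False
assert ref[0] is False  # ref aliases data
assert cpy[0] is True   # cpy is decoupled from data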
This is a user-facing CUDA-Q design change that should have gone through the architectural review process, I believe. @boschmitt @amccaskey
We should be converting local Python bindings to independent variables. Making copies will just inflate stack usage and complicate compiler analyses needlessly.
In Python, a local symbol is maintained in a dictionary as a reference to a value. If the value is a reference object, this amounts to a double indirection. One cannot bind the same symbol to more than one value in the local scope, but one can bind more than one symbol to the same reference object.
a = [1, 2]
b = a
a[0] = 4
assert b[0] == 4  # True!

This functionality can be mimicked with classic variables without resorting to copies. Still, I'm not clear where the original Python front end proposal landed on this issue.
How to not use copies:
For value types, symbol binding and variables have a tight correspondence.

a = 4
b = a

can be lowered as

%0 = alloca integer
%1 = alloca integer
store 4, %0
// symbol 'a' is %0
%2 = load %0
store %2, %1
// symbol 'b' is %1

and the observable semantics are the same.
For reference types,

a = [0, 1]
b = a

this can be lowered as follows:

%0 = alloca list<2 x integer>
// symbol 'a' -> %0
store 0, %0[0]
store 1, %0[1]
// symbol 'b' -> %0

I believe this was the way CUDA-Q was operating.
This is a problem when it comes to runtime control flow.

a = [0, 1]
if b:
    c = a
else:
    c = [3, 4]
# c is not statically known to be [0, 1] or [3, 4] here

We can still solve this without making copies, and keep natural-looking Python. This can be achieved by noting what was stated earlier: for a reference binding, the interpreter binds the symbol to a reference, meaning we have a double indirection.
%0 = alloca list<2 x integer>
store 0, %0[0]
store 1, %0[1]
// %0 is the list object [0, 1]
%1 = alloca ptr<list<...>>
store %0, %1
// symbol 'a' is %1
%2 = alloca ptr<list<...>>
// symbol 'c' is %2 (and undefined)
%3 = alloca list<2 x integer>
store 3, %3[0]
store 4, %3[1]
// %3 is the list object [3, 4]
structured.if (%88) {  // %88 holds the runtime value of the condition 'b'
  store %0, %2
} else {
  store %3, %2
}
// %2 is correctly either a reference to [0, 1] or [3, 4] here

The point being that all this can be done correctly by:
- knowing the difference between value and reference types
- using a reference indirection correctly
- preserving the correspondence between symbol bindings in Python and variables in the IR
- not changing or exposing properties of the implementation to the user-facing interface
I'll explicitly call out here the deal with "scope" wrt Python as well.
We are not concerned with any scope inside the kernel other than symbols at local scope. Symbols referenced at any other scope are lambda lifted and passed by value (deep copy).
By the nature of Python's symbol binding process, all values are "free" on the heap (and reference counted, etc.). There is no lexical scope and no defined lifetime. At some point, the interpreter will simply garbage collect unreferenced objects on the heap.
In our device execution domain, we don't have a heap and certainly do not have garbage collection. We instead have variables and place values into those variables (stack slots).
Because our host language, Python, has this scopeless execution model with vague lifetimes, we can simply promote local symbols to variables that dominate everything in the kernel (place them in the entry block). We can also promote objects of reference type in the same way: construct them as early as possible, which isn't always the entry block, though it often may be.
If we follow the same rules as the C++ limitations, the storage for reference objects must be known at compile time, so that storage can be allocated in the entry block even if the initialization of the element values happens along various control-flow paths.
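For reference, a small plain-Python example of the scoping behavior described above (standard interpreter semantics, nothing CUDA-Q specific): a name bound inside a branch has no block scope and no defined lifetime, which is what makes promoting locals to the kernel's entry block possible.

def no_block_scope(flag: bool) -> int:
    # Python has no lexical block scope inside a function: a name bound
    # in one branch of an 'if' is still visible after the branch ends.
    if flag:
        x = [1, 2, 3]
    else:
        x = [4, 5]
    # 'x' is an ordinary local here, as if it had been declared at
    # function entry and merely assigned along each path.
    return len(x)

assert no_block_scope(True) == 3
assert no_block_scope(False) == 2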
def test_return_dataclass(n: int, t: MyClass) -> MyClass:
    qubits = cudaq.qvector(n)
-    return t
+    return t.copy(deep=True)
Is there a corresponding change in the bridge somewhere that necessitates these copies and deep copies?
By virtue of the fact this is a kernel running on a device, everything on the kernel side is logically a deep copy already. All device code is run in a different memory address space, logically speaking. So it's sort of weird to have device code making extra copies of its copies to return via copies to the Python interpreter.
See response above.
I don't understand the addition of these copies. To get to the nuts and bolts of how this must work in a Python kernel:
@cudaq.kernel
def bug():
    i = 4
    i = ['now', 'a', 'list']  # ERROR!
    i = 4
    j = i

means the symbol j simply takes on the (integer) value of i. In the more nebulous domain:
i = [4, 5, 6]
i = [8, 9]  # ERROR!

This case is binding the symbol i to lists of different lengths. It's worth having a conversation on this case... Do we have any code examples that do this?
I'm going to cut the line and merge this so we can keep making progress. Just noting that there may be some things to revisit here as well.
Merged commit 4598524 into NVIDIA:features/python.redesign.0
It's kind of an annotation for the user's sake, indicating what is going on under the hood.
I need to come back to this with the change to flatten the symbol table. (Nothing "bad" happens as it is: the IR we generate is always valid and matches Python behavior, but we are not respecting Python scoping as written.) We can also be a bit more permissive than what you outline while still respecting Python scoping: we merely have to be strict when a variable is redefined in an inner scope (with respect to MLIR scoping). I'll give it some more thought.
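As a hypothetical illustration of the inner-scope redefinition mentioned above (plain Python; how the lowering handles it is exactly the open question, not something this sketch asserts):

def inner_scope_rebind(xs: list[int]) -> int:
    total = 0
    for x in xs:
        # 'total' is reassigned inside the loop body. Python has no new
        # scope here, so this rebinds the same symbol defined above.
        total = total + x
    # An IR with region-based (MLIR-style) scoping must ensure the value
    # written inside the loop region is the one observed here.
    return total

assert inner_scope_rebind([1, 2, 3]) == 6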
Yes.
The goal I am aiming for is that either it behaves like Python, or we give an explicit and comprehensive error that this is not allowed. I do want to fully get rid of cases that compile and run but don't behave as Python would.
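For concreteness, a plain-Python sketch of the kind of divergence meant here (the kernel-side behavior is the thing under discussion; this sketch only shows the ordinary Python baseline):

def flip_first(t: list[bool]) -> None:
    # In ordinary Python the caller observes this mutation, because the
    # list is passed by reference.
    t[0] = not t[0]

data = [False, True]
flip_first(data)
assert data[0] is True  # the caller sees the change

# If a kernel received 't' as a copy instead, the caller would not see the
# mutation. That silent difference is exactly what should either match
# Python or be rejected with a clear error.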
The new assignment tests that are still missing contain a good number of tests along similar lines, though I may not have added one specifically related to lengths of lists.
Thanks. It's good that these issues are getting thought about for sure! See also #3681 (comment). I think more conversations on what this ought to look like, particularly as presented to the user, would be a good thing.
Tests I ran locally:
I ran all tests in python/tests/, excluding backends, domain and remote;