-
Notifications
You must be signed in to change notification settings - Fork 37
Call zeInitDrivers in L0 provider #1101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The tests are failing on CI most likely because we have an older L0 loader version installed and zeInitDrivers might not work. I'm not sure if we need to support older version so perhas we can just update the runners? If we do need to support older version, we will probably need to implement a fallback to zeInit (which is supported by older loaders but deprecated now). |
src/libumf.c
Outdated
fini_ze_global_state(); | ||
fini_cu_global_state(); | ||
LOG_DEBUG("UMF library finalized"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need it earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted, I need that to be able to free memory in fini_ze_global_state, but we don't actually need to keep the driver handles so it's not necessary.
src/provider/provider_level_zero.c
Outdated
@@ -267,6 +297,15 @@ static void init_ze_global_state(void) { | |||
return; | |||
} | |||
ze_lib_handle = lib_handle; | |||
|
|||
umf_result_t result = ze_init_drivers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I am thinking about who is responsible for calling zeInitDrivers
.
I suppose that the client who uses UMF and Level Zero is responsible for that. At the moment, the client calls UMF to create Level Zero it should already create the context and get the device, so the driver should be initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @vinser52
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, UMF needs to call init function, even if the client called it already.
So, I suppose it should be the client's responsibility to call |
1d741ab
to
b302607
Compare
src/provider/provider_level_zero.c
Outdated
ze_init_driver_type_desc_t desc = { | ||
.stype = ZE_STRUCTURE_TYPE_INIT_DRIVER_TYPE_DESC, | ||
.pNext = NULL, | ||
.flags = UINT32_MAX}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want only GPU Drivers, you need to set this to _GPU otherwise you will get back GPUs and NPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
According to the L0 spec, zeInitDrivers must be called (by every library) before calling any other APIs. Not calling zeInitDrivers causes crash when using statically linked L0 loader in UR.
.github/workflows/reusable_basic.yml
Outdated
--proxy | ||
--umf-version ${{env.UMF_VERSION}} | ||
--shared-library | ||
# macos-build: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is MacOS CI disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not be disabled here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.github/workflows/reusable_basic.yml
Outdated
--proxy | ||
--umf-version ${{env.UMF_VERSION}} | ||
--shared-library | ||
# macos-build: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not be disabled here.
ze_init_driver_type_desc_t desc = { | ||
.stype = ZE_STRUCTURE_TYPE_INIT_DRIVER_TYPE_DESC, | ||
.pNext = NULL, | ||
.flags = ZE_INIT_DRIVER_TYPE_FLAG_GPU}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only GPUs?
That is why I was against calling zeInit
or zeInitDrivers
from UMF's code. It is the client's responsibility to initialize L0 properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also against initializing Level Zero in UMF. Could we avoid that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't call zeInit/zeInitDrivers, the entire provider won't work. As for what specifically we are initializing - GPU is currently the only thing that makes sense from the UMF perspective. If there will be some other device types in the future that we want to support we can extend params to allow user to specify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know how user can create level zero provider without zeInit on their side. They need to provoide context/device handle, which need level zero inited?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, without zeInitDrivers/zeInit calling any L0 function results in a crash. I can prepare some simple reproducer.
This PR is based on: https://github.com/oneapi-src/unified-memory-framework/pull/1086/files
And fixes problems in oneapi-src/unified-runtime#2667