Skip to content

Call zeInitDrivers in L0 provider #1101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

igchor
Copy link
Member

@igchor igchor commented Feb 14, 2025

@igchor igchor requested a review from a team as a code owner February 14, 2025 19:53
@igchor
Copy link
Member Author

igchor commented Feb 14, 2025

The tests are failing on CI most likely because we have an older L0 loader version installed and zeInitDrivers might not work. I'm not sure if we need to support older version so perhas we can just update the runners?

If we do need to support older version, we will probably need to implement a fallback to zeInit (which is supported by older loaders but deprecated now).

src/libumf.c Outdated
Comment on lines 66 to 69
fini_ze_global_state();
fini_cu_global_state();
LOG_DEBUG("UMF library finalized");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need it earlier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted, I need that to be able to free memory in fini_ze_global_state, but we don't actually need to keep the driver handles so it's not necessary.

@@ -267,6 +297,15 @@ static void init_ze_global_state(void) {
return;
}
ze_lib_handle = lib_handle;

umf_result_t result = ze_init_drivers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I am thinking about who is responsible for calling zeInitDrivers.
I suppose that the client who uses UMF and Level Zero is responsible for that. At the moment, the client calls UMF to create Level Zero it should already create the context and get the device, so the driver should be initialized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @vinser52

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, UMF needs to call init function, even if the client called it already.

@vinser52
Copy link
Contributor

The tests are failing on CI most likely because we have an older L0 loader version installed and zeInitDrivers might not work. I'm not sure if we need to support older version so perhas we can just update the runners?

If we do need to support older version, we will probably need to implement a fallback to zeInit (which is supported by older loaders but deprecated now).

So, I suppose it should be the client's responsibility to call zeInit/zeInitDrivers, as we have today in the tests.

@igchor igchor force-pushed the call_ze_init branch 2 times, most recently from 1d741ab to b302607 Compare February 18, 2025 22:31
ze_init_driver_type_desc_t desc = {
.stype = ZE_STRUCTURE_TYPE_INIT_DRIVER_TYPE_DESC,
.pNext = NULL,
.flags = UINT32_MAX};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want only GPU Drivers, you need to set this to _GPU otherwise you will get back GPUs and NPUs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

According to the L0 spec, zeInitDrivers must be called
(by every library) before calling any other APIs.

Not calling zeInitDrivers causes crash when using
statically linked L0 loader in UR.
--proxy
--umf-version ${{env.UMF_VERSION}}
--shared-library
# macos-build:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is MacOS CI disabled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be disabled here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

--proxy
--umf-version ${{env.UMF_VERSION}}
--shared-library
# macos-build:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be disabled here.

ze_init_driver_type_desc_t desc = {
.stype = ZE_STRUCTURE_TYPE_INIT_DRIVER_TYPE_DESC,
.pNext = NULL,
.flags = ZE_INIT_DRIVER_TYPE_FLAG_GPU};
Copy link
Contributor

@vinser52 vinser52 Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only GPUs?
That is why I was against calling zeInit or zeInitDrivers from UMF's code. It is the client's responsibility to initialize L0 properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also against initializing Level Zero in UMF. Could we avoid that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't call zeInit/zeInitDrivers, the entire provider won't work. As for what specifically we are initializing - GPU is currently the only thing that makes sense from the UMF perspective. If there will be some other device types in the future that we want to support we can extend params to allow user to specify that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know how user can create level zero provider without zeInit on their side. They need to provoide context/device handle, which need level zero inited?

Copy link
Contributor

@vinser52 vinser52 Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for what @lplewa wrote.

@igchor are there any issues when adapter statically linked with libze_loader? If so, can we have a simple test/reproducer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, without zeInitDrivers/zeInit calling any L0 function results in a crash. I can prepare some simple reproducer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants