NVD_GPU environment variable does not work on dual Nvidia GPU system #413
Description
Hi,
my system has 2 Nvidia GPUs:
$ nvidia-smi -L
GPU 0: NVIDIA P104-100 (UUID: GPU-fb525077-4b23-625f-1edb-e1e6c559cce7)
GPU 1: NVIDIA GeForce GT 1030 (UUID: GPU-fd84cda2-8a73-aa66-4106-9504ad09cc6f)
The P104 has no hardware video decoder and no video outputs, so the only monitor is connected to the GT 1030.
I've installed nvidia-vaapi-driver version 0.0.14 according to the manual. vainfo succeeds but lists 0 entrypoints, even when pointed at the correct GPU with NVD_GPU:
$ NVD_LOG=1 vainfo
libva info: VA-API version 1.20.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
5582.086991989 [17151-17151] ../src/vabackend.c:2260 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver
5582.087009673 [17151-17151] ../src/vabackend.c:2268 __vaDriverInit_1_0 Got DRM FD: 1 4
5582.087020659 [17151-17151] ../src/vabackend.c:2280 __vaDriverInit_1_0 Now have 0 (0 max) instances
5582.087024914 [17151-17151] ../src/vabackend.c:2307 __vaDriverInit_1_0 Selecting Direct backend
5582.123241262 [17151-17151] ../src/direct/nv-driver.c: 305 init_nvdriver Initing nvdriver...
5582.123290800 [17151-17151] ../src/direct/nv-driver.c: 323 init_nvdriver NVIDIA kernel driver version: 575.57.08, major version: 575, minor version: 57
5582.123298413 [17151-17151] ../src/direct/nv-driver.c: 330 init_nvdriver Got dev info: 200 1 0 fe
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
5582.265742460 [17151-17151] ../src/vabackend.c:2168 nvTerminate Terminating 0x63929e8c3070
5582.265826639 [17151-17151] ../src/vabackend.c:2182 nvTerminate Now have 0 (0 max) instances
$ NVD_LOG=1 NVD_GPU=1 vainfo
libva info: VA-API version 1.20.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
5611.513418768 [17168-17168] ../src/vabackend.c:2260 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver
5611.513429831 [17168-17168] ../src/vabackend.c:2268 __vaDriverInit_1_0 Got DRM FD: 1 -1
5611.513434021 [17168-17168] ../src/vabackend.c:2280 __vaDriverInit_1_0 Now have 0 (0 max) instances
5611.513438163 [17168-17168] ../src/vabackend.c:2307 __vaDriverInit_1_0 Selecting Direct backend
5611.547755089 [17168-17168] ../src/direct/direct-export-buf.c: 107 direct_initExporter Searching for GPU: 0 1 128
5611.547792056 [17168-17168] ../src/direct/direct-export-buf.c: 107 direct_initExporter Searching for GPU: 1 1 129
5611.547802843 [17168-17168] ../src/direct/direct-export-buf.c: 129 direct_initExporter Found NVIDIA GPU 1 at /dev/dri/renderD129
5611.547807709 [17168-17168] ../src/direct/nv-driver.c: 305 init_nvdriver Initing nvdriver...
5611.547833969 [17168-17168] ../src/direct/nv-driver.c: 323 init_nvdriver NVIDIA kernel driver version: 575.57.08, major version: 575, minor version: 57
5611.547840756 [17168-17168] ../src/direct/nv-driver.c: 330 init_nvdriver Got dev info: 200 1 0 fe
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
5611.687465645 [17168-17168] ../src/vabackend.c:2168 nvTerminate Terminating 0x5ed447b4e070
5611.687560134 [17168-17168] ../src/vabackend.c:2182 nvTerminate Now have 0 (0 max) instances
I found a place in nv-driver.c that looks wrong: nv_rm_control is declared to return bool, but the result is stored in an int and the check treats a true result as failure:
const int ret = nv_rm_control(context->nvctlFd, context->clientObject, context->clientObject, NV0000_CTRL_CMD_GPU_GET_UUID_FROM_GPU_ID, 0, sizeof(uuidParams), &uuidParams);
if (ret) {
return false;
}
After fixing it like this:
const bool ret = nv_rm_control(context->nvctlFd, context->clientObject, context->clientObject, NV0000_CTRL_CMD_GPU_GET_UUID_FROM_GPU_ID, 0, sizeof(uuidParams), &uuidParams);
if (!ret) {
return false;
}
the driver was able to list the entrypoints even without setting NVD_GPU:
$ NVD_LOG=1 vainfo
libva info: VA-API version 1.20.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
5933.271500838 [18020-18020] ../src/vabackend.c:2260 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver
5933.271515169 [18020-18020] ../src/vabackend.c:2268 __vaDriverInit_1_0 Got DRM FD: 1 4
5933.271528650 [18020-18020] ../src/vabackend.c:2280 __vaDriverInit_1_0 Now have 0 (0 max) instances
5933.271535499 [18020-18020] ../src/vabackend.c:2307 __vaDriverInit_1_0 Selecting Direct backend
5933.312221875 [18020-18020] ../src/direct/nv-driver.c: 305 init_nvdriver Initing nvdriver...
5933.312270417 [18020-18020] ../src/direct/nv-driver.c: 323 init_nvdriver NVIDIA kernel driver version: 575.57.08, major version: 575, minor version: 57
5933.312278119 [18020-18020] ../src/direct/nv-driver.c: 330 init_nvdriver Got dev info: 200 1 0 fe
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileHEVCMain12 : VAEntrypointVLD
VAProfileVP9Profile2 : VAEntrypointVLD
5933.441319854 [18020-18020] ../src/vabackend.c:2168 nvTerminate Terminating 0x59d10a544070
5933.441411411 [18020-18020] ../src/vabackend.c:2182 nvTerminate Now have 0 (0 max) instances
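To illustrate why the original check is inverted, here is a minimal, self-contained sketch. It does not use the driver's real code: `fake_control`, `lookup_buggy` and `lookup_fixed` are hypothetical names, and `fake_control` only stands in for a function that, like nv_rm_control, returns true on success.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a bool-returning call such as nv_rm_control:
 * true means success, false means failure. */
bool fake_control(bool should_succeed) {
    return should_succeed;
}

/* Buggy pattern: the bool result is stored in an int and checked as if it
 * were an errno-style code where 0 means success. A successful call
 * (true -> 1) is therefore reported as a failure. */
bool lookup_buggy(bool call_succeeds) {
    const int ret = fake_control(call_succeeds);
    if (ret) {
        return false;
    }
    return true;
}

/* Fixed pattern: keep the bool type and bail out only when the call failed. */
bool lookup_fixed(bool call_succeeds) {
    const bool ok = fake_control(call_succeeds);
    if (!ok) {
        return false;
    }
    return true;
}
```

With a successful underlying call, `lookup_buggy(true)` returns false while `lookup_fixed(true)` returns true, which matches the behaviour change seen in the vainfo output above.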
Now Firefox correctly lists the GPU's HW decoding capabilities on the about:support page, but when I try to play a video the log shows the CUDA error 'invalid device ordinal' (101):
$ MOZ_DISABLE_RDD_SANDBOX=1 NVD_LOG=1 firefox
libva info: VA-API version 1.20.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
6093.560121910 [18503-18533] ../src/vabackend.c:2260 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver
6093.560130921 [18503-18533] ../src/vabackend.c:2268 __vaDriverInit_1_0 Got DRM FD: 1 26
6093.560142661 [18503-18533] ../src/vabackend.c:2280 __vaDriverInit_1_0 Now have 0 (0 max) instances
6093.560148185 [18503-18533] ../src/vabackend.c:2307 __vaDriverInit_1_0 Selecting Direct backend
6093.626611700 [18503-18533] ../src/direct/nv-driver.c: 305 init_nvdriver Initing nvdriver...
6093.626688161 [18503-18533] ../src/direct/nv-driver.c: 323 init_nvdriver NVIDIA kernel driver version: 575.57.08, major version: 575, minor version: 57
6093.626705166 [18503-18533] ../src/direct/nv-driver.c: 330 init_nvdriver Got dev info: 200 1 0 fe
libva info: va_openDriver() returns 0
6093.848971320 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e9060)
6093.849013008 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e83e0)
6093.849027056 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8fc0)
6093.849038518 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8ca0)
6093.849048740 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e80c0)
6093.849058929 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8a20)
6093.849069940 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e91a0)
6093.849078532 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8160)
6093.849086987 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e9240)
6093.849096000 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8de0)
6093.849104633 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e92e0)
6093.849113084 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8f20)
6093.849122041 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8660)
6093.849130975 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8200)
6093.849140053 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8e80)
6093.849148689 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8b60)
6093.849158149 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e9100)
6093.849167167 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8340)
6093.849176043 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8480)
6093.849184821 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e8700)
6093.849193501 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e85c0)
6093.849202402 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e9380)
6093.849210843 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e82a0)
6093.849219440 [18503-18841] ../src/vabackend.c: 988 nvCreateSurfaces2 Creating surface 1920x1088, format 1 (0x7715dc4e88e0)
6093.855416765 [18503-18841] ../src/vabackend.c:1051 nvCreateContext Creating context with 24 render targets, at 1920x1088
6093.882865331 [18503-18841] ../src/vabackend.c:1134 nvCreateContext Creating decoder: 0x7715dc203000 for context id: 34
6093.883274713 [18503-18844] ../src/vabackend.c: 403 resolveSurfaces [RT] Resolve thread for 0x7715d6c07000 started
6093.885448279 [18503-18841] ../src/direct/direct-export-buf.c: 190 direct_allocateBackingImage Allocating BackingImages: 0x7715dca4fa50 1920x1088
6093.886631203 [18503-18841] ../src/direct/direct-export-buf.c: 75 import_to_cuda CUDA ERROR 'invalid device ordinal' (101)
6093.887047438 [18503-18841] ../src/direct/direct-export-buf.c: 330 direct_realiseSurface Unable to realise surface: 0x7715dc4e88e0 (0)
6093.887231219 [18503-18841] ../src/vabackend.c:2147 nvExportSurfaceHandle Unable to export surface
6093.888541902 [18503-18842] ../src/vabackend.c:1170 nvDestroyContext Destroying context: 34
6093.888655329 [18503-18842] ../src/vabackend.c: 321 destroyContext Signaling resolve thread to exit
6093.888706285 [18503-18842] ../src/vabackend.c: 327 destroyContext Waiting for resolve thread to exit
6094.034130437 [18503-18844] ../src/direct/direct-export-buf.c: 190 direct_allocateBackingImage Allocating BackingImages: 0x7715dca4fd90 1920x1088
6094.034950442 [18503-18844] ../src/direct/direct-export-buf.c: 75 import_to_cuda CUDA ERROR 'invalid device ordinal' (101)
6094.035203983 [18503-18844] ../src/direct/direct-export-buf.c: 330 direct_realiseSurface Unable to realise surface: 0x7715dc4e88e0 (0)
6094.111230459 [18503-18844] ../src/vabackend.c: 458 resolveSurfaces [RT] Resolve thread for 0x7715d6c07000 exiting
6094.111279813 [18503-18842] ../src/vabackend.c: 329 destroyContext Finished waiting for resolve thread with 0
6094.111402997 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface 0 (0x7715dc4e88e0)
6094.111411050 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e82a0)
6094.111414857 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e9380)
6094.111418464 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e85c0)
6094.111421853 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8700)
6094.111425209 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8480)
6094.111428587 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8340)
6094.111432771 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e9100)
6094.111437154 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8b60)
6094.111441223 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8e80)
6094.111445244 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8200)
6094.111449276 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8660)
6094.111453225 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8f20)
6094.111457209 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e92e0)
6094.111461242 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8de0)
6094.111466864 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e9240)
6094.111474748 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8160)
6094.111481457 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e91a0)
6094.111488247 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8a20)
6094.111494665 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e80c0)
6094.111501065 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8ca0)
6094.111507339 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e8fc0)
6094.111513839 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e83e0)
6094.111520304 [18503-18842] ../src/vabackend.c:1023 nvDestroySurfaces Destroying surface -1 (0x7715dc4e9060)
6094.111529986 [18503-18842] ../src/vabackend.c:2168 nvTerminate Terminating 0x77161ff36d40
6094.111650697 [18503-18842] ../src/vabackend.c:2182 nvTerminate Now have 0 (0 max) instances
The error is returned by cuExternalMemoryGetMappedMipmappedArray, after the preceding cuImportExternalMemory call succeeds. My guess is that the memory allocated earlier by alloc_image (direct-export-buf.c:192), which is then passed to cuImportExternalMemory and cuExternalMemoryGetMappedMipmappedArray, was allocated on the wrong GPU. The call chain alloc_image -> alloc_memory -> nv_alloc_object uses the NV01_MEMORY_LOCAL_USER flag, which suggests the memory is allocated on the default Nvidia device. On my system that is the headless P104, and I was unable to change that.
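One way to check which physical GPU a given device index refers to is to compare UUIDs against the `nvidia-smi -L` output, since index order is not necessarily the same across APIs. The helper below is only a sketch: `format_gpu_uuid` is a hypothetical name, and it assumes the UUID is available as 16 raw bytes (for example as returned by CUDA's cuDeviceGetUuid) and that those bytes map in order onto the textual form printed by nvidia-smi.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats 16 raw UUID bytes as "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx".
 * Assumption: bytes appear in the same order as in the nvidia-smi string.
 * Returns 0 on success, -1 if out_len is too small (41 bytes are needed,
 * including the terminating NUL). */
int format_gpu_uuid(const unsigned char uuid[16], char *out, size_t out_len) {
    if (out == NULL || out_len < 41) {
        return -1;
    }
    size_t pos = (size_t)snprintf(out, out_len, "GPU-");
    for (int i = 0; i < 16; i++) {
        /* Dashes separate the 8-4-4-4-12 hex digit groups. */
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            out[pos++] = '-';
        }
        pos += (size_t)snprintf(out + pos, out_len - pos, "%02x", uuid[i]);
    }
    return 0;
}
```

Logging the formatted UUID next to each device index the driver selects would show whether the index used for the allocation and the one used for the CUDA import point at the same card.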