Potential issue with QNAP transcoding after new NVIDIA_GPU_DRV_5.1.0.0822_x86_64

QNAP have shipped a new Nvidia GPU driver to their app store, and my QNAP, which is set to auto-update, installed it yesterday.

I seem to have lost hardware transcoding with the P400 (rarely use myself but my GF is in hospital with limited data)

Troubleshooting performed so far:
I have stopped and started Plex (I don’t have the SortMyQPKGs so occasionally need to do this to ensure driver is loaded first?)

I have uninstalled 5.1.0.0822, rebooted and tried 5.0.4.1. (Even after a second reboot the kernel driver in the app was still showing the new version?)
This didn’t help, so I have removed and reapplied the latest driver package (rebooting after each change).

There are a few log excerpts below, plus any info I thought may help.

Is this likely the same as the previous thread referenced below, meaning it will require a change from QNAP? Or what can I do or provide to help troubleshoot this issue further?

Any help appreciated.

  1. TVS-873e
  2. Nvidia Card model P400
  3. QTS version 5.0.1.2145 (not QuTS)
  4. Version of NVIDIA_GPU_DRV: 5.1.0.0822
  5. What you see in the Control Panel (attached)
  6. QNAP Support ticket number(s) (this provides more details for QNAP) - Not yet logged

Plex Server Version# 1.28.2.6151
Players tested: Chrome and Safari on Mac, Chrome on Windows, Plex app on Android phone (Pixel 4a 5G)

From Logs (can provide full log if needed)

DEBUG - [Req#2c8/Transcode] TPU: hardware transcoding: enabled, but no hardware decode accelerator found

ERROR - [Req#37e/Transcode] [FFMPEG] - → CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination


Seems like a similar historical issue

The same problem occurs for me.
Since yesterday’s NVIDIA update to version 5.1.0.0822, hardware transcoding with the GTX 1650 has stopped working. Exactly the same problem as @DjGrazzy.
My configuration:
QNAP TS 473
NVIDIA GTX 1650
QTS version 5.0.1.2145
NVIDIA GPU Driver 5.1.0.0822
NVKernelDriver 5.0.1.2145
Plex server Version 1.28.2.6151

I would contact QNAP (they’re very helpful over the phone).

Specifically, you want a complete removal of the ‘new’ driver.

Question: Are you trying to use it for 2160p (4K) ? According to specs, it’s not 4K capable.

Recommended Gaming Resolutions: 1600x900 1920x1080 2560x1440

@ChuckPa : So for me I can say that I usually transcode 2160p (4K). But your question was probably directed to @DjGrazzy

Plex hasn’t changed anything (yet). :smiling_imp:

If upgrading their driver to 5.1, you know what I’m going to say :wink:

As this is the first I’m hearing of this, I’ll make some time to see what happens on our TS-877 (GT1050)

Same issue here since updating to the new Nvidia driver. Even tried the latest beta Plex release but with no success. Uninstalled and reinstalled the driver with reboots at each step.

My configuration:
QNAP TS 673a
NVIDIA Quadro P400
QTS version 5.0.1.2145
NVIDIA GPU Driver 5.1.0.0822
NVKernelDriver 5.0.1.2145
Plex server Version 1.29.0.6219

Getting an inconsistency

I’m not seeing the new Nvidia driver offered for the TS-877

Was the installation from a beta ?

App Center - QNAP shows ok here? not a beta.

scratch that… seems the update is only available for QNAP OS 5.0.1 which I note you are not running

Thanks… It somehow missed updating. Updating now

Update.

Yes, something is out of sync here.

Investigating further. Currently returning the host to 5.0.0 firmware

Glad to see this thread and it’s not just me. Wish I’d never updated to 5.0.1, something always seems to go wrong.

From my investigation on this so far: there’s no /dev/nvidia0 showing up on my NAS. I do have /dev/nvidiactl and /dev/nvidia-uvm, and in /dev/dri I appear to have renderD128 and renderD129 (my NAS has an Intel CPU with HW acceleration).
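A quick way to check which of those device nodes exist, as a rough sketch only. The node names are the ones mentioned above; the function name `check_nvidia_nodes` is made up for illustration, and the directory is a parameter so it can be tried against a scratch directory first.

```shell
# List which of the expected NVIDIA device nodes exist under a directory.
# Defaults to /dev when no directory is given.
check_nvidia_nodes() {
    dev_dir="${1:-/dev}"
    for node in nvidia0 nvidiactl nvidia-uvm; do
        if [ -e "$dev_dir/$node" ]; then
            echo "present: $node"
        else
            echo "MISSING: $node"
        fi
    done
}

check_nvidia_nodes /dev
```

On a NAS in the state described here, I would expect nvidia0 to be the one reported as missing.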

I’ve also tried my ffmpeg container, where I normally do hardware encoding, with the other nvidia devices but without /dev/nvidia0, and it falls back to software encoding. So this isn’t just Plex, it’s broken across the board in my books.

Just to be clear, OS 5.0.1 with the old Nvidia drivers was working fine; it was only the update to the drivers which caused the issue.

I filed a bug report about this earlier today (QNAP Ticket Number Q-202209-67541). Did a lot of troubleshooting on it last night, and the issue is not Plex-specific. The actual issue here is that the QNAP NVIDIA Driver package was updated (515.48) but the QNAP NVIDIA Kernel driver module was not (460.39). You will note that the /dev/nvidia0 device on the QNAP host has disappeared (though it, and the CPU’s GPU, still appear under /dev/dri/). If you try to manually execute nvidia-smi, it will tell you that it can’t load the proper kernel module.

This all leads to the following error with ffmpeg during a transcode session:

Sep 22, 2022 14:20:16.254 [0x7f3646f6db00] Error — [Req#52e17/Transcode] [FFMPEG] - cu->cuInit(0) failed
Sep 22, 2022 14:20:16.254 [0x7f3646f6db00] Error — [Req#52e17/Transcode] [FFMPEG] -  -> CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
Sep 22, 2022 14:20:16.254 [0x7f3646f6db00] Error — [Req#52e17/Transcode] [FFMPEG] - 

Sep 22, 2022 14:20:16.255 [0x7f3646f6db00] Error — [Req#52e17/Transcode] [FFMPEG] - Failed to initialise VAAPI connection: -1 (unknown libva error).
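If you want to check your own Plex log for the same error, something like the following should do it. This is a minimal sketch: the function name `count_cuda_mismatch` is hypothetical, and the log path is whatever you pass in, since I'm not assuming where Plex keeps its logs on your NAS.

```shell
# Count the lines in a log file that contain the CUDA driver mismatch error.
count_cuda_mismatch() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c 'CUDA_ERROR_SYSTEM_DRIVER_MISMATCH' "$1" || true
}
```

Call it as `count_cuda_mismatch /path/to/your/plex.log`; any count above zero means ffmpeg hit the mismatch during a transcode.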

Good times.

Ah you got further than me and worked it out. Nice one. I’ve also raised a ticket with them on this.

I’ve checked the manual downloads and there’s no new GPU Kernel driver for my NAS so no means to fix this.

Can we not roll back the GPU Driver package until they fix this? Sounds like someone tried it and had no luck but feel like I’m going to try it as no doubt a fix for this is days / weeks away.

I tried that last night and it didn’t work. The old driver will not load on the new version of QTS. I also tried downgrading the QTS version and the package stayed on the newer version. I re-upgraded before I got around to removing the “new” drivers and trying to manually load the old drivers though.

I had a little bit of luck with manually copying the old (460.39) version of the driver files to the QNAP filesystem and rigging them in place, enough that I could get things to “match” between driver version and kernel module driver version. I was able to run the nvidia-smi command and have it show that the versions “match”:

[/share/CACHEDEV1_DATA/Downloads/Drivers/NVIDIA/NVIDIA-Linux-x86_64-460.39] # ./nvidia-smi
Thu Sep 22 10:46:25 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:0F:00.0 Off |                  N/A |
| 35%   38C    P8    N/A /  75W |      2MiB /  4039MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[/share/CACHEDEV1_DATA/Downloads/Drivers/NVIDIA/NVIDIA-Linux-x86_64-460.39] #

With that driver (and associated library via ldconfig) loaded, the /dev/nvidia0 device was properly created. All of that mess led me to realize what the error from ffmpeg was trying to tell me from the start. The kernel driver module version and the actual driver version on the filesystem need to match, and when they don’t, ffmpeg is angry.

[~] # cat /share/CACHEDEV1_DATA/.qpkg/NVIDIA_GPU_DRV/version
515.48.07
[~] # cat /share/CACHEDEV1_DATA/.qpkg/NvKernelDriver/version
460.39
[~] #
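The same comparison can be scripted, roughly as below. This is only a sketch: the function name `versions_match` is made up, and it assumes that agreeing on the first two dot-separated fields (for example 460.39) is enough to count as a match, which I have not confirmed against how the driver itself checks.

```shell
# Compare two version files on their first two dot-separated fields.
# Returns 0 on match, 1 on mismatch, 2 if either file is missing or empty.
versions_match() {
    user_ver="$(cat "$1" 2>/dev/null || true)"
    kernel_ver="$(cat "$2" 2>/dev/null || true)"
    if [ -z "$user_ver" ] || [ -z "$kernel_ver" ]; then
        echo "could not read one of the version files"
        return 2
    fi
    # Assumption: major.minor is the part that has to agree.
    user_mm="$(echo "$user_ver" | cut -d. -f1,2)"
    kernel_mm="$(echo "$kernel_ver" | cut -d. -f1,2)"
    if [ "$user_mm" = "$kernel_mm" ]; then
        echo "match: $user_ver / $kernel_ver"
        return 0
    fi
    echo "MISMATCH: userspace $user_ver vs kernel module $kernel_ver"
    return 1
}
```

With the paths from the commands above, that would be `versions_match /share/CACHEDEV1_DATA/.qpkg/NVIDIA_GPU_DRV/version /share/CACHEDEV1_DATA/.qpkg/NvKernelDriver/version`. The volume name (CACHEDEV1_DATA) may differ on other units.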

I’ve turned off automatic updates, uninstalled the GPU driver package and loaded the old 5.0.4.1 driver from a qpkg package. This has worked for me: I’ve got my /dev/nvidia0 back and transcoding is working fine.

This will do me, as I don’t need the new driver, so I will wait for a proper fix. Of course I can’t turn auto-update for apps back on, so I’m just on notify so it doesn’t go off and reinstall the broken new one.

On QTS 5.0.1 or 5.0.0 (downgrade)?

So you don’t worry, I’m doing the 5.0.1 → 5.0.0 downgrade

I’ve already done the manual .uninstall.sh for the two packages.

cd /share/*/.qpkg/NVIDIA_GPU_DRV
./.uninstall.sh

# Now repeat for NvKernelDriver

Reboot the machine afterwards.

No, I’m still on 5.0.1 but just on the older driver. I only installed 5.0.1 today as I held off for a bit, but I can see people here saying that 5.0.1 originally shipped with the older drivers and the driver was only upgraded roughly yesterday. So the older drivers were working just fine on 5.0.1 before the upgrade, and they are working for me.

I’ll wait until one of you volunteers for the next untested QNAP update before I stick my toes in :wink: How this got past their QA, if they have any, I don’t know; it’s literally impossible for it to work. I can only think they released a driver upgrade for a couple of their NASes to work with this and forgot the rest.
