Plex Media Server crashing frequently (not always) during NVIDIA GPU transcoding

Server Version#: v1.40.5.8921-836b34c27
Player Version#: Various (Not relevant)

I’ve recently added an NVIDIA Quadro P2000 to my Linux server. I’m running Ubuntu and have NVIDIA drivers 550.107.02 installed. I’ve enabled hardware transcoding, and often (not always) when a client tries to play something that needs transcoding, the server crashes. It writes a minidump and uploads it.

I’ve opened the minidump with WinDbg, and the best I can figure out is that it’s crashing in libnvcuvid.so.1.

It does work sometimes, and the times where it does work, the client can stream the whole movie/episode without further issue.

I’ve also compiled ffmpeg with NVENC support, and I can run ffmpeg transcodes as many times as I want with no issues. There are no clues I can see in the logs. Here’s what the end of a log typically looks like:

Sep 02, 2024 14:36:57.036 [140637676071736] DEBUG - [Req#5464/Transcode] Codecs: hardware transcoding: testing API nvdec for device 'pci:0000:02:00.0' (GP106GL [Quadro P2000])
Sep 02, 2024 14:36:57.099 [140637663415096] DEBUG - [Req#546e/Transcode] Codecs: Testing with profile 'High'
Sep 02, 2024 14:36:57.158 [140637239950136] VERBOSE - [Req#547a/Transcode] [FFMPEG] - Loaded Nvenc version 12.2
Sep 02, 2024 14:36:57.158 [140637239950136] VERBOSE - [Req#547a/Transcode] [FFMPEG] - Nvenc initialized successfully
Sep 02, 2024 14:36:57.200 [140637235555128] VERBOSE - [Req#54a1/Transcode] [FFMPEG] - Nvenc unloaded
Sep 02, 2024 14:36:57.414 [140638037535544] VERBOSE - We didn't receive any data from 192.168.111.149:52352 in time, dropping connection.
Sep 02, 2024 14:36:57.920 [140638035426104] VERBOSE - WebSocket: processed 1 frame(s)
Sep 02, 2024 14:36:58.060 [140637650758456] DEBUG - [Req#5477/Transcode] Codecs: testing h264 (decoder) with hwdevice nvdec
Sep 02, 2024 14:36:58.060 [140637650758456] VERBOSE - [Req#5477/Transcode] [FFMPEG] - Rescanning for external libs: '/var/lib/plexmediaserver/Library/Application\ Support/Plex\ Media\ Server/Codecs/27d3929-731e70e17b964ba367f4016a-linux-x86_64/'
Sep 02, 2024 14:36:58.061 [140637650758456] DEBUG - [Req#5477/Transcode] Codecs: hardware transcoding: testing API nvdec for device 'pci:0000:02:00.0' (GP106GL [Quadro P2000])
Sep 02, 2024 14:36:58.145 [140637663415096] DEBUG - [Req#546e/Transcode] Codecs: testing h264_nvenc (encoder)
Sep 02, 2024 14:36:58.145 [140637663415096] DEBUG - [Req#546e/Transcode] Codecs: hardware transcoding: testing API nvenc for device 'pci:0000:02:00.0' (GP106GL [Quadro P2000])
Sep 02, 2024 14:36:58.146 [140637676071736] DEBUG - [Req#5464/Transcode] Codecs: Testing with profile 'High'
Sep 02, 2024 14:36:58.179 [140637667633976] VERBOSE - [Req#546b/Transcode] [FFMPEG] - Loaded Nvenc version 12.2
Sep 02, 2024 14:36:58.179 [140637667633976] VERBOSE - [Req#546b/Transcode] [FFMPEG] - Nvenc initialized successfully
Sep 02, 2024 14:36:58.410 [140638037535544] VERBOSE - Didn't receive a request from 192.168.111.149:52352: stream truncated
Sep 02, 2024 14:36:59.570 [140637135919928] VERBOSE - [Req#54bc/Transcode] [FFMPEG] - Nvenc unloaded
Sep 02, 2024 14:36:59.628 [140638035426104] VERBOSE - WebSocket: processed 1 frame(s)
Sep 02, 2024 14:36:59.671 [140637231336248] VERBOSE - [Req#54b9/Transcode] [FFMPEG] - Loaded Nvenc version 12.2
Sep 02, 2024 14:36:59.671 [140637231336248] VERBOSE - [Req#54b9/Transcode] [FFMPEG] - Nvenc initialized successfully
Sep 02, 2024 14:36:59.716 [140637235555128] DEBUG - [Req#54a1/Transcode] Codecs: testing h264 (decoder) with hwdevice nvdec
Sep 02, 2024 14:36:59.716 [140637235555128] VERBOSE - [Req#54a1/Transcode] [FFMPEG] - Rescanning for external libs: '/var/lib/plexmediaserver/Library/Application\ Support/Plex\ Media\ Server/Codecs/27d3929-731e70e17b964ba367f4016a-linux-x86_64/'
Sep 02, 2024 14:36:59.718 [140637235555128] DEBUG - [Req#54a1/Transcode] Codecs: hardware transcoding: testing API nvdec for device 'pci:0000:02:00.0' (GP106GL [Quadro P2000])

Are you using your own FFmpeg with PMS?

No. I’m not aware of how to replace the built-in one; I just built a separate standalone version to test the GPU.
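The log above shows several requests (Req#5464, Req#546e, Req#5477, Req#54a1) probing the nvdec decoder and the nvenc encoder within a few seconds of each other, on different threads. A standalone test that only encodes, one job at a time, may never exercise that pattern. The sketch below is one way to get closer to it: hardware decode plus hardware encode, several sessions at once, repeated. It assumes a standalone ffmpeg on PATH built with CUDA/NVDEC/NVENC support and a hypothetical H.264 test file; `build_cmd`, `stress`, `CMD` and the `--run` flag are illustrative names, not anything PMS uses.

```shell
#!/usr/bin/env bash
# Sketch: run several NVDEC -> NVENC transcodes at once with a standalone
# ffmpeg, roughly mimicking PMS's overlapping decoder/encoder probes.
# Assumptions: ffmpeg on PATH built with CUDA/NVDEC/NVENC support and a
# hypothetical H.264 test file given on the command line.
set -euo pipefail

# Build the argv for one pass (hardware decode, hardware encode, output
# discarded) into the global array CMD.
build_cmd() {
  CMD=(ffmpeg -hide_banner -loglevel error
       -hwaccel cuda -hwaccel_output_format cuda
       -i "$1" -c:v h264_nvenc -an -f null -)
}

# Run $2 parallel jobs for $3 rounds and report how many jobs failed.
stress() {
  local input="$1" njobs="${2:-4}" rounds="${3:-10}" failed=0 r j pid
  build_cmd "$input"
  for ((r = 1; r <= rounds; r++)); do
    local pids=()
    for ((j = 0; j < njobs; j++)); do
      "${CMD[@]}" &
      pids+=("$!")
    done
    # Collect every job of this round before starting the next one.
    for pid in "${pids[@]}"; do
      wait "$pid" || failed=$((failed + 1))
    done
  done
  echo "failed jobs: $failed of $((njobs * rounds))"
}

if [[ "${1:-}" == "--run" ]]; then
  stress "$2" "${3:-4}" "${4:-10}"
fi
```

Usage would be something like `./nv-stress.sh --run /path/to/sample.mkv 4 10`, then checking `dmesg` for new `segfault ... libnvcuvid` lines. If the fault reproduces outside PMS, that points at the driver; if it doesn't, that isn't proof either way.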

I did notice that after one of the recent crashes, the GPU was completely unresponsive. I had to rmmod all the drivers, use echo 1 > /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/remove to remove the GPU from the PCI bus, run echo "1" > /sys/bus/pci/rescan, and then reload the drivers to get ffmpeg NVENC and Plex working again.
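That recovery sequence (unload the modules, remove the PCI device, rescan, reload) can be scripted so it is the same every time. A minimal sketch, assuming it runs as root, that PMS runs as the systemd unit `plexmediaserver`, that nothing else holds the GPU open, and that the loaded modules are the usual `nvidia_uvm`, `nvidia_drm`, `nvidia_modeset` and `nvidia`. It uses `/sys/bus/pci/devices/<address>`, which sysfs provides as a symlink to the same device directory as the longer `/sys/devices/pci0000:00/...` path above. `recover_gpu`, `run`, `poke` and `DRY_RUN` are hypothetical helper names; `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: recover a wedged NVIDIA GPU without rebooting.
# Assumptions: run as root; PMS is the systemd unit "plexmediaserver";
# nothing else (Xorg, containers) holds the GPU open.
set -euo pipefail

GPU_ADDR="${GPU_ADDR:-0000:02:00.0}"   # from the PMS log: pci:0000:02:00.0
# Unload dependents first and the core "nvidia" module last.
MODULES=(nvidia_uvm nvidia_drm nvidia_modeset nvidia)

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Write "1" to a sysfs control file (remove / rescan).
poke() {
  run sh -c "echo 1 > '$1'"
}

recover_gpu() {
  run systemctl stop plexmediaserver
  local m
  for m in "${MODULES[@]}"; do
    # Skip modules that are not loaded (e.g. nvidia_drm on a headless box).
    if [[ "${DRY_RUN:-0}" == 1 ]] || grep -q "^${m} " /proc/modules; then
      run rmmod "$m"
    fi
  done
  poke "/sys/bus/pci/devices/${GPU_ADDR}/remove"
  poke /sys/bus/pci/rescan
  run modprobe nvidia
  run modprobe nvidia_uvm
  run systemctl start plexmediaserver
}

if [[ "${1:-}" == "--run" ]]; then
  recover_gpu
fi
```

Running `DRY_RUN=1 ./gpu-recover.sh --run` first shows exactly what would be executed.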

I just noticed these 3 lines in my kernel dmesg log which correspond to PMS crashes:

[1522024.721830] PMS ReqHandler[41883]: segfault at 7f240d515a3c ip 00007f24411be911 sp 00007f240ba37ec0 error 4 in libnvcuvid.so.550.107.02[7f24411a7000+bc6000]
[1523793.364144] PMS ReqHandler[39743]: segfault at 7fe879dcb89c ip 00007fe87ef97911 sp 00007fe8792acec0 error 4
[1528646.298440] PMS ReqHandler[23607]: segfault at 7f7a2b84285c ip 00007f7a41402911 sp 00007f7a2b347ec0 error 4 in libnvcuvid.so.550.107.02[7f7a413eb000+bc6000]
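These lines say more than "it died in libnvcuvid". Subtracting the library's load base (the first number in the brackets) from the faulting instruction pointer (`ip`) gives the offset inside the library: 0x7f24411be911 - 0x7f24411a7000 = 0x17911 for the first line and 0x7f7a41402911 - 0x7f7a413eb000 = 0x17911 for the third, so both crashes hit the same instruction. The second line omits the mapping, but its ip also ends in ...911, which is consistent with the same spot. "error 4" is the x86 page-fault code: bit 2 set (user mode), bit 1 clear (read), bit 0 clear (page not present), i.e. the decoder read from an unmapped address. A small sketch that does this decoding; `decode_segfault` and `--stdin` are hypothetical names, and the error code is parsed as hex because that is how the kernel prints it.

```shell
#!/usr/bin/env bash
# Sketch: decode a kernel "segfault at ... ip ... error N [in lib[base+size]]"
# line into the faulting offset inside the library and the x86 error bits.
set -euo pipefail

decode_segfault() {
  local line="$1"
  # Captures: 1 = ip, 2 = error code (hex), 4 = library, 5 = load base (4 and 5 optional)
  local re='ip ([0-9a-f]+) sp [0-9a-f]+ error ([0-9a-f]+)( in ([A-Za-z0-9._-]+)\[([0-9a-f]+)\+)?'
  if ! [[ "$line" =~ $re ]]; then
    echo "unrecognised line: $line" >&2
    return 1
  fi
  local ip="${BASH_REMATCH[1]}" err=$((16#${BASH_REMATCH[2]}))
  local lib="${BASH_REMATCH[4]:-}" base="${BASH_REMATCH[5]:-}"

  # x86 page-fault error code: bit 0 = protection fault (else page not present),
  # bit 1 = write (else read), bit 2 = user mode (else kernel mode).
  local cause="page not present" access=read mode=kernel
  if (( err & 1 )); then cause="protection violation"; fi
  if (( err & 2 )); then access=write; fi
  if (( err & 4 )); then mode=user; fi

  if [[ -n "$base" ]]; then
    printf '%s+0x%x: %s-mode %s, %s\n' "$lib" $(( 16#$ip - 16#$base )) "$mode" "$access" "$cause"
  else
    printf 'ip 0x%s (no library mapping printed): %s-mode %s, %s\n' "$ip" "$mode" "$access" "$cause"
  fi
}

if [[ "${1:-}" == "--stdin" ]]; then
  while IFS= read -r l; do
    decode_segfault "$l" || true
  done
fi
```

For example, `dmesg | grep segfault | ./decode-segfault.sh --stdin` turns the first line above into `libnvcuvid.so.550.107.02+0x17911: user-mode read, page not present`. An identical offset across crashes is the kind of detail worth including in a driver bug report.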

550.107.02??

It’s crashing in the Nvidia drivers and taking down PMS with it.

I am running stably on 550.90.07.
Ubuntu’s vetted drivers are 550.90.07 (as of last week, when my RTX 2000 arrived).

Did you install nvidia-driver-550 from Ubuntu’s apt repository?

No, I downloaded the drivers from the NVIDIA website, which gave me a “.run” file that I ran to install that version. The download was called NVIDIA-Linux-x86_64-550.107.02.run.

I tried the 560 drivers too, and it made no difference. I could try an older version, maybe?

Experience has shown us (painfully) that:

  1. The .run files are often “bleeding edge”.
  2. Ubuntu-vetted drivers are stable and more than enough.
    – My new card only wants 550 drivers right now.

Would you consider removing the other NVIDIA drivers you have and switching to the vetted drivers?
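Switching from a .run install to the packaged driver comes down to removing the .run install with the uninstaller it ships (`nvidia-uninstall`, equivalent to running the installer with `--uninstall`), installing the `nvidia-driver-<series>` metapackage, rebooting, and confirming which version is loaded. A sketch, assuming root and a release that actually ships `nvidia-driver-550` (such as the jammy listing below), plus the `plexmediaserver` service name; `switch_to_packaged_driver`, `check_libnvcuvid_owner`, `run` and `DRY_RUN` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: replace a .run-installed NVIDIA driver with Ubuntu's packaged one.
# Assumptions: run as root on a release that ships nvidia-driver-<series>
# (e.g. 22.04 jammy); the .run installer left /usr/bin/nvidia-uninstall behind.
set -euo pipefail

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

switch_to_packaged_driver() {
  local series="${1:-550}"
  run systemctl stop plexmediaserver
  # The .run installer ships its own uninstaller; don't delete its files by hand.
  run nvidia-uninstall --silent
  run apt-get update
  run apt-get install -y "nvidia-driver-${series}"
  echo "Reboot, then check: nvidia-smi --query-gpu=driver_version --format=csv,noheader"
}

# After the reboot: is libnvcuvid owned by a package? (.run files are not.)
check_libnvcuvid_owner() {
  dpkg -S libnvcuvid.so.1 2>/dev/null || echo "libnvcuvid.so.1 is not from a package"
}

case "${1:-}" in
  --switch) switch_to_packaged_driver "${2:-550}" ;;
  --check)  check_libnvcuvid_owner ;;
esac
```

After the switch, `--check` should name `libnvidia-decode-550` as the owner, matching the `dpkg -l` listing further down.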

This is what Ubuntu shows:

nvidia-utils-510/jammy-updates,jammy-security 525.147.05-0ubuntu2.22.04.1 amd64
nvidia-utils-515-server/jammy-updates,jammy-security 525.147.05-0ubuntu2.22.04.1 amd64
nvidia-utils-515/jammy-updates,jammy-security 525.147.05-0ubuntu2.22.04.1 amd64
nvidia-utils-520/jammy-updates,jammy-security 525.147.05-0ubuntu2.22.04.1 amd64
nvidia-utils-525-server/jammy-updates,jammy-security 525.147.05-0ubuntu2.22.04.1 amd64
nvidia-utils-525/jammy-updates,jammy-security 525.147.05-0ubuntu2.22.04.1 amd64
nvidia-utils-530/jammy-updates,jammy-security 535.183.01-0ubuntu0.22.04.1 amd64
nvidia-utils-535-server/jammy-updates,jammy-security 535.183.01-0ubuntu0.22.04.1 amd64
nvidia-utils-535/jammy-updates,jammy-security 535.183.01-0ubuntu0.22.04.1 amd64
nvidia-utils-545/jammy-updates 545.29.06-0ubuntu0.22.04.2 amd64
nvidia-utils-550-server/jammy-updates,jammy-security 550.90.07-0ubuntu0.22.04.1 amd64
nvidia-utils-550/jammy-updates,jammy-security,now 550.90.07-0ubuntu0.22.04.1 amd64 [installed,automatic]

I have:

[chuck@lizum ~.2002]$ dpkg -l | grep nvidia | grep 550
ii  libnvidia-cfg1-550:amd64                   550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-550                       550.90.07-0ubuntu0.22.04.1                        all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-550:amd64                550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA libcompute package
ii  libnvidia-compute-550:i386                 550.90.07-0ubuntu0.22.04.1                        i386         NVIDIA libcompute package
ii  libnvidia-decode-550:amd64                 550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-550:i386                  550.90.07-0ubuntu0.22.04.1                        i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-550:amd64                 550.90.07-0ubuntu0.22.04.1                        amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-550:i386                  550.90.07-0ubuntu0.22.04.1                        i386         NVENC Video Encoding runtime library
ii  libnvidia-extra-550:amd64                  550.90.07-0ubuntu0.22.04.1                        amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-550:amd64                   550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-550:i386                    550.90.07-0ubuntu0.22.04.1                        i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-550:amd64                     550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-550:i386                      550.90.07-0ubuntu0.22.04.1                        i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  nvidia-compute-utils-550                   550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA compute utilities
ii  nvidia-dkms-550                            550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA DKMS package
ii  nvidia-driver-550                          550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA driver metapackage
ii  nvidia-firmware-550-550.90.07              550.90.07-0ubuntu0.22.04.1                        amd64        Firmware files used by the kernel module
ii  nvidia-kernel-common-550                   550.90.07-0ubuntu0.22.04.1                        amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-550                   550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA kernel source package
ii  nvidia-utils-550                           550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA driver support binaries
ii  xserver-xorg-video-nvidia-550              550.90.07-0ubuntu0.22.04.1                        amd64        NVIDIA binary Xorg driver
[chuck@lizum ~.2003]$ 

I run PMS while the AI container is running, without issues.

Oh, absolutely (removing the other drivers). The distro on this server is Xenial, so I can’t use the jammy packages. I need to upgrade the server to Jammy, but I have to schedule downtime for the other services running on it first.

So, your suggestion seems reasonable, and I’ll just have to jump through all the other hoops to get there.

Have you seen this sort of instability before? I’d like to know it’s worth the effort, since it will take a lot of hours to get this server from Xenial to Jammy (it runs a lot of stuff).

In the meantime, I’m installing the .run version of that exact release (550.90.07).

Ubuntu 16?

As a friendly FYI, that hasn’t been supported since Ubuntu EOL’d it in 2021.

At this point, I recommend:

  1. Get a copy of all your customized files (partitions, users, repos, fstab, etc.).
  2. Get a list of your packages (a little shell script to get just the names and strip the version info, because that’s useless).
  3. If you’re building a server, consider Ubuntu Server (console only).
    – No GUI junk to fill it up unnecessarily.
  4. If you need a GUI, use ‘minimal’.
  5. For your /home (always problematic):
    – Make a tarball of it this time.
    – As you install the new version, give /home its own partition (mine is on p4, with swap on p3).
    – You then reload the tarball into /home and you’ve lost nothing.
    – On the NEXT OS upgrade, you don’t format /home. It’ll come through completely safe as long as you tell the installer to use it.

[chuck@lizum ~.2006]$ df -h
Filesystem     1M-blocks   Used Available Use% Mounted on
efivarfs               1      1         1  63% /sys/firmware/efi/efivars
/dev/nvme0n1p2    124870 101666     23205  82% /
/dev/nvme0n1p4   1048191 556695    491496  54% /home
/dev/nvme0n1p1       499     65       434  13% /boot/efi

  6. Once you’re at this point (base OS installed and /home reloaded), load the packages:
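# Reinstall every package named in the list, one name per line.
# sudo goes on apt-get: it cannot prefix the shell keyword "while".
while read -r package
do
  sudo apt-get -y install "$package" < /dev/null   # keep apt/dpkg from reading the list on stdin
done < filename_containing_list_of_packages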
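Step 2 (a names-only package list, no versions) can come straight from `dpkg -l`. A minimal sketch: keep only installed (`ii`) rows and print the package column. That keeps the architecture suffix, so multiarch entries like the `libnvidia-*:i386` packages above survive; on the new system those need `dpkg --add-architecture i386` before the reinstall loop. `package_names` and `--save` are hypothetical names. `apt-mark showmanual` is an alternative if you only want the packages you installed by hand.

```shell
#!/usr/bin/env bash
# Sketch: turn `dpkg -l` output into a names-only package list (no versions).
set -euo pipefail

# Reads `dpkg -l` output on stdin; prints one package name per line.
# Only rows whose status is "ii" (installed) are kept; the arch suffix
# (e.g. ":i386") is retained so apt installs the right architecture.
package_names() {
  awk '$1 == "ii" { print $2 }' | sort -u
}

if [[ "${1:-}" == "--save" ]]; then
  dpkg -l | package_names > "${2:-packages.txt}"
fi
```

Run `./pkglist.sh --save packages.txt` on the old system and feed the file to the reinstall loop. Because that loop installs one package at a time, a name that no longer exists on the new release (old version-suffixed driver packages, for instance) only fails on its own line instead of aborting the whole run.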

I can reload from cold, after the OS install finishes, in about 30 minutes and be fully up again.

Even though it’s EOL’d, there’s extended support for security patches that hasn’t run out yet.

Thanks for the suggestions - I’ll keep them in mind.

Running driver 550.90.07 from the .run file didn’t help. I’ll work on getting this server up to a current release, and if I’m still having issues, I’ll come back. Thanks for your help.

The other thing against you … is Ubuntu 16.

What kernel do you have on it, 4.x?
I ask because 4.x is long since obsolete.
Kernel 5.x has been retired for all the newer CPUs.
6.5.0 is the de facto standard now, with 6.8.0/6.8.1 (and newer) coming online as standard.

PMS and Nvidia are being updated to those standards.

Yes, the downloads page says Ubuntu 16.04 & Debian 8, but you can’t install a new system with either.

I have a ticket in to Engineering to change the minimum to Ubuntu 20 (the 5-year LTS standard limitation), which will update to Ubuntu 22 and Debian 12 next year.
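Both questions, which kernel the box runs and whether its release meets the Ubuntu 20 minimum, can be answered on the machine itself from `uname -r` and the standard `ID`/`VERSION_ID` fields in `/etc/os-release`. A sketch; the thresholds (kernel 5.x, Ubuntu 20.04) are this thread’s rules of thumb, not an official Plex or NVIDIA requirement, and `check_host`, `kernel_major` and `version_ge` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: flag an old kernel or Ubuntu release.
# Thresholds are this thread's rules of thumb (assumed), not official minimums.
set -euo pipefail

MIN_KERNEL_MAJOR=5
MIN_UBUNTU="20.04"

# "4.4.0-210-generic" -> 4
kernel_major() {
  echo "${1%%.*}"
}

# Succeeds if version $1 >= $2: "$2 then $1" must already be in version order.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

check_host() {
  local kernel="$1" os_id="$2" os_ver="$3"
  if (( $(kernel_major "$kernel") < MIN_KERNEL_MAJOR )); then
    echo "kernel $kernel is older than ${MIN_KERNEL_MAJOR}.x"
  fi
  if [[ "$os_id" == ubuntu ]] && ! version_ge "$os_ver" "$MIN_UBUNTU"; then
    echo "Ubuntu $os_ver is older than $MIN_UBUNTU"
  fi
}

if [[ "${1:-}" == "--run" ]]; then
  # /etc/os-release defines ID and VERSION_ID as shell variables.
  . /etc/os-release
  check_host "$(uname -r)" "$ID" "$VERSION_ID"
fi
```

`sort -V` is used instead of a plain string comparison so that multi-part versions are ordered numerically field by field.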

The minimum Linux OS versions for NVIDIA are Ubuntu Desktop 18.04 or Ubuntu Desktop 20.04 LTS, depending on the installation method. Ubuntu Desktop 20.04 LTS is required for building and flashing DRIVE OS Yocto components.
