Drops HW transcoding mid-stream and starts software transcoding

Server Version#: 1.29.2.6364
Player Version#: 4.87.2
Unraid: 6.11.5
Nvidia Driver: 515.86.01

I’ve been HW transcoding for many months now with a GTX 1070 and recently noticed that Plex is now dropping the HW transcode in favor of software transcoding.

Any stream I start that requires a transcode initializes my GTX 1070, but then it hangs up and reverts to software transcoding on my Xeon E3. If I restart Unraid and begin a fresh Plex session, I’ll get encode/decode activity on the GPU dashboard, but then it stops and begins software transcoding. Every new session after that defaults straight to software transcoding, yet the GPU is still active according to nvidia-smi…just no output, and the CPU spikes.

/dev/dri/ lists:
drwxrwxrwx 2 root root 100 Nov 24 23:19 by-path/
crwxrwxrwx 1 root video 226, 0 Nov 24 23:19 card0
crwxrwxrwx 1 root video 226, 1 Nov 24 23:19 card1
crwxrwxrwx 1 root video 226, 128 Nov 24 23:19 renderD128

nvidia-smi shows the card and picks up the GPU process during a transcoding request
Plex Media Server Logs_2022-11-25_13-21-57.zip (2.4 MB)
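A quick sanity check to tie the render node to the card and to watch the encode/decode engines during a stream (a sketch; guarded so it degrades cleanly on machines without the tools — `nvidia-smi dmon -s u` prints sm/mem/enc/dec utilization columns once per second):

```shell
# Confirm which PCI device backs each DRM node, then watch NVENC/NVDEC load.
# Both steps are guarded so they degrade cleanly on machines without a GPU.
if [ -d /dev/dri/by-path ]; then
  ls -l /dev/dri/by-path/     # symlinks name the PCI address behind each card/renderD node
else
  echo "no /dev/dri/by-path on this system"
fi

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi dmon -s u -c 5   # 5 samples of sm/mem/enc/dec utilization, one per second
else
  echo "nvidia-smi not installed"
fi
```

If the dec column shows activity while enc stays at 0 during a transcode, that matches the "decoder fires, no encoder" symptom.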

  1. Please turn VERBOSE logging off. It’s not necessary in 99% of the cases

  2. There appears to be a problem with the video. It’s using hardware then encounters an error in the data stream and falls back to software so it can continue to play.

Nov 25, 2022 12:01:12.323 [0x149b82a6bb38] DEBUG - Request: [127.0.0.1:34992 (Loopback)] POST /video/:/transcode/session/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc/progress/log?level=0&message=%5BParsed_scale_cuda_1%20%40%200x1472bc52b7c0%5D%20Failed%20to%20configure%20output%20pad%20on%20Parsed_scale_cuda_1 (15 live) #16a1d Signed-in Token (jaconsass) (range: bytes=0-)  / Accept => */* / Connection => keep-alive / Host => 127.0.0.1:32400 / Icy-MetaData => 1 / Range => bytes=0- / User-Agent => Lavf/58.65.101 / X-Plex-Http-Pipeline => infinite / X-Plex-Token => xxxxxxxxxxxxxxxxxxxx4a0e-90b0-6688edf214c4
Nov 25, 2022 12:01:12.323 [0x149b82a6bb38] ERROR - [Req#16a1d/Transcode/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc] [Parsed_scale_cuda_1 @ 0x1472bc52b7c0] Failed to configure output pad on Parsed_scale_cuda_1
Nov 25, 2022 12:01:12.323 [0x149b84df6b38] DEBUG - Completed: [127.0.0.1:34992] 200 POST /video/:/transcode/session/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc/progress/log?level=0&message=%5BParsed_scale_cuda_1%20%40%200x1472bc52b7c0%5D%20Failed%20to%20configure%20output%20pad%20on%20Parsed_scale_cuda_1 (15 live) 0ms 195 bytes (pipelined: 64) (range: bytes=0-) 
Nov 25, 2022 12:01:12.323 [0x149b81e39b38] DEBUG - Request: [127.0.0.1:34992 (Loopback)] POST /video/:/transcode/session/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc/progress/log?level=0&message=%5BAVHWDeviceContext%20%40%200x1472c6ade780%5D%20cu-%3EcuMemFree%28%28CUdeviceptr%29data%29%20failed%20-%3E%20CUDA_ERROR_ILLEGAL_ADDRESS%3A%20an%20illegal%20memory%20access%20was%20encountered (15 live) #16a1e Signed-in Token (jaconsass) (range: bytes=0-)  / Accept => */* / Connection => keep-alive / Host => 127.0.0.1:32400 / Icy-MetaData => 1 / Range => bytes=0- / User-Agent => Lavf/58.65.101 / X-Plex-Http-Pipeline => infinite / X-Plex-Token => xxxxxxxxxxxxxxxxxxxx4a0e-90b0-6688edf214c4
Nov 25, 2022 12:01:12.324 [0x149b82838b38] DEBUG - Request: [127.0.0.1:34992 (Loopback)] POST /video/:/transcode/session/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc/progress/log?level=0&message=Failed%20to%20inject%20frame%20into%20filter%20network%3A%20Generic%20error%20in%20an%20external%20library (15 live) #16a20 Signed-in Token (jaconsass) (range: bytes=0-)  / Accept => */* / Connection => keep-alive / Host => 127.0.0.1:32400 / Icy-MetaData => 1 / Range => bytes=0- / User-Agent => Lavf/58.65.101 / X-Plex-Http-Pipeline => infinite / X-Plex-Token => xxxxxxxxxxxxxxxxxxxx4a0e-90b0-6688edf214c4
Nov 25, 2022 12:01:12.324 [0x149b82838b38] ERROR - [Req#16a20/Transcode/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc] Failed to inject frame into filter network: Generic error in an external library
Nov 25, 2022 12:01:12.324 [0x149b84bf3b38] DEBUG - Completed: [127.0.0.1:34992] 200 POST /video/:/transcode/session/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc/progress/log?level=0&message=Failed%20to%20inject%20frame%20into%20filter%20network%3A%20Generic%20error%20in%20an%20external%20library (15 live) 0ms 195 bytes (pipelined: 67) (range: bytes=0-) 
Nov 25, 2022 12:01:12.324 [0x149b80064b38] DEBUG - Request: [127.0.0.1:34992 (Loopback)] POST /video/:/transcode/session/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc/progress/log?level=0&message=Error%20while%20processing%20the%20decoded%20data%20for%20stream%20%230%3A0 (15 live) #16a21 Signed-in Token (jaconsass) (range: bytes=0-)  / Accept => */* / Connection => keep-alive / Host => 127.0.0.1:32400 / Icy-MetaData => 1 / Range => bytes=0- / User-Agent => Lavf/58.65.101 / X-Plex-Http-Pipeline => infinite / X-Plex-Token => xxxxxxxxxxxxxxxxxxxx4a0e-90b0-6688edf214c4
Nov 25, 2022 12:01:12.324 [0x149b80064b38] ERROR - [Req#16a21/Transcode/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc] Error while processing the decoded data for stream #0:0
Nov 25, 2022 12:01:12.324 [0x149b84df6b38] DEBUG - Completed: [127.0.0.1:34992] 200 POST /video/:/transcode/session/5wdd8mbz48o9kkkp01xvnakl/44e51fe2-fbbc-4b01-9edd-9e28767bd4dc/progress/log?level=0&message=Error%20while%20processing%20the%20decoded%20data%20for%20stream%20%230%3A0 (15 live) 0ms 195 bytes (pipelined: 68) (range: bytes=0-) 
Nov 25, 2022 12:01:12.349 [0x149b84bf3b38] VERBOSE - Didn't receive a request from 127.0.0.1:34992: End of file
Nov 25, 2022 12:01:12.368 [0x149b84ff9b38] VERBOSE - JobManager: child process with handle 31292 exited
Nov 25, 2022 12:01:12.368 [0x149b84ff9b38] DEBUG - Jobs: '/usr/lib/plexmediaserver/Plex Transcoder' exit code for process 31292 is 1 (failure)
Nov 25, 2022 12:01:12.369 [0x149b80267b38] DEBUG - Streaming Resource: Changing client to use software decoding

Which Nvidia drivers are you using?

Does this occur for all videos?

Occurs on all videos, 4K, VC-1, etc.

I listed the drivers in the initial post: 515.86.01. I’ve tried a few drivers, all with the same result:
v525.53
v520.56.06
v515.86.01
v515.76
v470.141.03

I’m not sure which drivers I should be using anymore…

Not sure if this is relevant, but sometimes the GPU fires away on the decoder side with no activity on the encoder.

I wouldn’t use the “still wet” 515.86.01. It was released just three days ago and is as yet ‘unproven’.
Just as with anything (Windows, Synology, and Plex), holding back a bit is often a far better choice.

I recommend backing down to 510.85.02

510.85.02 has more than enough to satisfy Plex requirements now and for some time into the future.

After you have completed the downgrade and confirmed nvidia-smi works correctly, we’ll see where you stand with it and go from there.

510 is not an option with the Nvidia plugin. The ones previously listed are all I have access to.

apt list | grep nvidia

produces:

primus-nvidia/focal 0~20150328-10 amd64
xserver-xorg-video-nvidia-390/focal-updates,focal-security 390.154-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-390/focal-updates,focal-security 390.154-0ubuntu0.20.04.1 i386
xserver-xorg-video-nvidia-418-server/focal-updates,focal-security 418.226.00-0ubuntu0.20.04.2 amd64
xserver-xorg-video-nvidia-418/focal 430.50-0ubuntu3 amd64
xserver-xorg-video-nvidia-430/focal-updates,focal-security 440.100-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-435/focal-updates 455.45.01-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-440-server/focal-updates,focal-security 450.203.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-440/focal-updates,focal-security 450.119.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-450-server/focal-updates,focal-security 450.203.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-450/focal-updates,focal-security 460.91.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-455/focal-updates,focal-security 460.91.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-460-server/focal-updates,focal-security 470.141.10-0ubuntu0.20.04.2 amd64
xserver-xorg-video-nvidia-460/focal-updates,focal-security 470.141.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-465/focal-updates,focal-security 470.141.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-470-server/focal-updates,focal-security 470.141.10-0ubuntu0.20.04.2 amd64
xserver-xorg-video-nvidia-470/focal-updates,focal-security 470.141.03-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-495/focal-updates,focal-security 510.85.02-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-510-server/focal-updates,focal-security 510.85.02-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-510/focal-updates,focal-security 510.85.02-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-515-server/focal-updates,focal-security 515.65.01-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-515/focal-updates,focal-security 515.76+really.515.65.01-0ubuntu0.20.04.1 amd64
xserver-xorg-video-nvidia-520/focal-updates,focal-security 520.56.06-0ubuntu0.20.04.1 amd64

Are you 100% certain?

I see everything mainstream from 390 → 520

Maybe my question should be, how do I grab that specific set of drivers for Unraid?

If I could get Unraid installed in a VM (I’ve been trying to get a boot thumb drive working for 3 weeks now), I’d be more than willing to help figure it out. (I don’t have any Windows machines, and their manual method considers my thumb drives “not compatible”.)

I don’t think you put drivers in Unraid

The base OS (Unraid) stays as it is.

In a normal VM setup, you:

  1. Pass the hardware into the VM. (since you’re abstracting the hardware)
  2. Install whichever OS you want.
  3. Install third party drivers, such as the Nvidia drivers
  4. Install apps (Plex)
  5. Mount media (NFS)
  6. Create Libraries

Maybe the better choice here is Ubuntu 20.04.5 in the VM?

Example:

  1. I have ESXi (the hypervisor)
  2. In the ESXi VM, I pass the GPU card
  3. In the VM, I install Ubuntu 20.04.5, Nvidia GPU drivers 510.85.02 and Plex
  4. Configure Plex to look at /dev/dri/renderD129 (where the GPU sits on an Intel CPU with QSV capability of its own)
  5. Start up and go

( The only thing ESXI knows is that it assigned the PCI device to the VM. It does nothing else. I also do the same thing on QNAP if I’m running a VM )
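On a libvirt/QEMU hypervisor (which is what Unraid’s VM manager uses under the hood), the passthrough step ends up as a hostdev stanza in the domain XML along these lines. This is only a sketch: the PCI address 0000:01:00.0 is an example, not taken from this system.

```xml
<!-- Hypothetical libvirt hostdev stanza: pass the GPU at PCI 0000:01:00.0 into the VM -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

With `managed='yes'`, libvirt detaches the device from the host driver when the VM starts and reattaches it on shutdown, so the guest OS sees the raw GPU.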

I’m decent with Linux so if there’s a command to pull the drivers into the kernel, I’ll give it a shot. I put a message out to the Nvidia plugin developer to see if he knows how to downgrade.

Under normal Linux,

  1. The drivers are shared object (.so) library files and are loaded on demand by apps
  2. The kernel modules (.ko) are built into the ramdisk so the kernel can load them at startup

The normal package install
– places the files
– runs DKMS to compile the kernel module against the installed kernel headers (which the Nvidia package does)
– builds a new ramdisk image for the kernel to load at next start

Uninstall does the reverse
– Remove the files
– Build new ramdisk image (the DKMS files will be gone)
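On a stock Debian/Ubuntu system, that lifecycle looks roughly like the following (a sketch, not Unraid-specific; the package name nvidia-driver-510 is an example, and Unraid’s plugin system handles this differently):

```shell
# Typical DKMS-managed driver lifecycle (Debian/Ubuntu; package name is an example):
#
#   sudo apt install nvidia-driver-510   # places the .so libs; DKMS builds nvidia.ko
#                                        # against the running kernel's headers
#   sudo update-initramfs -u             # rebuild the ramdisk so the module loads at boot
#   sudo apt purge nvidia-driver-510     # reverse: files removed, DKMS module dropped,
#                                        # ramdisk rebuilt without it
#
# On a live system, confirm what DKMS has built (guarded if dkms is absent):
command -v dkms >/dev/null 2>&1 && dkms status || echo "dkms not installed"
```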

You’re truly, and unfortunately, on your own with Unraid. It’s not a distro we support.

I’ve always thought of it as a NAS product and never a hyper-convergence product as you’re expressing.

Docker on NAS boxes is easy but this is a whole new level of integration.

Is there any way you can move PMS to a “regular computer” and leave the media on Unraid?

Unfortunately, no.

Can you use the hypervisor as a plain hypervisor and pass the Nvidia card (by PCI address) into the VM?

There seem to be a lot of YouTube videos about how to do it.

This is one of the first I found.

https://www.youtube.com/watch?v=kFrHDkEbCQA

@ChuckPa I’ve also had this happen of late on several occasions. Nothing recent enough that I could dig out a log; just me noticing my NUC’s CPU very high due to a movie that dropped out of HW transcoding mid-stream. I attributed it to the preview transcoder and flipped back to the release stream a day or two ago. I’ll keep an eye open and grab logs if I see it.

A little more detail for any additional chance at a debug.

VC-1 videos start with a HW transcode, then switch to software within the first 5 minutes. I started a separate HEVC movie and saw the same thing: it played the first few minutes using HW, then hung up.

New logs w/o verbose attached. There are several videos played during these logs.
Plex Media Server Logs_2022-11-26_19-44-38.zip (2.8 MB)

@jaconsass

Your logs tell the tale.

Nov 26, 2022 18:27:10.900 [0x14fa3f5fcb38] INFO - Processor: 8-core Intel(R) Xeon(R) CPU E3-1275L v3 @ 2.70GHz
Nov 26, 2022 18:27:10.900 [0x14fa3f5fcb38] INFO - Compiler is - Clang 11.0.1 (https://plex.tv 9b997da8e5b47bdb4a9425b3a3b290be393b4b1f)
Nov 26, 2022 18:27:10.901 [0x14fa3f5fcb38] INFO - /usr/lib/plexmediaserver/Plex Media Server
Nov 26, 2022 18:27:10.837 [0x14fa2c9e9b38] ERROR - [Req#2a8ae/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] Error while decoding stream #0:0: Generic error in an external library
Nov 26, 2022 18:27:10.846 [0x14fa3a374b38] ERROR - [Req#2a8af/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] [hevc @ 0x149872c04a40] decoder->cvdl->cuvidMapVideoFrame(decoder->decoder, cf->idx, &devptr, &pitch, &vpp) failed -> CUDA_ERROR_MAP_FAILED: mapping of buffer object failed
Nov 26, 2022 18:27:10.846 [0x14fa2c3e0b38] ERROR - [Req#2a8b0/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] Error while decoding stream #0:0: Generic error in an external library
Nov 26, 2022 18:27:10.852 [0x14fa3b76fb38] ERROR - [Req#2a8b1/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] [hevc @ 0x149872c04a40] decoder->cvdl->cuvidMapVideoFrame(decoder->decoder, cf->idx, &devptr, &pitch, &vpp) failed -> CUDA_ERROR_MAP_FAILED: mapping of buffer object failed
Nov 26, 2022 18:27:10.852 [0x14fa398cdb38] ERROR - [Req#2a8b2/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] Error while decoding stream #0:0: Generic error in an external library
Nov 26, 2022 18:27:10.868 [0x14fa3cd87b38] ERROR - [Req#2a8b3/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] [hevc @ 0x149872c04a40] decoder->cvdl->cuvidMapVideoFrame(decoder->decoder, cf->idx, &devptr, &pitch, &vpp) failed -> CUDA_ERROR_MAP_FAILED: mapping of buffer object failed
Nov 26, 2022 18:27:10.868 [0x14fa39d6bb38] ERROR - [Req#2a8b4/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] Error while decoding stream #0:0: Generic error in an external library
Nov 26, 2022 18:27:10.878 [0x14fa396cab38] ERROR - [Req#2a8b5/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] [hevc @ 0x149872c04a40] decoder->cvdl->cuvidMapVideoFrame(decoder->decoder, cf->idx, &devptr, &pitch, &vpp) failed -> CUDA_ERROR_MAP_FAILED: mapping of buffer object failed
Nov 26, 2022 18:27:10.878 [0x14fa2cde2b38] ERROR - [Req#2a8b6/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] Error while decoding stream #0:0: Generic error in an external library
Nov 26, 2022 18:27:10.883 [0x14fa2c9e9b38] ERROR - [Req#2a8b7/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] [hevc @ 0x149872c04a40] decoder->cvdl->cuvidMapVideoFrame(decoder->decoder, cf->idx, &devptr, &pitch, &vpp) failed -> CUDA_ERROR_MAP_FAILED: mapping of buffer object failed
Nov 26, 2022 18:27:10.884 [0x14fa3a374b38] ERROR - [Req#2a8b8/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] Error while decoding stream #0:0: Generic error in an external library
Nov 26, 2022 18:27:10.886 [0x14fa2c3e0b38] ERROR - [Req#2a8b9/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] [hevc @ 0x149872c04a40] decoder->cvdl->cuvidMapVideoFrame(decoder->decoder, cf->idx, &devptr, &pitch, &vpp) failed -> CUDA_ERROR_MAP_FAILED: mapping of buffer object failed
Nov 26, 2022 18:27:10.887 [0x14fa3b76fb38] ERROR - [Req#2a8ba/Transcode/4263d6d0-a5a5-437f-a546-b46190a9429a-1/7483ce2f-b469-4100-828d-f0b823044894] Error while decoding stream #0:0: Generic error in an external library

The cuvid… call looks like the Nvidia driver entry point.

I can’t see the start of the transcode session because it’s rolled off the end of the buffer.

I hear a lot of problems with Unraid…

If you look back 2 posts, you see where I suggest Nvidia GPU passthrough into a VM.

The implication / strategy is to install Ubuntu/Debian in that VM with the Nvidia 510.85.02 drivers.

In that configuration, unraid does not get involved whereas in Docker you’re relying on Unraid to handle it all.

While running a full VM is likely not desirable, it will confirm where the problem is.

I appreciate you taking a second look. I’m investigating whether it’s possible to run a cuda-memtest on the card through Unraid to determine if this is a hardware failure. Everyone else running the same OS and drivers doesn’t seem to have any problems.

I’d almost rather drop $100 on a 1060/1070 than take the time to add an Ubuntu VM. Unraid has been stable for me, and this is the first trouble I’ve had with it since I started using the OS several years ago.

I’d check your drivers first.

Hardware “misbehaving” can also happen when the software expects one thing but the driver is doing something else.

Which driver are you using? (I use 510.85.02: ‘newer’ but not the bleeding edge, and more than enough for PMS for the foreseeable future.)

Currently on 515.86.01

That is the “wet behind the ears” version released just before the US holiday.

Personally? I wouldn’t trust it this freshly released but that’s your call.
Nvidia drivers have a history of not being stable at the .00 and .01 point releases.
I’ve been bitten by being too aggressive on their drivers.

I stay back in the ‘middle of the pack’ and at a more mature release.
That’s why I have the 510.85.02 version. It more than meets Plex’s needs.

Well, @ChuckPa I took your advice and downgraded to 470.101.03 last night. Rebooted and was able to successfully transcode all videos. Coincidentally, NVIDIA released a new r525 driver (525.60.11) this morning to the Production Branch.

Rather than try again and generate a NVIDIA-BUG-REPORT for the r515 driver, I jumped right to r525 and all is well. I’m off and running again with the 525 and won’t look back.

Either the 515 drivers are buggy for my 1070 or there was another underlying issue that was solved with the 3rd uninstall/re-install. At this point, IDK and I don’t care any more. It does seem…odd…that 515.86.01 was in production for only 6 days.

Appreciate your patience and help!