UnRAID + Nvidia GPU issues

I am having exactly the same issue. Tested on multiple Nvidia GPUs and multiple Docker flavours of Plex. With the legacy 470 drivers, HW transcoding works. I suspect the issue lies somewhere in the driver change from CUDA 11 to CUDA 12, but I might be wrong.

I don’t have Plex logs at the moment, but when I was troubleshooting it I did not see anything of interest. No errors in the nvidia-container-toolkit debug logs either.

Jellyfin HW transcoding with the same GPU on the same system runs with no issues.

@aLaskaratos

Recommend you upgrade to the 535 or 550 drivers on the host.

The 470 drivers are quite old and are going to cause problems.

[chuck@lizum ~.2002]$ nvidia-smi
Mon Sep 23 05:31:03 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    On  |   00000000:01:00.0  On |                  Off |
| 30%   36C    P2             23W /   70W |     600MiB /  16380MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      8846      G   /usr/lib/xorg/Xorg                            189MiB |
|    0   N/A  N/A      9085      G   /usr/bin/gnome-shell                           92MiB |
|    0   N/A  N/A     61603      C   ...lib/plexmediaserver/Plex Transcoder        225MiB |
+-----------------------------------------------------------------------------------------+
[chuck@lizum ~.2002]$

I’m testing further with 550.107.02.

The issue is on 550/560; it goes away if I try the legacy branch. If I remember correctly, it first appeared somewhere around the 525-530 versions.

I understand.

  1. I am using 550
  2. I have not tried 560 yet (I’ll need to test both 560 proprietary and 560 open)

FWIW, P2200 with 550.107.02

[chuck@lizum ~.2002]$ nvidia-smi
Mon Sep 23 05:31:03 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    On  |   00000000:01:00.0  On |                  Off |
| 30%   36C    P2             23W /   70W |     600MiB /  16380MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      8846      G   /usr/lib/xorg/Xorg                            189MiB |
|    0   N/A  N/A      9085      G   /usr/bin/gnome-shell                           92MiB |
|    0   N/A  N/A     61603      C   ...lib/plexmediaserver/Plex Transcoder        225MiB |
+-----------------------------------------------------------------------------------------+
[chuck@lizum ~.2002]$ 

May I please have your server debug logs (Restart PMS) of the start of a transcode which fails?

What’s the specific GPU model?

I will give you a full debrief as soon as I have access to my server. It is an issue I have been troubleshooting for the past week. I am in the same boat as the OP: I first noticed it a few months ago after a driver update, but just stuck with a driver version that worked at the time.

I finally decided to work the issue out and updated to the latest branch to see if I could figure it out. The cards I have tested with are 1080, 1660S and 3060. OS: Unraid 6.12.13, kernel 6.1.106. PMS is the latest Plex Pass version at the moment on the LSIO Docker image, but I see the same behavior on the official and hotio images.

I’ve had reports of HW transcoding issues on the latest beta build.
Have you considered 8892 ?

I would test on my unraid but it’s totally messed up and I can’t figure out how to do a completely fresh rebuild where everything works. UGH

Mine is on the road to being equally messed up, with all the changes I have been making over the last week :rofl:

Will try 8892 as well.

Who do I contact to UN**** the box?

:rofl:

To simplify, I decided to recreate the issue without changing the Nvidia drivers. One fewer variable in the equation.

GPU: Tesla P4

Test #1

  • NVidia Driver v525.89.02
  • PMS v1.28.0.5999
  • Client: app.plex.tv
  • Outcome: Hardware transcode works

Test #2

  • NVidia Driver v525.89.02
  • PMS v1.41.0.8992
  • Client: app.plex.tv
  • Outcome: Hardware transcode fails

Logs attached with debug enabled.
Plex Media Server Logs_2024-09-23_11-27-27.zip (287.8 KB)

Steps to reproduce test #2:

  1. Begin playing 4k HDR video in the web client. Everything is playing fine.
  2. Manually switch the resolution to 720p 4mbps. In nvidia-smi, I briefly see a process running on the GPU, but it immediately ends.
  3. Client does not play the video. Transcoding has failed.
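
The short-lived GPU process in step 2 can be caught by polling nvidia-smi on the host once per second while switching quality. A minimal sketch, assuming the transcoder appears under a process name containing "Plex Transcoder" (as in the nvidia-smi output earlier in the thread); the helper name transcoder_on_gpu is made up for illustration:

```shell
# Hypothetical helper: reads nvidia-smi compute-app rows on stdin and
# succeeds if any row mentions the Plex transcoder.
transcoder_on_gpu() {
    grep -q 'Plex Transcoder'
}

# Poll once per second (run on the host; stop with Ctrl-C):
# while sleep 1; do
#     if nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#                   --format=csv,noheader | transcoder_on_gpu; then
#         echo "$(date +%T) transcoder on GPU"
#     else
#         echo "$(date +%T) no transcoder on GPU"
#     fi
# done
```

If the "transcoder on GPU" line shows up for a second or two and then disappears right after the quality switch, that would match what is described in step 2.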

Here is a bit more info from me as well:

  • Nvidia Driver v550.107.02
  • PMS Version 1.41.0.8992
  • GPU 1660Super
  • Client: Server Web App / Plex App MS Store

Docker Compose

docker run \
  -d \
  --name='Plex-Media-Server' \
  --net='host' \
  --pids-limit 2048 \
  -e TZ="Europe/Athens" \
  -e HOST_OS="Unraid" \
  -e HOST_HOSTNAME="storm" \
  -e HOST_CONTAINERNAME="Plex-Media-Server" \
  -e 'PLEX_CLAIM'= \
  -e 'PLEX_UID'='99' \
  -e 'PLEX_GID'='100' \
  -e 'VERSION'='latest' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:32400]/web' \
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/plexinc/pms-docker/master/img/plex-server.png' \
  -v '/dev/shm/':'/transcode':'rw' \
  -v '/mnt/user/data/media/demos/':'/data':'rw' \
  -v '/mnt/cache/appdata/Plex-Media-Server':'/config':'rw' \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  'plexinc/pms-docker'

Screen Capture of the Issue

Plex Media Server Logs_2024-09-23_14-58-57.zip (1.0 MB)
unraid-diagnostics-20240923-1459.zip (131.1 KB)

I spun up a clean Docker container with the official image and played a couple of the demos @ChuckPa has posted on the forums.

The transcoder seems to work on automatic quality at the start of playback, but on a quality change something crashes. If I disable HW transcoding, transcoding on the CPU works fine.

@spiceygas sorry for hijacking but I believe we are having the exact same issue.

@aLaskaratos @spiceygas

I’ve moved us here because this doesn’t seem the same.

In that thread, HW transcoding starts and then fails.

In this thread, the GPU can’t be seen when the transcoder is invoked.

Sep 23, 2024 11:25:30.192 [140158287055672] DEBUG - [Req#3c2/Transcode] TPU: hardware transcoding: enabled, but no hardware decode accelerator found
Sep 23, 2024 11:25:30.192 [140158287055672] DEBUG - [Req#3c2/Transcode] [Universal] Using local file path instead of URL: /media/Television/The Witcher/Season 02/The Witcher S02E01 - 2160p HDR.mkv
Sep 23, 2024 11:25:30.192 [140158251395896] DEBUG - [Req#3c2/Transcode] Cleaning directory for session jsn6udd61ypwefo41ubqladu (/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-jsn6udd61ypwefo41ubqladu-d4902ba0-cd40-4add-a1f9-7b972db1bb55)
Sep 23, 2024 11:25:30.192 [140158287055672] DEBUG - [Req#3c2/Transcode] TPU: hardware transcoding: final decoder: , final encoder: 
Sep 23, 2024 11:25:30.194 [140158287055672] DEBUG - [Req#3c2/Transcode/JobRunner] Job running: EAE_ROOT=/tmp/pms-404db63f-470d-4acc-942b-8974e386bad6/EasyAudioEncoder FFMPEG_EXTERNAL_LIBS='/var/lib/plexmediaserver/Library/Application\ Support/Plex\ Media\ Server/Codecs/7592546-570471557d92948f58893deb-linux-x86_64/' X_PLEX_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx "/usr/lib/plexmediaserver/Plex Transcoder" -codec:0 hevc -codec:1 eac3_eae -eae_prefix:1 af9yhz0ca669r4y4f3en833u_ -ss 0 -noaccurate_seek -analyzeduration 20000000 -probesize 20000000 -i "/media/Television/The Witcher/Season 02/The Witcher S02E01 - 2160p HDR.mkv" -map 0:0 -codec:0 copy -filter_complex "[0:1] aresample=async=1:ochl='stereo':rematrix_maxval=60.000000dB:osr=48000[0]" -map "[0]" -metadata:s:1 language=eng -codec:1 aac -b:1 256k -f dash -seg_duration 5 -dash_segment_type mp4 -init_seg_name 'init-stream$RepresentationID$.m4s' -media_seg_name 'chunk-stream$RepresentationID$-$Number%05d$.m4s' -window_size 5 -delete_removed false -skip_to_segment 1 -time_delta 0.0625 -manifest_name "http://127.0.0.1:32400/video/:/transcode/session/af9yhz0ca669r4y4f3en833u/90c38704-3070-4f8a-b50f-d16a7978da88/manifest?X-Plex-Http-Pipeline=infinite" -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 dash -start_at_zero -copyts -vsync cfr -y -nostats -loglevel quiet -loglevel_plex error -progressurl http://127.0.0.1:32400/video/:/transcode/session/af9yhz0ca669r4y4f3en833u/90c38704-3070-4f8a-b50f-d16a7978da88/progress
Sep 23, 2024 11:25:30.194 [140158287055672] DEBUG - [Req#3c2/Transcode/JobRunner] In directory: "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-af9yhz0ca669r4y4f3en833u-90c38704-3070-4f8a-b50f-d16a7978da88"
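
The two lines that matter in the excerpt above are the "no hardware decode accelerator found" message and the "final decoder: , final encoder:" line with both fields empty. A rough sketch for pulling those out of a log file, assuming the message wording stays exactly as shown here; the function name and the log file name in the example are placeholders:

```shell
# Hypothetical helper: print log lines suggesting that no hardware
# decoder/encoder was selected for the transcode.
# Reads the files given as arguments, or stdin if none are given.
find_hw_fallback() {
    grep -E 'no hardware decode accelerator found|final decoder: , final encoder: *$' "$@"
}

# Example (file name is an assumption; adjust to your install):
# find_hw_fallback "Plex Media Server.log"
```

No output means neither message was found in that file, which by itself does not prove that hardware transcoding was actually used.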

One thing I see missing: --device=/dev/dri:/dev/dri

The runtimes have been passed and everything made visible except for the inode.
IIRC, isn’t this also required on Unraid, as it is on any other Docker installation?
(I have to do it for my LXC containers as well, even when passing the runtimes.)

I think we might have mixed up the posts a bit. The logs you have posted are from @spiceygas, who as far as I am aware is running natively on Linux. You moved a good chunk of his post over here.

As far as my setup on Docker/Unraid goes, I don’t believe you need to add /dev/dri with --runtime=nvidia. I have the mappings even without it, and if I add it I still get the same issue.

@aLaskaratos

May I have your server debug logs ZIP, please?

  1. Restart
  2. Attempt the playback
  3. Stop when the failure is obvious
  4. Download logs
  5. Attach here

I have attached them, together with a video of the issue, to my previous post. Need anything more?

video not available

my bad, fixed it now

Thanks.

I see part of the problem (a well-known one).

You can’t change the video resolution (especially lower) after playback starts.

Plex/web does NOT handle the transition

The spinning circle you see is Plex/web waiting for more data at the OLD ID number.

What’s happening in reality is that the Transcoder has changed over and is ready to give it the NEW ID for the new resolution.

Sometimes this works, most times it will not.

I’ve written several trouble tickets about it, all of which were closed (expired) to the sound of crickets.

The apps work fine but Plex/Web does not.

The ONLY workaround, and not really a viable one, is to preset the quality low and then change it up after playback starts.

Would you try that?

I cannot seem to find an option for setting the quality before playback starts, but upon further investigation, transcoding seems to be fine on the Android and LG webOS apps.

I was aware of the buggy web client, but I have the same issues with the Windows apps (MS Store and Plex desktop): they throw random errors when I ask the server to transcode (4294967283, 4294967279 and others).
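
For what it’s worth, the two codes quoted above look like small negative return values printed as unsigned 32-bit integers. This is arithmetic only, not a statement about what Plex means by them: subtracting 2^32 (4294967296) gives -13 and -17. A small sketch of the conversion (the function name is made up):

```shell
# Hypothetical helper: reinterpret an unsigned 32-bit value as signed.
to_signed32() {
    if [ "$1" -ge 2147483648 ]; then
        # Values at or above 2^31 wrap around to negative numbers.
        echo $(( $1 - 4294967296 ))
    else
        echo "$1"
    fi
}

to_signed32 4294967283   # prints -13
to_signed32 4294967279   # prints -17
```

Searching the server or client logs for the signed form as well as the unsigned one may turn up a more descriptive message.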

I suspected that somewhere the new stream was not being passed to or picked up by the client correctly. What is beyond my understanding is why playback is perfectly fine across all players and devices when I roll back to the 470 drivers.

I tried rolling back through multiple versions of PMS, down to 1.28, but had the same issue across all of them.

Once you’ve crossed the 1.40.x barrier, there is no going back. The risk of corruption is high.

The databases are not compatible with each other (except forward migration)

I had to uncheck the recommended settings for the menu to pop up. You are right, transcoding works if I don’t change the quality.