Transcode dies on Desktop Client but works on Android

Server Version#: 1.41.1.9057 (Same issue on stable 1.41.0.8994)
Player Version#: 1.103.1.238

Server is hosted in an Ubuntu 20.04.6 LTS VM on Proxmox. RTX2060 Super for transcoding. Works fine using PCI passthrough.

I’ve now changed to vGPU using polloloco’s method and FastAPI-DLS license server. Confirmed licensed. I can use nomachine to login to the desktop environment, run benchmarks and I can see the processes in nvidia-smi.

On desktop clients (MacBook M1 and Windows 11 PC), when I try to transcode, the process shows up in nvidia-smi for a moment and then disappears, and the client gives me the error: "An unknown error occurred (4294967283). Error code: 4294967283".
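One observation that may help when searching for this code: 4294967283 is what -13 looks like when a signed 32-bit value is printed as unsigned (4294967296 - 13). On Linux, errno 13 is EACCES (permission denied), but whether Plex means that here is an assumption and not something the logs confirm. A minimal sketch of the conversion, where `to_int32` is a hypothetical helper name:

```shell
# Reinterpret an unsigned 32-bit value as a signed 32-bit integer.
# to_int32 is a hypothetical helper, not part of Plex or any other tool.
to_int32() {
  if [ "$1" -ge 2147483648 ]; then
    echo $(( $1 - 4294967296 ))
  else
    echo "$1"
  fi
}

to_int32 4294967283   # prints -13
```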

Using an android client everything works great. I can transcode at any bitrate and the process shows in nvidia-smi and never dies.

I’ve left HDR Tone Mapping disabled throughout my troubleshooting. I’ve tried disabling and re-enabling hardware acceleration. At this point I’m not sure what to check next. I’ve read about lots of people on reddit getting this same setup working, and with it working on my Android it seems I’m close.
Plex Media Server Logs_2024-10-17_19-25-57.zip (858.8 KB)

It doesn’t seem to work in a Chrome browser either.

Another strange behaviour: most times it will work if I start the stream already transcoding (i.e. with default quality set to 720p 4 Mbps), but if I then change to another bitrate or resolution it kills the transcode process. If I start the stream at full resolution and then change to a lower resolution, it kills the transcode process every time.
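To catch the moment the transcode process disappears when switching quality, one option is to poll nvidia-smi with timestamps and line the output up against the Plex logs. This is only a rough sketch: `watch_gpu_procs` and the `NVIDIA_SMI` override variable are made-up names, and it assumes the transcoder shows up in the compute-apps list the same way it does in the plain nvidia-smi output.

```shell
# Poll the GPU process list a fixed number of times, one timestamped line per poll.
# watch_gpu_procs and NVIDIA_SMI are hypothetical names used for illustration.
watch_gpu_procs() {
  local count=${1:-30} interval=${2:-1} i procs
  for i in $(seq 1 "$count"); do
    procs=$("${NVIDIA_SMI:-nvidia-smi}" \
      --query-compute-apps=pid,process_name,used_memory \
      --format=csv,noheader)
    echo "[$(date +%T)] ${procs:-no GPU processes}"
    sleep "$interval"
  done
}

# Example: watch_gpu_procs 60 1   (start it, then change quality in the client)
```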

HW transcoding is also broken on my 10th gen i7 running Ubuntu and Docker. It seems to run for a few seconds on HW transcoding and then switches to software. Videos stutter now. I tried switching from latest to the public build, with no difference. I also tried unselecting HW transcoding and reselecting it in the web interface.

I’m using vgpu so my assumption is my issue is rooted there, but I don’t want to jump to conclusions. I’m hoping someone can look at my logs and give me a more methodical way to troubleshoot this.

I found someone in the gpu unlock discord who had the same issue. But there are many more who have it working successfully.

I’m using the Nvidia GRID drivers provided from the Nvidia portal. This provides 550.90.05 for the host and 550.90.07 for the guest. Using NoMachine to remote into GNOME I can run Unigine Heaven for several hours at 70-80 fps. It seems like the drivers are working.

Are there any known issues with Plex and GRID drivers? Or specifically 550 drivers?

So I installed Jellyfin (can I say that here? lol), turned on hardware transcoding, and it works fine. I can transcode multiple streams and change resolutions, and the processes show in nvidia-smi. I think that satisfies me that the vGPU setup is correct and the guest drivers are functioning correctly, along with the encode and decode libs.

Is this a bug with plex and certain versions of grid drivers?

Bumpy bump.

Hoping someone can help me with next steps here. Can someone check out my logs and see if anything stands out? I don’t see anything obvious.

Is someone able to have a look at my logs and maybe point me in the right direction? Would really appreciate help with next steps to troubleshoot this.

Hoping the nightshift sees this…

So running the following from the CLI returns no errors and I can see the process running in nvidia-smi:
ffmpeg -loglevel error -f lavfi -i color=black:s=1080x1080 -vframes 1000000 -an -c:v hevc_nvenc -f null -
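Since the failure only shows up on encode, it may be worth running the same synthetic test against more than one NVENC encoder. A small sketch wrapping the command above: `run_nvenc_test` and the `FFMPEG` override variable are hypothetical names, and the shorter default frame count is just a convenience. Note this exercises the system ffmpeg, so a pass here does not prove anything about the transcoder binary Plex ships.

```shell
# Run the synthetic black-frame encode against a chosen NVENC encoder.
# run_nvenc_test and FFMPEG are hypothetical names used for illustration.
run_nvenc_test() {
  local encoder=${1:-hevc_nvenc} frames=${2:-1000}
  "${FFMPEG:-ffmpeg}" -loglevel error \
    -f lavfi -i color=black:s=1080x1080 \
    -vframes "$frames" -an -c:v "$encoder" -f null -
}

# Example: try both encoders and report which ones fail.
# for enc in h264_nvenc hevc_nvenc; do
#   run_nvenc_test "$enc" || echo "$enc failed"
# done
```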

So quick summary:
-Plex h/w transcode starts then stops a second later when using chrome, windows player, or mac player. Logs show “Terminated session … with reason Client stopped playback”
-It works from Android client.
-Jellyfin h/w transcodes successfully
-ffmpeg h/w transcodes from cli
-desktop and benchmarking run successfully with process showing in nvidia-smi.

Is there an issue with the ffmpeg that plex uses? Is re-installing Plex on top of the existing install good enough or should I be doing a full uninstall/re-install?

Ok, so unchecking “Use hardware-accelerated video encoding” results in success. Anybody with any ideas why decoding works but encoding fails?

@ChuckPa sorry to ping, but are you able to look at this? Been awhile since I’ve had any eyes on this one.

What’s this?

Are you using a third party’s agent ?

It appears you had it working with GPU passthrough but decided to “Fix it” ?

:thinking:

What I see in your log files is likely the root cause.

Transcoder temp directory over the LAN is a no-go

Oct 17, 2024 19:14:30.006 [139950254066488] DEBUG - Completed: [192.168.1.125:60575] 200 GET /status/sessions (10 live) #1e8 TLS GZIP 1ms 463 bytes (pipelined: 4)
Oct 17, 2024 19:14:30.008 [139950212643640] DEBUG - [Req#1c3/Transcode] Transcoder: Cleaning old transcode directory: "/var/transcode/Transcode/Sessions/plex-transcode-8outqprfteeot3rthitmd29n-148e97d8-21b1-4994-9d31-ffc21dd1b3ab"
Oct 17, 2024 19:14:30.010 [139949994703672] ERROR - [Req#1c3/Transcode] Failed to delete session directory (boost::filesystem::remove: Directory not empty [system:39]: "/var/transcode/Transcode/Sessions/plex-transcode-8outqprfteeot3rthitmd29n-148e97d8-21b1-4994-9d31-ffc21dd1b3ab")
Oct 17, 2024 19:14:30.011 [139950212643640] ERROR - [Req#1c3/Transcode] Transcoder: Failed to delete session directory (boost::filesystem::remove: Resource busy [system:16]: "/var/transcode/Transcode/Sessions/plex-transcode-8outqprfteeot3rthitmd29n-148e97d8-21b1-4994-9d31-ffc21dd1b3ab/.nfs000000000000085b00000001")

Notice the .nfs******** ?

That’s telling me you have NFS involved with your transcoder directory.
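For anyone wanting to check their own logs for the same symptom, a quick sketch that pulls out any quoted paths ending in a `.nfs...` leftover file. `find_nfs_leftovers` is a made-up helper name, and the log file path is whatever you extracted from the log bundle.

```shell
# List distinct quoted paths in a log file that point at NFS leftover files
# (names of the form .nfs followed by hex digits).
# find_nfs_leftovers is a hypothetical helper name.
find_nfs_leftovers() {
  grep -o '"/[^"]*\.nfs[0-9a-f]*"' "$1" | tr -d '"' | sort -u
}

# Example: find_nfs_leftovers "Plex Media Server.log"
```

No output means no such paths were logged in that file.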

NFS/SMB do not guarantee the file locking required for transcoding (as is evident in this log)

Put your Transcoder temp directory back on a local file system directory and you’ll be ok if you keep the permissions right.
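One way to confirm where a transcoder directory really lives is to ask the kernel which filesystem backs it. A minimal sketch, assuming util-linux `findmnt` is available in the VM; the helper names and the list of network filesystem types are illustrative and not exhaustive.

```shell
# Return success if the given filesystem type name is network-backed.
# The list of types is an illustrative assumption, not exhaustive.
is_network_fstype() {
  case "$1" in
    nfs|nfs4|cifs|smb3|smbfs|fuse.sshfs) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether a directory sits on a local or a network filesystem.
check_transcode_dir() {
  local fstype
  fstype=$(findmnt -n -o FSTYPE --target "$1") || return 2
  if is_network_fstype "$fstype"; then
    echo "$1 is on $fstype (network filesystem)"
    return 1
  fi
  echo "$1 is on $fstype (local)"
}

# Example: check_transcode_dir /var/transcode
```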

It works for Android (shield is a good one) because the shield does all the transcoding itself.

Plex Media Server Logs_2024-10-22_15-31-11.zip (2.2 MB)

OK, makes sense. I was using spinning rust over NFS from my NAS VM to save wear on my SSD from transcoding. It’s worked for a few years now, but I’ve done as you said.

New logs attached. I changed the temp directory to a local one (I didn’t know what the default is, so I set it to /tmp/transcode and made sure it is owned by plex:plex).
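A quick sanity check of the ownership part, sketched with GNU `stat` as found on Ubuntu. `check_dir_owner` is a hypothetical helper name; the directory and user in the example are the ones mentioned above.

```shell
# Verify that a directory exists and is owned by the expected user.
# check_dir_owner is a hypothetical helper name.
check_dir_owner() {
  local dir=$1 want=$2 owner
  [ -d "$dir" ] || { echo "$dir is not a directory"; return 2; }
  owner=$(stat -c %U "$dir") || return 2
  if [ "$owner" = "$want" ]; then
    echo "$dir is owned by $want"
  else
    echo "$dir is owned by $owner, expected $want"
    return 1
  fi
}

# Example: check_dir_owner /tmp/transcode plex
```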

No change in behaviour.

To answer your previous question, I’m using vGPU GRID drivers ver 550.90.07, so the VM sees the RTX2060 as an RTX6000. This is a configuration that’s confirmed working by other users. I know it’s not common, so I appreciate you pointing me in the right direction.

I have a RTX2000. I do not pull any “vGPU” naming games with it. There is no need to do so from Plex’s perspective.

The Nvidia drivers work perfectly.

Just because you’re faking the Identifiers doesn’t mean you’re going to get more out of the card. In most cases, you’re going to cause problems because the driver will think it has something it actually doesn’t.

My rule: K.I.S.S. (Keep It Simple, Stupid)

As we move forward into HEVC encoding and subtitle burning plus the new transcoder, you’ll want honesty with the hardware.

My little P2200 can do everything my RTX2000 can (except AV1 encoding).

My drivers

[chuck@lizum ~.2030]$ nvidia-smi
Tue Oct 22 18:41:41 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    On  |   00000000:01:00.0  On |                  Off |
| 30%   35C    P8              8W /   70W |    2125MiB /  16380MiB |     10%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

This is the only vetted driver for the RTX2000

My recommendation to you is simple: Pass the Nvidia through as a native, physical device (no name changes) and let the Nvidia client-side drivers in the VM do their job.

I appreciate your assistance.

I agree that KISS is the best policy, unfortunately I’ve run out of pcie slots in my home server. I’m using vgpu to split the card between 2 VM’s - A linux Plex server and a windows VM for my son. The windows VM is working flawlessly, and the linux VM works for all the gpu benchmarks I’ve tested, and plex decode, but not encode.

Are there any other logs that I can look at? Wish I could tell you how much I appreciate your help here.

One of the things we’ve found – vGPUs just do not work well with Linux and hardware transcoding.

The very best I’ve ever seen is video tearing of images; worst being complete failure.

When I use a Linux container (LXC), which Proxmox supports natively:

  1. Install the Nvidia drivers on the Proxmox host (these provide all the kernel modules)

You can now install the client-side-only Nvidia modules (no kernel modules / no dkms) in the LXC.

I use LXC/LXD and am doing this:

  1. Nvidia drivers on the host
  2. Pass the runtimes:
       nvidia.driver.capabilities: all
       nvidia.require.cuda: "true"
       nvidia.runtime: "true"
  3. Pass /dev/dri into the container as /dev/dri

Done.

There are several how-tos on Google which show this step by step.
They specifically show the creation of the profile needed for the container.
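As a rough idea of what those steps look like from the LXD command line (this is not Proxmox's own container tooling), here is a sketch that applies the three keys quoted above with `lxc config set`. The values are copied as posted. `configure_nvidia_container`, the `LXC` override variable and the container name in the example are made up for illustration. The last line uses LXD's `gpu` device type, which is a swapped-in way of exposing the GPU device nodes rather than a literal pass of /dev/dri, so treat it as an assumption and follow a proper how-to for the real profile.

```shell
# Apply the NVIDIA-related keys to an existing LXD container.
# configure_nvidia_container and LXC are hypothetical names used for illustration.
configure_nvidia_container() {
  local name=$1
  "${LXC:-lxc}" config set "$name" nvidia.runtime true
  "${LXC:-lxc}" config set "$name" nvidia.driver.capabilities all
  "${LXC:-lxc}" config set "$name" nvidia.require.cuda true
  # Assumption: expose the GPU via LXD's gpu device type.
  "${LXC:-lxc}" config device add "$name" gpu gpu
}

# Example: configure_nvidia_container plex
```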

Now you have a container (faster than a VM) with significantly less overhead
and it can share the GPU because Proxmox has the drivers and is in control.

Alternative to consider:
