Hardware transcoding issue

I am also trying to get PMS to run on a rack-mounted server: a Dell R430 with dual Xeon CPUs.
PMS works perfectly when I don’t enable hardware transcoding; I have no issues changing quality in Chrome, Safari or the Mac Plex app.
I then enable hardware transcoding in the settings to use the P400. It will still Direct Play, but when I change the quality to 720p the stream stops dead, and nvidia-smi shows that nothing is being passed to the GPU.
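
For anyone wanting to script the nvidia-smi check described above, here is a minimal sketch. It assumes the driver's nvidia-smi accepts the `utilization.gpu` and `encoder.stats.sessionCount` query fields (confirm with `nvidia-smi --help-query-gpu`); the helper names are made up for illustration.

```python
import shutil
import subprocess

# Assumption: both fields are supported by the installed driver.
QUERY = "utilization.gpu,encoder.stats.sessionCount"


def parse_gpu_row(row):
    """Parse one CSV row such as '12, 1' into (gpu_util_percent, encoder_sessions).

    Raises ValueError if the driver reports a non-numeric value for a field.
    """
    util, sessions = (field.strip() for field in row.split(","))
    return int(util), int(sessions)


def gpu_is_transcoding(row):
    """Heuristic: at least one encoder session means work reached the GPU."""
    _, sessions = parse_gpu_row(row)
    return sessions > 0


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for row in out.strip().splitlines():
        state = "transcoding" if gpu_is_transcoding(row) else "idle"
        print(parse_gpu_row(row), state)
```

Run it inside the VM while a stream is being transcoded. If the session count stays at 0 after changing quality, the transcoder never reached the GPU.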

Sadly I don’t have access to any AMD CPUs, as I just have that R430 server.
Did using NUMA help with the Xeons, or am I stuck at this point for the foreseeable future?

I did consider swapping to Jellyfin, but I’ve been using Plex for years and all my notifications and integrations are based around PMS. It’s interesting that Jellyfin works out of the box, though.

Did you try using LXC containers on Proxmox rather than a full-fat Ubuntu VM? I could be persuaded to use this method, as recommended by ChuckPA, but I don’t have a decent guide or the knowledge of how to do that at the minute.

During my testing I did try running PMS on many bare-metal dual-socket Xeon servers, and had nothing but issues with NVIDIA hardware transcoding. I did not realize at the time that it could have anything to do with dual sockets and NUMA.

I have not tested anything on LXC. I would say it should work very similarly to bare metal, and thus not work on dual-socket systems.

If you are running PMS on bare metal, then probably the easiest way to test it is to just remove the secondary Xeon CPU. I am very curious whether it works on a Xeon server after that, but I am way too lazy to check now, after moving GPUs between servers, reinstalling the OS and testing continuously for literally weeks.

I’m using Proxmox 7 and have configured PCIe passthrough to give the Ubuntu VM exclusive access to the NVIDIA P400. I will try giving it a single CPU using the NUMA setup and will report back.

Ok. Here are some tips.

I am assuming an E5-2630 v3, which is 8C/16T. So allocate 16 cores and 16 GB of memory, for example.

There is a package, hwloc, that provides the command lstopo. It’s helpful for seeing which NUMA node the GPU is on.
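
If hwloc is not installed, the same information can usually be read from sysfs on the host. The sketch below is an illustration only: it assumes a Linux host that exposes `/sys/bus/pci/devices/<address>/numa_node`, and the function name is made up for this example.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def nvidia_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: numa_node} for every NVIDIA PCI function found."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}  # not Linux, or sysfs is not mounted
    nodes = {}
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "vendor"
        numa_file = dev / "numa_node"
        if not (vendor_file.is_file() and numa_file.is_file()):
            continue
        if vendor_file.read_text().strip().lower() != NVIDIA_VENDOR_ID:
            continue
        # A value of -1 means the kernel has no NUMA affinity for the device.
        nodes[dev.name] = int(numa_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    for address, node in nvidia_numa_nodes().items():
        print(f"{address}: NUMA node {node}")
```

The GPU's audio function shares the NVIDIA vendor ID, so it will normally be listed next to the GPU with the same node number.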

If you see the GPU is on the first node, you can try:
cores: 16
cpu: host
memory: 16384
numa: 1
numa0: cpus=0-15,hostnodes=0,memory=16384,policy=bind

If the GPU is on the second node, use this for the last line instead:
numa0: cpus=0-15,hostnodes=1,memory=16384,policy=bind
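
To avoid typos when adapting these lines to a different core count or memory size, they can be generated. This is only a sketch: the function names are hypothetical and it emits just the keys shown above, not a complete VM config.

```python
def numa0_line(gpu_node, cores, memory_mb):
    """Build a `numa0:` line that binds the guest to the GPU's host NUMA node."""
    if cores < 1 or memory_mb < 1:
        raise ValueError("cores and memory_mb must be positive")
    if gpu_node < 0:
        raise ValueError("gpu_node must be a host NUMA node index (0, 1, ...)")
    # Guest CPUs are numbered from 0, so 16 cores become the range 0-15.
    cpus = "0" if cores == 1 else f"0-{cores - 1}"
    return f"numa0: cpus={cpus},hostnodes={gpu_node},memory={memory_mb},policy=bind"


def vm_numa_config(gpu_node, cores=16, memory_mb=16384):
    """Return the five config lines from the tip above as one string."""
    return "\n".join([
        f"cores: {cores}",
        "cpu: host",
        f"memory: {memory_mb}",
        "numa: 1",
        numa0_line(gpu_node, cores, memory_mb),
    ])


if __name__ == "__main__":
    print(vm_numa_config(gpu_node=0))
```

The memory value in `numa0` is kept equal to the VM's `memory` setting, as in the example lines above.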

Thank you for the tips.
I did install hwloc and used the lstopo tool. The GPU was on the first CPU, so I used the following at the bottom of my config file:
numa: 1
numa0: cpus=0-3,hostnodes=0,memory=4096,policy=bind
Sadly, I’m having the same issue: change the quality and the stream stops dead.

Interestingly, I did see the GPU transcoding without the above changes when I clicked on “Convert Automatically”.


So I know it does work with the GPU and the dual Xeon processors when I convert automatically, but not at any other quality.
I also tried Jellyfin and it worked out of the box, so it certainly is something in PMS that doesn’t play nicely.

I think “numa” is a red herring here.

Think about the kernel. SMP == SYMMETRIC multiprocessing.

Does the OS show 2 CPUs with discrete cores -or- does it show the total number?
( It shows the total ).

The only thing you need to balance for it to work (IIRC) is the same amount of physical memory adjacent to each CPU for bus balance.

Regarding the resolution:

  1. I started at original (automatic)
  2. I downgraded quality – PERFECT
  3. I upgraded back to 2160p
    – sometimes it was flawless ; like when I was trying to make it fail.
    – other times it would randomly fail leaving the transcoder spinning at 233% CPU

There’s a weird bug in there somewhere.

As I stated in another thread, my goal this weekend is to find and reliably replicate it so it can get fixed.

As for “out of the box” comparison – I don’t know if that’s entirely fair because of the “Dolby” licensing Plex has to work within. Be that as it may, YES, PMS should work “out of the box” too. That’s what I’m working on now.

Hey chuck,

Any news on my issue?

Hi there,

I’ll try that asap because I run my VM on a PowerEdge with 2 CPUs too.
I run ESXi, but there are also options to limit the VM to a NUMA node.

I’ll keep you all informed, thanks a lot for sharing your experience! :wink::ok_hand:

Hey Franck

Turns out @ChuckPa was right. The real reason why things were working better had nothing to do with NUMA.

Previously, when I was tinkering with various options in the BIOS, I disabled C-states, among other power-saving features, to increase performance. As expected, this causes the CPUs to draw a ton of power, with an accompanying increase in temperature.

It also caused NUMA/CPU 0 to stay in all-core turbo boost at ~3 GHz, while NUMA/CPU 1 went into all-core thermal throttling at ~1.75 GHz. The fans were only properly cooling the front CPU, closer to the fans, and not the rear one.

After changing some fan parameters, both CPUs are getting proper cooling. C-states are still disabled, all cores on BOTH CPUs are at ~3 GHz, and power draw is a bit insane. Transcoding is now working pretty reliably on both NUMA nodes: transcodes start fast and switching quality in Chrome works almost always.

But it seems that when I enable all C-states and other energy-saving measures, the CPU cores hover at ~2 GHz and transcoding works only intermittently. It seems PMS is not getting much benefit from the regular core turbo boost. Switching quality in Chrome works some of the time. It was worse still when running on the thermally capped CPU.
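
A quick way to spot one socket running slower than the other is to average the clock readings per socket. The sketch below assumes an x86 Linux `/proc/cpuinfo` with `physical id` and `cpu MHz` fields, and the function name is made up for illustration. It takes a single snapshot, so run it on the host a few times while a transcode is starting.

```python
import os
from collections import defaultdict


def mhz_by_socket(cpuinfo_text):
    """Average the `cpu MHz` values in /proc/cpuinfo text per `physical id`."""
    samples = defaultdict(list)
    socket, mhz = None, None
    # The extra "" makes sure the final block is flushed too.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the block describing one logical CPU.
            if mhz is not None:
                samples[socket if socket is not None else 0].append(mhz)
            socket, mhz = None, None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            socket = int(value)
        elif key == "cpu MHz":
            mhz = float(value)
    return {s: sum(v) / len(v) for s, v in sorted(samples.items())}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        for sock, avg in mhz_by_socket(fh.read()).items():
            print(f"socket {sock}: {avg:.0f} MHz average")
```

Inside a guest VM the socket numbers reflect the virtual topology, and the reported clocks may not match the host's, so readings taken on the host are more useful for this comparison.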

I tried a new thing: enable all C-states and power-saving measures in the BIOS, assign only 8 cores to the guest VM, and add idle=poll to the kernel command line in /etc/default/grub on the VM.

This causes only the CPU cores the VM is using to never sleep and stay permanently turbo boosted. PMS is transcoding even better with this setup than previously. It takes maybe 0.1-0.2 sec less time to start transcode playback, and it is even harder to break when switching quality in Chrome.
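
For reference, here is a rough sketch of that grub edit as a text transformation. It assumes the parameter goes into the `GRUB_CMDLINE_LINUX_DEFAULT="..."` line (the post above does not say which variable was used), and the function name is hypothetical. It only rewrites a string; writing the file back is left to the reader.

```python
import re


def add_kernel_param(grub_text, param="idle=poll", var="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return grub_text with `param` appended to the quoted value of `var`.

    The edit is idempotent: a parameter that is already present is not repeated.
    """
    pattern = re.compile(rf'^({re.escape(var)}=")([^"]*)(")[ \t]*$', re.MULTILINE)

    def patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = pattern.subn(patch, grub_text)
    if count == 0:
        raise ValueError(f'no {var}="..." line found')
    return new_text


if __name__ == "__main__":
    example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(example))
```

On Ubuntu the change takes effect only after running `update-grub` and rebooting the VM. Note that idle=poll keeps the guest's cores busy-waiting, so expect higher power draw.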

It seems to me that PMS requires quite a bit of single-core performance when doing the probing or looping VAAPI/NVENC/NVDEC stuff, and there is some kind of timing issue if things don’t finish fast enough.

This does not quite explain why it was working so badly on the Xeon, as I was testing on Xeons that should turbo up to 3.6 GHz, and I did not get it working even with C-states disabled and whatnot.

But this would explain why it works much better on all the desktop CPUs I have tested, since they all have a base clock of at least 3.6 GHz and go upwards of 5 GHz with turbo, which is quite a bit higher than regular Xeons.

So yeah, I’m pretty certain PMS is very sensitive to good single-core performance when starting NVIDIA HW transcodes.

I’m going to say one last thing about “VMs” and then drop it.

  1. This is a non-OFFICIALLY-supported distro. – but –
  2. Docker is supported
  3. LXC is supported.
  4. Using a VM (which abstracts the HARDWARE) makes no sense to me when there’s a full OS distro underneath it.

I recommend investigating

  • Create a docker container on the host OS with PMS in it.
    – In that container will be an Ubuntu runtime environment with PMS in it.

VMs (hypervisors) do strange things when they grab resources. The hypervisor might be the root cause.

Well, a distro is an OS, and the OS I use is supported… Unsupported hardware is a completely different story IMHO, even if I see your point; there is no mention of unsupported VM configurations on the website, I think.

In my case, I just can’t, because I have no extra hardware where I could install something to support containers. And running containers “directly” on ESXi is quite heavy since VMware stopped supporting vSphere Integrated Containers.

IMHO (again), we are in 2023, Plex should start “supporting” those setups because it’s pretty much “standard” today. :wink:

I don’t use ESXi, only Proxmox. But I can confidently say that all the issues I have experienced with PMS are exactly the same whether running inside a Proxmox guest VM or on bare metal. I also tested with Docker, both on bare metal and in the VM. No difference in any of the setups.

I have tested this very thoroughly.

I have access to a spare server which I’m more than happy to throw a bare-metal install of Ubuntu on and run an NVIDIA GPU with hardware transcoding enabled. Again, it will have dual Xeons; if this helps rule out the hypervisor as the issue, then I’m happy to help.
The advantage of running a full VM, for me, is that I don’t have to install all of the NVIDIA drivers on the underlying host in order to pass them to an LXC. The idea was to give the fully installed VM exclusive access to the GPU and run it isolated that way.
I didn’t realise this wasn’t supported. I had also seen others with this setup with no issues, so I don’t know what is different for myself and @Ossalingur.
It seems to be a combination of having hardware transcoding enabled, an NVIDIA GPU and dual CPUs.

Hey Chuck,

No news about my docker issue?

@Vicerak

Digging myself out of a hole right now.

I just finished writing up another one.

Let me grab a bite to eat and then I’ll get on the debugger on the real host and continue.

I did make some progress.

ALSO, not sure if you saw: over the weekend, I caught the elusive “transcoder crashes when changing resolutions” problem.

I presented it in today’s meeting.

While I got some initial pushback (valid) to make sure I captured it correctly, we know it’s Plex/web which is failing.

I have used Plex in a VM since the beginning; I even migrated it from Windows to Linux for better performance. A VM allows me to use the same hardware but, most importantly, the same backup solution.

The GPU came way later and was working fine for a while, until the “infamous” update. :smile:

@Franck_Ehret

Which update? :rofl:

I got to the bottom of the quality change in the browser – it IS a browser bug… BIGTIME.

I wrote it up and requested a priority fix.

One update, once upon a time before 1.30…
(this thread is too long now for review :rofl:)

More seriously, I can’t tell which update broke transcoding on my server; I don’t use it for video in the browser every day, so I wouldn’t know.

Thank you, Chuck.

So, you were able to reproduce the errors on your side?