SR-IOV & Plex

I have Plex running on Docker in an Ubuntu 24.04 VM on Proxmox. I am passing the vGPU through to the VM via SR-IOV (so I can share it with other VMs on the host) and it transcodes fine, but tone mapping corrupts the image (as you have discussed in the thread here).
Will the new kernels fix the SR-IOV Intel vGPU issue with tone mapping?

We have no way of knowing what the kernel will do.

kernel.org → Ubuntu/Debian/Red Hat/Fedora (distro provider) → Public.

We are downstream of that.

I don’t understand why you’re doing SR-IOV unless your VMs / containers are not set up for shared permissions.

I have multiple LXC containers, each of which has the Nvidia and Intel GPUs added as physical adapters.

Nothing is needed. Each PMS transcoding task is a process which connects to the GPU and requests services.

All the necessary kernel support/drivers are already on the host.
Each client has the appropriate user-space drivers (Intel iGPU drivers, or the Nvidia client drivers with no kernel portion) installed in that container / VM. This is why SR-IOV isn’t needed.
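As a concrete sketch (package names are the Ubuntu/Debian ones; an Intel iGPU shared into the container is assumed), the container only needs the user-space pieces:

# Inside the container -- user-space drivers only, no kernel modules needed
apt install intel-media-va-driver-non-free libva-utils
vainfo    # should list the VA-API profiles the shared iGPU supports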

FYI – Docker is wrapping cellophane around it. It’s unnecessary complexity.
The repo updates PMS just as easily as Docker does.

Additionally, a great deal of work has been done since 1.32.6. After we got our footing back following the staff reduction, we’ve made huge strides in fixing and advancing the transcoder.

Thank you for the response!
I’m using SR-IOV as a way to share the iGPU with multiple VMs in Proxmox - I need this for reasons I won’t go into here (otherwise I’d just pass the whole GPU to the VM and solve this issue). I don’t want to run it in an LXC on the host because I’d lose some of the HA/replication/backup/snapshot ability I have with Proxmox in the cluster I have set up in the DC.

It would be great to have SR-IOV support - but please tell me if this is something that is just not going to happen because of blocking challenges with the transcoder/PMS, or share any other insight you may have.
Thank you!

We will be setting up a Proxmox box with multiple VMs over the next few days.
We’ll have all the supported distros and versions on it - each in its own VM - with the Nvidia and the iGPU shared.

As for what the kernel will / will not support (SR-IOV), and what can be passed through to a VM – this is entirely in Proxmox’s hands. They are the kernel provider for you. They provide all the VM definition and control applications for you.

While we’re Proxmox friendly, we can’t fully support it. We will understand it a lot better soon but don’t know if/when we’ll have corporate level agreements in place to declare ‘formal support’.

Hope that makes sense.

Thanks for offering to test this out. I also run a Proxmox setup with an Nvidia vGPU split up: half of a Tesla T4 goes into a Windows VM for some light gaming, and the other half into the Plex Ubuntu VM. I know we are edge cases, but thanks for working through/looking at our quirks.

I just put the RTX 2000 (half-height card) in my new NUC12 (DCMi9) Dragon Canyon. It’s already running flawlessly across multiple LXCs as well as on the main host without any special handling.

All I did was run this little script:

[chuck@lizum ~.1999]$ cat lxc/add-gpu 

#!/bin/bash
#
# Argument #1 is the LXC container to add the GPU to
[ -z "$1" ] && echo "Error: missing container name" && exit 1

if [ "$(lxc list | grep "$1")" == "" ]; then
  echo "Error: Unknown container name '$1'"
  exit 2
fi

# Make certain /dev/dri/renderD128 exists
if [ ! -e /dev/dri/renderD128 ]; then
  echo "Error: This host does not have hardware transcoding ability (/dev/dri/renderD128 missing)"
  exit 3
fi

# Get gid number of group which owns the GPU on the host.
Gid="$(stat -c %g /dev/dri/renderD128)"

# Add it (pass it) into the container, applying the gid=xxxx property
lxc config device add "$1" gpu gpu gid="$Gid"

This is passed as a physical GPU.
This is why I assert that, at least on Ubuntu & Debian with LXC, SR-IOV isn’t needed - because I can demonstrate it.
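For example, adding the GPU to a container named 'plex' (the name is just a placeholder) and sanity-checking it looks like:

lxc/add-gpu plex
lxc exec plex -- ls -l /dev/dri/    # renderD128 should show up owned by the host's GID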

Warning: You will get hung up in the container when you install Plex if:

  1. You add the GPU to the container without passing the host’s owning GID, and
  2. Inside the container, the resultant owning GID of /dev/dri/renderD128 ends up being ‘root’.
  3. PMS WILL NOT add the plex user to the root group. (Security violation)
  4. The installer does not know it’s in a container and won’t open up the whole machine by putting PMS in ‘root’.
    – You’ll have to do this manually because only you know it’s safe to do so (see the sketch below).
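Purely as an illustration of that manual step (a sketch, assuming the render node really did come through owned by root and you accept the risk inside this one container):

# Inside the container: check which group owns the render node
stat -c %G /dev/dri/renderD128

# If it reports 'root', add the plex user to that group yourself and restart PMS
usermod -aG root plex
systemctl restart plexmediaserver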

Hi @ChuckPa
Firstly, as @rlobbins called out, thank you for looking into this for us. We are certainly an edge case but we appreciate it none the less.

Proxmox facilitates SR-IOV, as you know. It’s the guest VM that has issues: it can leverage the iGPU just fine for transcoding, but with tone mapping enabled the image becomes corrupted, as apparently it uses a different function of the iGPU that isn’t passed through to the VM with SR-IOV.

I am not sure if you have the same problem @rlobbins - perhaps with NVIDIA there is a different architecture that avoids this.

Is there a possibility that, in cases like ours, we could use the iGPU for transcoding and the CPU for tone mapping?

First,
You don’t want to try to do tone mapping on the CPU. No matter what you do, it won’t be fast enough. You need hardware acceleration to run the calculations at speed.

Regarding the SR-IOV, if Proxmox has the capability, you might already have what you need.

Tone mapping is done using the ‘cardX’ interface. Transcoding is done using the ‘renderDxxx’ interface.

If you look in /dev/dri/by-path, you can see how it’s paired together.

[chuck@lizum ~.1998]$ ls -la /dev/dri/by-path/
total 0
drwxr-xr-x 2 root root 140 Aug 23 21:53 ./
drwxr-xr-x 3 root root 160 Aug 23 21:53 ../
lrwxrwxrwx 1 root root   8 Aug 23 21:53 pci-0000:00:02.0-card -> ../card1
lrwxrwxrwx 1 root root  13 Aug 23 21:53 pci-0000:00:02.0-render -> ../renderD128
lrwxrwxrwx 1 root root   8 Aug 23 21:53 pci-0000:01:00.0-card -> ../card2
lrwxrwxrwx 1 root root  13 Aug 23 21:53 pci-0000:01:00.0-render -> ../renderD129
lrwxrwxrwx 1 root root   8 Aug 23 21:53 platform-simple-framebuffer.0-card -> ../card0
[chuck@lizum ~.1999]$

The ‘platform-simple-framebuffer’ is the result of the new ‘simpledrm’ driver in the 6.8 kernel. It’s part of preparing for the complete switch from Xorg → XWayland and the removal of Xorg.

This work is being done to make Linux more ‘mainstream friendly’ (gaming).
Linux is quick, but Xorg is based on a 40+ year old architecture. It does everything… but not as fast as gaming needs.

When you do your passthrough, are you passing both the ‘card’ and the matching ‘render’ nodes? If not, then you know why tone mapping can’t work.

While I don’t know if it will work in an SR-IOV environment, my Ubuntu machine is capable of sharing its GPUs even when they are shared as physical adapters (no virtualization or abstraction). Everything is passed the whole, raw card and they all share it without issue. This is why I ask why SR-IOV is needed: if I can share the GPUs (iGPU and Nvidia) without special handling, then why is special handling/abstraction needed? (Much like why I ask about running PMS in Docker instead of on the native host – there’s no logic to PMS in Docker (host network mode) on Ubuntu.)

Thanks for the response.

When running the command above in the VM that I am passing the GPU through to via SR-IOV, I get the following output:

root@docker-vm-media:/home/mark# ls -la /dev/dri/by-path/
total 0
drwxr-xr-x 2 root root 100 Aug 24 12:38 .
drwxr-xr-x 3 root root 120 Aug 24 12:38 ..
lrwxrwxrwx 1 root root   8 Aug 24 12:38 pci-0000:00:01.0-card -> ../card0
lrwxrwxrwx 1 root root   8 Aug 24 12:38 pci-0000:06:10.0-card -> ../card1
lrwxrwxrwx 1 root root  13 Aug 24 12:38 pci-0000:06:10.0-render -> ../renderD128

So it looks like it’s passing both the card and render interfaces. To be clear, transcoding works perfectly, no issues. Whenever the device has to tone map, it will stream but with a corrupt image.

When I pass the GPU directly to the VM, or run in an LXC, it works fine. Unfortunately, in my environment I can’t run it like that permanently and I need to use SR-IOV.

Any thoughts on why the corruption is happening when I stream with tone mapping?

My only thought about why you’re getting image corruption is the SR-IOV driver in your kernel.

Might you be able to give me an example of an application which requires SR-IOV to use the GPU?

I’ll do some research if I can, see what it’s looking for, and then cross-reference that with what base GPU sharing provides.

I suspect the same.

The other VMs on the host require the iGPU for various rendering and encoding applications. The other reason is that when the GPU is passed through, the host video output is no longer accessible, which causes issues with KVM/VNC.

I suspect the image corruption is related to this.

I will try running a new VM based on 22.04 and see if that fixes the issue; from the sound of the discussion here, it might.

Hi Chuck

Might be worth reading

https://pve.proxmox.com/wiki/PCI(e)_Passthrough

I’m not an expert, but as you’ll know (or soon find out), Proxmox is a fancy front end to a bundle of LXC (containers) and QEMU/KVM (full virtual machines). I run Plex in an LXC, which is very easy to set up (it does all the permissions stuff you mentioned in the GUI).
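For reference, the by-hand equivalent of what that setup ends up putting in /etc/pve/lxc/<vmid>.conf looks roughly like this (the card/render numbers are examples and vary per host):

lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri/card1 dev/dri/card1 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file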

In Proxmox (QEMU/KVM), you can only pass the GPU (PCI device) through to one VM, and at that point it becomes unavailable to the Debian-based Proxmox host, all LXCs, and other VMs.

The trick to sharing the GPU with multiple QEMU/KVM-based VMs is to use SR-IOV. The SR-IOV drivers are community-developed rather than being delivered by Proxmox, though once installed the Proxmox docs do lead you through how to use the virtualized resource. Here’s a guide for setting it up …

https://www.michaelstinkerings.org/gpu-virtualization-with-intel-12th-gen-igpu-uhd-730/
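If memory serves, the rough shape of that approach (the PCI address, VF count and VM ID below are only placeholders; the dkms module and kernel arguments come from the guide) is:

# On the Proxmox host: create virtual functions on the iGPU, then hand one to each VM
echo 3 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs
lspci | grep -i vga                    # the VFs appear as extra functions, e.g. 00:02.1, 00:02.2
qm set 101 --hostpci0 0000:00:02.1     # attach one VF to VM 101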

I had a look at all this when I was setting up Proxmox last year, but it proved to be so easy to set up using an LXC that I backed it out and will await Proxmox formally supporting the feature. It was cool to have the vGPU available to any and all resources, though.

So with Proxmox, my understanding is that if you have two QEMU/KVM-based VMs requiring access to the GPU, there is no alternative to SR-IOV.

Important question:

Are you using the KVM hypervisor as the foundation?

Important question #2 :sunglasses:

Do you want/need the VM so you can play games on Linux in a Windows VM? :wink:

Above discussion moved here

@mvn @kevin_marchant @rlobbins

I’ve moved the SR-IOV related posts here as they’re not about 24.04 HW transcoding.

Not sure if your question was for me or mvn - apologies for butting in. Also, I got QEMU and KVM backwards; in a Proxmox context it seems you can use them interchangeably.

https://pve.proxmox.com/wiki/Qemu/KVM_Virtual_Machines

You asked what the use case was for needing SR-IOV, and I was just trying to suggest that in my case I was running full VMs for Plex and for a Windows VM that does some h/w-accelerated graphics conversion etc., and still wanted console output from the host OS. In Proxmox you can only attach the GPU to one of these three, so I abandoned it - I wasn’t willing to rely on SSH for the console.

I’ll back out of the debate gracefully now. Though SR-IOV does appear to be a rising star for vGPU, so I’ll read with interest.

I don’t mind you here… The more :crazy_face: , the better :joy:

I’m trying to understand how this is being constructed, because I know QEMU-based machines are EMULATED hardware. They are not real. This is the same as VMware unless you pass through the specific hardware (enterprise licenses).

With KVM, you’re getting the real host hardware but abstracted into a new namespace (just like LXC, except it virtualizes one level lower than LXC, which allows you to run different kernels).
– LXC is a different OS on the same kernel.
– KVM is a different OS + kernel on the same hardware.
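A quick way to confirm which of these a given guest actually is (systemd-detect-virt ships with systemd, so it should be on all the distros in question):

systemd-detect-virt    # prints 'kvm' for a KVM-accelerated VM, 'qemu' for pure emulation, 'lxc' for a container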

definitely dancing on the razorblade LOL


I’m still fairly new to the vGPU setup, and haven’t had a ton of issues. I was mostly just commenting as a thanks for putting together an environment to try to formally support something like this. I feel like I’ve had times where the transcode process just fails to start and I’ll have to reboot the VM for things to start working again. I’ve tried to dig into the logs to figure out what is causing the transcode to fail but have never been able to tie it down to any exit messages.

Likely just a driver/kernel bug; I’m still learning. I know there are a bunch of changes coming with kernel 6.8, whenever that gets pushed to Proxmox, along with the Nvidia 550.90.07 drivers extending some of the vGPU functionality, like mixed-size vGPU allocations.

The only main difference I’ve noticed, compared to when I was just using an LXC container and passing in the entire GPU (before switching to the vGPU passed into a VM), is that transcodes seemed to start a lot quicker without the vGPU virtualization layer. A couple of seconds on the LXC container; with the vGPU it’s usually 5-10 seconds before things kick off and get going.

Proxmox thread about newer nvidia drivers and kernel things: vGPU with nVIDIA on Kernel 6.8 | Proxmox Support Forum

This is a use case; however, there are other use cases as well.

For example, Blue Iris, which is NVR software for CCTV, only runs on Windows and can make use of the GPU to do AI processing for smart detection. I’ve seen many users who run it in a VM on a host that is also running Plex in an Ubuntu or other Linux VM.

As noted above, a split GPU is also necessary if access to the host’s physical console is required while the vGPU is assigned to a VM.

The essential point, independent of the application(s) being run, is that there is another OS that also needs access to the GPU.

What matters to Plex are three things:

  1. Does the virtual GPU have all the capabilities of the real, physical GPU?
  2. Does the vGPU have a ‘renderDxxx’ node? (transcode)
  3. Does the vGPU have a companion ‘cardX’ node? (tone mapping - OpenCL)

Since Plex uses the Intel Compute Runtime (Intel Media Driver) and the Nvidia toolkit, if all those functions are available to the vGPU that Plex has access to in the VM, then there’s no reason for it not to work.
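A few quick checks from inside the VM cover those three points (vainfo comes from libva-utils and clinfo from the clinfo package on Ubuntu; names may differ elsewhere):

ls -l /dev/dri/                    # both a cardX and a renderDxxx node should be present
vainfo                             # VA-API profiles -> the transcode path
clinfo | grep -i 'device name'     # an OpenCL device -> the tone-mapping path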

Relating to the one example shown of video tearing,
I’ve been chatting with a friend about the interlaced VC-1 issues experienced previously, which required changes in the Intel Compute Runtime by a user named Carpalis.

The issue we worked on was:

NOW, for the $5000 question …

Might there be an Intel driver change needed to work with SR-IOV?

Hi @ChuckPa,

I was hoping that with 24.04.1 we might see a change in behaviour on this; however, no such luck - I still get the tearing as per my previous screenshot. I suspect it is related to a Linux driver issue, as when directly passing through the device it’s fine, but SR-IOV exhibits the issue.

I dove into this a little more and stumbled across a forum thread which you had been active in; it looks like SR-IOV was something that was being looked into before. Was this paused in the recent rationalisation? I wonder if this is something we can help get back onto the radar and support with testing etc.
Thank you again