Working Rocket Lake GPU with HDR the easy way, on both Docker and bare metal!

OK, I’ve made a discovery: I did not realize that the Plex Media Server package completely ignores the shared libs installed on the Linux system! It apparently keeps its own copy of them in the /usr/lib/plexmediaserver/lib directory, so it doesn’t matter that I have installed the latest Intel media VA driver, because Plex ignores it and uses its bundled copy. Unfortunately it is not as simple as copying over the new one: the system’s shared libs are linked against a different C/C++ runtime version than the one Plex uses, so there are loader issues with the shared C++ standard library. The only solution I see is for Plex Media Server to be compiled against the latest Intel media VA driver, which should fix the Rocket Lake issues once and for all. The driver I am talking about is the ‘iHD_drv_video.so’ lib in the /usr/lib/plexmediaserver/lib/dri directory. It’s not built from the latest code available: https://github.com/intel/media-driver
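To see which copies of the driver actually exist on a machine, something like the following could be used. It is only a rough sketch: the Plex path is the one mentioned above, while the system path `/usr/lib/x86_64-linux-gnu/dri` is an assumption (typical for Debian/Ubuntu on x86_64), and `find_driver_copies` is just an illustrative helper name.

```python
from pathlib import Path

DRIVER_NAME = "iHD_drv_video.so"

# The Plex path comes from the post above; the system path is an assumption
# (typical for Debian/Ubuntu on x86_64) and may differ on other distros.
DEFAULT_DIRS = [
    "/usr/lib/plexmediaserver/lib/dri",
    "/usr/lib/x86_64-linux-gnu/dri",
]


def find_driver_copies(dirs=DEFAULT_DIRS, name=DRIVER_NAME):
    """Return {directory: size_in_bytes} for each directory holding the driver."""
    found = {}
    for d in dirs:
        candidate = Path(d) / name
        if candidate.is_file():
            found[d] = candidate.stat().st_size
    return found


if __name__ == "__main__":
    copies = find_driver_copies()
    if not copies:
        print(f"No copy of {DRIVER_NAME} found in the searched directories")
    for directory, size in copies.items():
        print(f"{directory}: {size} bytes")
```

If both directories show up with different sizes, that is at least a hint that Plex is carrying a different build than the one installed system-wide. It does not say which one is newer.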

Thanks a lot for sharing your work. I’m running Plex from the linuxserver.io Docker container on a 5.10 kernel. I suppose it isn’t worth trying until I upgrade my kernel to 5.11?

Totally missed that the package was uninstalled. I don’t know if you have a container runtime on your box, but out of curiosity, if you do, can you try running my tweaked version of the pms-docker image? I pushed it up to Docker Hub last night. All of the flags to run it are the same as for the official image, plexinc/pms-docker. This is the image that I’m successfully running in k8s right now. It’s built from the Dockerfile in my first post if you want to build it locally instead. I was able to get the image to work in both Docker and CRI-O.

scuffe@hermes:~/kubernetes/03-MediaApps/05-PlexMain/files$ k describe pod -n media plexmain-59db89566d-9bpfg
Name:         plexmain-59db89566d-9bpfg
Namespace:    media
Priority:     0
Node:         hermes/10.0.0.3
Start Time:   Wed, 09 Jun 2021 07:17:57 +0000
Labels:       app=plexmain
              pod-template-hash=59db89566d
Annotations:  cni.projectcalico.org/podIP: 192.168.137.90/32
              cni.projectcalico.org/podIPs: 192.168.137.90/32
Status:       Running
IP:           192.168.137.90
IPs:
  IP:           192.168.137.90
Controlled By:  ReplicaSet/plexmain-59db89566d
Containers:
  plexmain:
    Container ID:   cri-o://aafb9a90f8e170ad1a1925bee753d270c7a652a9cc2d3d8448d661c1a4f7332d
    Image:          scuffe82/pms-docker-21.04:latest
    Image ID:       docker.io/scuffe82/pms-docker-21.04@sha256:2e7c307813123787af82f5482fd929b53d923e82d9c6e2cbcfa0664289d59125
    Ports:          32400/TCP, 3005/TCP, 8324/TCP, 32469/TCP, 1900/TCP, 32410/TCP, 32412/TCP, 32413/TCP, 32414/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 09 Jun 2021 07:18:04 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      gpu.intel.com/i915:  1
    Requests:
      gpu.intel.com/i915:  1

scuffe82/pms-docker-21.04

Upgrading the kernel is worth a try for sure. I wasn’t able to get it working when I tried to upgrade to 5.12, but that one also requires more changes to the system.

Hi there, I looked at your question again and have a more proper answer for you now. After setting the i915 option enable_guc=2, the firmware does load, but the kernel still complains about it in dmesg.

root@hermes:~# cat /etc/modprobe.d/i915.conf
options i915 enable_guc=2

root@hermes:~# dmesg | grep -iE "huc|guc|dmc"
[    0.915465] Setting dangerous option enable_guc - tainting kernel
[    0.919342] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/rkl_dmc_ver2_02.bin (v2.2)
[    0.940280] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_49.0.1.bin version 49.0 submission:disabled
[    0.940284] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.5.0.bin version 7.5 authenticated:yes
root@hermes:~#
root@hermes:~# systool -m i915 -av | grep -iE "huc|guc|dmc"
    dmc_firmware_path   = "(null)"
    enable_guc          = "2"
    guc_firmware_path   = "(null)"
    guc_log_level       = "-1"
    huc_firmware_path   = "(null)"

So it looks like I can force it on, but out of the box the default is still disabled.
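For anyone who wants to check this on their own box without eyeballing dmesg, here is a small sketch that pulls the GuC/HuC firmware lines out of dmesg text. The regex is written against the exact line format shown above and may not match other kernel versions; `parse_uc_firmware` is an illustrative name, not an existing tool.

```python
import re

# Matches lines such as:
# "[drm] GuC firmware i915/tgl_guc_49.0.1.bin version 49.0 submission:disabled"
UC_LINE = re.compile(
    r"\[drm\] (?P<unit>GuC|HuC) firmware (?P<path>\S+) "
    r"version (?P<version>\S+) (?P<key>\w+):(?P<value>\w+)"
)


def parse_uc_firmware(dmesg_text):
    """Return {"GuC": {...}, "HuC": {...}} for the firmware lines found."""
    result = {}
    for line in dmesg_text.splitlines():
        m = UC_LINE.search(line)
        if m:
            result[m["unit"]] = {
                "path": m["path"],
                "version": m["version"],
                m["key"]: m["value"],
            }
    return result


# Sample input copied from the dmesg output above.
SAMPLE = """\
[    0.940280] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_49.0.1.bin version 49.0 submission:disabled
[    0.940284] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.5.0.bin version 7.5 authenticated:yes
"""

if __name__ == "__main__":
    for unit, info in parse_uc_firmware(SAMPLE).items():
        print(unit, info)
```

On a live system the input would be the output of `dmesg` instead of `SAMPLE`. With enable_guc=2 the line to look for is the HuC one reporting `authenticated:yes`.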

It seems to be loaded. I get the same dmesg messages about the kernel being tainted, so you’re “good”.
Check for instabilities and see whether your CPU load is lower, since the GPU will be doing more of the work now.

I’m not familiar with Docker at all, perhaps I need to learn about it. I generally do not like to virtualize things that rely on things like GPUs, Accelerator hardware, and such. I’ve only used Hyper-V and ESXi and my current big server is a 24-core ESXi machine.

Yes, I have all this enabled. I think the fact that I have a UHD 750 rather than a UHD 730 is why it’s still not working for me.

jerry@mooncake:/usr/lib/plexmediaserver$ sudo cat /sys/kernel/debug/dri/0/gt/uc/guc_info
GuC firmware: i915/tgl_guc_49.0.1.bin
	status: RUNNING
	version: wanted 49.0, found 49.0
	uCode: 321408 bytes
	RSA: 256 bytes

GuC status 0x800330ec:
	Bootrom status = 0x76
	uKernel status = 0x30
	MIA Core status = 0x3

Scratch registers:
	 0: 	0x0
	 1: 	0x305fd3
	 2: 	0x0
	 3: 	0x4000
	 4: 	0x40
	 5: 	0x630
	 6: 	0x0
	 7: 	0x0
	 8: 	0x0
	 9: 	0x0
	10: 	0x0
	11: 	0x0
	12: 	0x0
	13: 	0x0
	14: 	0x0
	15: 	0x0

GuC log relay not created
jerry@mooncake:/usr/lib/plexmediaserver$ sudo cat /sys/kernel/debug/dri/0/gt/uc/huc_info
HuC firmware: i915/tgl_huc_7.5.0.bin
	status: RUNNING
	version: wanted 7.5, found 7.5
	uCode: 580352 bytes
	RSA: 256 bytes
HuC status: 0x00090001
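A dump like the two above can be checked mechanically. This is a minimal sketch that assumes the debugfs layout shown in this post (`status:` and `version: wanted X, found Y` lines); the helper names are made up for illustration.

```python
import re


def parse_uc_info(text):
    """Extract firmware path, status and wanted/found versions from a *_info dump."""
    info = {}
    fw = re.search(r"^(GuC|HuC) firmware: (\S+)", text, re.MULTILINE)
    if fw:
        info["unit"], info["path"] = fw.group(1), fw.group(2)
    # Only the indented "status:" line, not "GuC status 0x..." or "HuC status:".
    status = re.search(r"^\s*status: (\w+)", text, re.MULTILINE)
    if status:
        info["status"] = status.group(1)
    ver = re.search(r"version: wanted (\S+), found (\S+)", text)
    if ver:
        info["wanted"], info["found"] = ver.group(1), ver.group(2)
    return info


def firmware_ok(info):
    """True when the firmware is RUNNING and the found version is the wanted one."""
    return (
        info.get("status") == "RUNNING"
        and "wanted" in info
        and info["wanted"] == info.get("found")
    )


# Sample input copied from the huc_info output above.
HUC_SAMPLE = (
    "HuC firmware: i915/tgl_huc_7.5.0.bin\n"
    "\tstatus: RUNNING\n"
    "\tversion: wanted 7.5, found 7.5\n"
    "\tuCode: 580352 bytes\n"
    "\tRSA: 256 bytes\n"
    "HuC status: 0x00090001\n"
)

if __name__ == "__main__":
    parsed = parse_uc_info(HUC_SAMPLE)
    print(parsed, firmware_ok(parsed))
```

Reading the real files under /sys/kernel/debug needs root, as in the `sudo cat` commands above.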

It’s fairly easy to pick up. For a lot of things it’s way easier than a VM. I worked at VMware for 7 years and have been down all the VM roads for stuff. One of my old coworkers wrote a blog post specifically about passing through the iGPU in ESXi if you want to try it.

Passthrough of Integrated GPU (iGPU) for standard Intel NUC (williamlam.com)

I’ll add something I realized yesterday about this topic.

The “green artifacts” that appear while using HW transcoding ONLY happen when Plex transcodes to a higher quality than the original. If I choose the same quality or lower, it works perfectly.

Just FYI :slight_smile:

PS: btw, I don’t understand what the point would be of letting the user transcode to a higher quality than the source… but well… it’s there…

For me the image was pixelated and blocky if HDR and hardware transcode were enabled. It didn’t matter what I set the transcode settings to, but the new kernel seems to have it figured out.

I’m trying to understand what exactly is breaking hw transcoding on Rocket Lake iGPUs.

I got this error message when trying to hw transcode a stream:

Jun 15, 2021 18:45:54.366 [0x7f606171cb38] ERROR - [Transcoder] [AVHWDeviceContext @ 0x7fb97cf969c0] No matching devices found.

and also these kernel messages:

[ 2746.933016] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9fff2, in Plex Transcoder [12856]
[ 2747.034316] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
[ 2747.034361] i915 0000:00:02.0: [drm] Plex Transcoder[12856] context reset due to GPU hang
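To check whether a hang like this shows up repeatedly, the kernel log can be scanned for `GPU HANG` lines. A sketch only: the regex is fitted to the message format above and other kernels may word it differently, and `find_gpu_hangs` is an illustrative name.

```python
import re

# Fitted to lines like:
# "[ 2746.933016] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9fff2, in Plex Transcoder [12856]"
HANG = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] i915 (?P<pci>\S+): \[drm\] GPU HANG: "
    r"ecode (?P<ecode>\S+), in (?P<proc>.+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(log_text):
    """Return one dict per GPU HANG line: time, PCI device, ecode, process, pid."""
    hangs = []
    for line in log_text.splitlines():
        m = HANG.search(line)
        if m:
            hangs.append(
                {
                    "time": float(m["ts"]),
                    "device": m["pci"],
                    "ecode": m["ecode"],
                    "process": m["proc"],
                    "pid": int(m["pid"]),
                }
            )
    return hangs


# Sample input copied from the kernel messages above.
SAMPLE_LOG = """\
[ 2746.933016] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9fff2, in Plex Transcoder [12856]
[ 2747.034316] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
[ 2747.034361] i915 0000:00:02.0: [drm] Plex Transcoder[12856] context reset due to GPU hang
"""

if __name__ == "__main__":
    for hang in find_gpu_hangs(SAMPLE_LOG):
        print(hang)
```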

Sysinfo

- Intel i5-11600K
- no dGPU
- Ubuntu 20.04 LTS with oem kernel (5.10.0-1029-oem)
- Using the latest `plexinc/pms-docker` docker image

Stream info

Codec: HEVC
Bitrate: 74817 kbps
Language: English
Bit Depth: 10
Chroma Subsampling: 4:2:0
Coded Height: 2160
Coded Width: 3840
Color Primaries: bt2020
Color Range: tv
Color Space: bt2020nc
Color Trc: smpte2084
Frame Rate: 23.97599983215332 fps
Height: 2160
Level: 5.1
Profile: main 10
Ref Frames: 1
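The field that marks this stream as HDR is the transfer characteristic: smpte2084 is the PQ curve used by HDR10. A rough way to flag streams that would go through tone mapping is sketched below. The dict keys are hypothetical and this is not how Plex itself decides; it is only a heuristic based on the `Color Trc` value.

```python
# Transfer characteristics that indicate HDR content:
# smpte2084 is the PQ curve (HDR10), arib-std-b67 is HLG.
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}


def is_hdr(stream):
    """Return True if the stream's transfer characteristic marks it as HDR."""
    return stream.get("color_trc", "").lower() in HDR_TRANSFERS


# Values taken from the stream info above; the key names are made up.
stream = {
    "codec": "hevc",
    "bit_depth": 10,
    "color_primaries": "bt2020",
    "color_space": "bt2020nc",
    "color_trc": "smpte2084",
}

if __name__ == "__main__":
    print("HDR stream:", is_hdr(stream))
```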

When I was searching around trying to figure it out, there were some posts talking about how 5.10 doesn’t fully implement everything for Rocket Lake processors. 5.11 is supposed to contain the missing bits to get everything working correctly. Since you’re running it in Docker, you can try running my modified Docker image and see if it makes any difference. I went through a bunch of iterations and finally settled on the 5.11 kernel being the key. 5.11 isn’t fully backported to 20.04 yet, and when I was looking at the PPAs that Ubuntu publishes, it seems they are bare-bones kernels for testing and don’t have everything else compiled in.

scuffe82/pms-docker-21.04

The Dockerfile used to build it is posted above, and it takes all the same flags as the official image. I didn’t test it on a 20.04 system with the 5.10 kernel, but it may work. Running it on 21.04 until the 5.11 kernel is fully backported was the only way I got it working. Disabling HDR tone mapping also gets around the problem, but then anything in HDR looks bad.
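If you want to script the kernel check, here is a small sketch that compares a kernel release string against 5.11. The (5, 11) threshold is just the version this thread settled on, not an official requirement, and the function names are illustrative.

```python
import platform
import re


def kernel_version(release):
    """Parse the leading 'major.minor' from a release string like '5.10.0-1029-oem'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def meets_minimum(release, minimum=(5, 11)):
    """True if the kernel is at least `minimum` (default 5.11, per this thread)."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    verdict = "meets" if meets_minimum(release) else "is below"
    print(f"Kernel {release} {verdict} 5.11")
```

For example, `meets_minimum("5.10.0-1029-oem")` is False, while a 21.04 host on a 5.11 kernel passes.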

Your image works. So 20.04 + 5.10.0-1029-oem + your 21.04-based image is the setup to go with.

Turning off HDR tone mapping in the 20.04 image also saves it from crashing, but the video quality absolutely suffered.

So I guess the GPU HANG kernel error isn’t caused by defects in 5.10 itself.

Also tried a newer mainline kernel (5.13-rc5). Interestingly hw transcoding doesn’t start at all.

I want to comment briefly that your recommendation seems to hold true for TGL on my NUC11TNHv50L (i5-1145G7) as well. I quickly checked with Ubuntu 21.04 and the newest compute runtimes, but with the PMS you recommended. The dashboard indicated hw usage both w/ and w/o tone mapping enabled, without a considerable CPU toll. Will check back once intro detection for the test mount has settled.
Edit: Probably spoke too soon. 4K → 1080p (20 Mbps) doesn’t stress the CPU at all, but w/ tone mapping a single stream ramps it up to 40%. Obviously offloading to hw is still shaky.

@ChuckPa what do you think about this?

What do I think?

May I offer you some :adhesive_bandage:s ?

The ‘bleeding edge’ stuff is precarious at best even though this is not all that ‘bleeding edge’ tech. New processor and ASIC without proper kernel & i915 support – that’s what hurts.

If you have the desire, skill, or just plain stubbornness to keep going, go at it.

I am dealing with my fair share of :crocodile: here

(My NAS is sick. only giving 450 MB/sec when 1.2-1.5 GB/sec is normal)

So, a change in behavior. After the latest update I’m now getting green/red/gray/blue artifacts and static on the screen. I’m going to update the neo packages to the latest release and rebuild the container to see if it makes any difference.

I updated the neo runtimes on the physical host and also rebuilt the container with the latest set. It’s better now, but I’m still seeing some artifacts, and sometimes part of the picture will look blurry for a second and then snap into focus. The latest image is pushed to Docker Hub if anyone wants to pull it down.

I updated the rest of the packages on the physical host, and now it’s back to how it was before I updated the container. I’ll poke at this more later and see if I can figure anything out.