Is it normal for Plex Transcoder to be single threaded with HW acceleration turned on?

Server Version#: newest (public)
Player Version#: web / windows 64-bit client (newest)

My System:
ESXi 6.7u1
Intel e5-2650v2 dual CPUs
128GB ECC RAM
nVidia Quadro P2000 (latest drivers)
Windows 7 / 10 Plex Servers (tried both - the Windows 10 was a new install of both Windows & PMS)

All - I’m trying to get to the bottom of an issue with 4k HEVC transcoding performance on my server. Now, before someone chimes in that transcoding 4K HEVC is insane, I know it would be normally, but my rig should be able to accomplish what I’m trying to do based on specs.

That is - unless Plex only uses single threaded transcoding when also using HW acceleration? It certainly appears that may be the case, after reviewing my Task Manager graphs.

Can anybody speak to this? Is this expected? I turn HW acceleration off, and I see uniform load across all of the cores in my VM. 4K HEVC takes a lot of horsepower to transcode, but my system can do it with CPU alone - again, when it’s multi-threaded.

Turn HW acceleration on, and I see only 1 of my CPU cores being taxed. That core isn’t at 100%, but it’s in the mid-to-high 80s. My GPU, on the other hand, never seems to really get into high gear. I figure the single core might be struggling to feed the GPU quickly enough.

I’ve read that VC-1 and subtitles can cause single-threaded transcoding, but I’m not playing any subtitles, and again, when I toggle HW off, I see that the transcoding is spread across multiple cores on my processor. Flip the HW back on and restart the video, and it goes back to only a single thread.

Here is the media info for the video I’m testing with:

Media

Video Resolution 4K
Duration 1:00:15
Bitrate 16133 kbps
Width 3840
Height 2160
Aspect Ratio 1.78
Container MKV
Video Frame Rate NTSC
Video Profile main 10

Part

Duration 1:00:15
File xxxxx.mkv
Size 6.79 GB
Container MKV
Indexes sd
Video Profile main 10

Codec HEVC
Bitrate 16005 kbps
Language English
Bit Depth 10
Chroma Subsampling 4:2:0
Color Primaries bt709
Color Range tv
Color Space bt709
Color Trc bt709
Frame Rate 30 fps
Height 2160
Level 5.0
Profile main 10
Ref Frames 1
Title English
Width 3840
Display Title 4K (HEVC Main 10)

Codec EAC3
Channels 2
Bitrate 128 kbps
Language Not Applicable
Audio Channel Layout stereo
Sampling Rate 48000 Hz
Display Title Not Applicable (EAC3 Stereo)

Codec SRT
Language English
Title SDH
Display Title English (SRT)

It seems that 1080p HEVC also only uses 1 core with HW encoding, but that seems like plenty as the transcode quickly shows ‘throttled’ in Tautulli. (that’s a good thing)
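For a rough sense of why 1080p might keep up where 4K doesn't, here is a back-of-the-envelope sketch of the raw pixel rates involved. It assumes the 1080p material also runs at 30 fps, which is my assumption and not something I measured, and it ignores bit depth and codec complexity entirely.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Raw pixels per second the decoder has to produce (ignores bit depth)."""
    return width * height * fps

# 4K figures come from the media info above; the 1080p frame rate is assumed.
uhd = pixel_rate(3840, 2160, 30)
fhd = pixel_rate(1920, 1080, 30)

print(f"4K:    {uhd:,} pixels/s")
print(f"1080p: {fhd:,} pixels/s")
print(f"ratio: {uhd / fhd:.1f}x")
```

So whatever the single busy core is doing per frame, the 4K source asks for roughly four times the pixel throughput of a 1080p one at the same frame rate.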

Maybe it’s simply a HW vs CPU/software transcoding thing and nothing to do with HEVC - either way, I’d appreciate any guidance on this - thanks in advance.

ST

are you having an actual problem, lag or buffering or whatever?

are you getting HW for both encode/decode?

p2k is a powerhouse, so a single transcode isn’t going to hit the gpu much.

the cpu load is probably the audio, which if I remember correctly is single threaded.

Yes - I’m having trouble with 4K HEVC content. The sample clip I’m testing with buffers. Tautulli shows that it’s only able to transcode at 0.9 (or less). It also shows HW on both sides of the job.
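To put the 0.9 figure in perspective, here is a minimal sketch of the arithmetic, assuming the speed Tautulli reports is the ratio of video seconds produced per wall-clock second and that it stays constant for the whole clip (both are my assumptions). The function name is just for illustration.

```python
def transcode_shortfall(duration_s: float, speed: float) -> float:
    """Extra wall-clock seconds needed beyond real time to transcode the clip."""
    return duration_s / speed - duration_s

duration_s = 1 * 3600 + 0 * 60 + 15   # 1:00:15 from the media info
speed = 0.9                           # transcode speed reported by Tautulli

shortfall = transcode_shortfall(duration_s, speed)
per_minute = 60 - 60 * speed          # seconds of video missing per minute played

print(f"Time to transcode the clip: {duration_s / speed:.0f} s")
print(f"Shortfall over the full clip: {shortfall:.0f} s")
print(f"Video falls behind by about {per_minute:.0f} s per minute of playback")
```

Anything below 1.0 means the player eventually drains its buffer, which matches the buffering I'm seeing.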

Also - seeking or changing the bitrate on the fly doesn’t work with this clip. I can only get it to play at all (with buffering) if I set my client to a lower bitrate before starting playback.

PlexWeb always forces a transcode as it doesn’t seem to support h265 playback in the web browser.

My CPU single threaded passmark score is 1675. I would suspect that would be adequate to handle any audio requirements.

In order to keep the audio and video synced, the workload has to pass through the same CPU thread, so IPC becomes important. It isn’t single-threaded per se; it’s a matter of CPU affinity.

Keep in mind that the data has to pass from the CPU to the GPU over PCIe and back. This is why the iGPUs in the Core i-series can do more with less: the GPU is local to the CPU.

Thanks. Are you suggesting my CPU doesn’t have a high enough IPC to keep up? I’ve only really considered the processor overall and single threaded passmark scores.

For my system that is basically 19K / 1.7K respectively.

The GPU is in a PCIE 3.0 16x slot (electrical), and has lots of RAM. The disk attached to the VMs is very fast, but I did experiment with adding a SSD for the transcoding cache, and it didn’t make a difference.

It seems more like something wrong in the underlying transcoding engine.

Like I mentioned, Emby/Jellyfin can play back this content better - almost with no buffering, and I believe the load is more distributed across the cores as well - I will verify that and update this thread.

Still - why does seeking not work? Why can’t I change the desired bitrate on the fly? I get that 4K HEVC is a lot to deal with, but my system should be able to cope with headroom to spare.

Further to my last, I’ve got Jellyfin running a 4K HEVC --> 1080p 20Mbps transcode of the same test file. It’s actually working without buffering at all when I manually tell it to use 1080p @ 20Mbps.

I don’t have Tautulli to tell me that it’s using HW, but my GPU is clearly working harder under Jellyfin than Plex per GPU-Z. With Plex my GPU use is about 6-8% during the transcode, with momentary spikes to about 12% at startup. With Jellyfin at startup I’m seeing 30-35% on the GPU, then it cruises at 16% fairly consistently during the actual stream.

In terms of the CPU load, it looks similar to Plex - one core is being taxed more than the others, so it looks like that part is consistent between Plex and Jellyfin/Emby.

Why then does Jellyfin seem to get more out of the GPU than Plex? Does this speak to further FFMPEG optimization work that needs to happen on the HEVC side of things with Plex’s implementation?

Edited to add that I’m able to play multiple 4K HEVC --> 1080p @ 20Mbps in Jellyfin - the GPU use climbs to 26-28% while streaming more than one title.

Something in the Plex implementation of ffmpeg appears to be holding the P2000 back with 4K HEVC for some reason. I don’t see the same issue with 1080p HEVC or H264.

I don’t think you state anything specifically in the thread, but to be clear;

you are running this windows instance under vmware? or are you running on bare metal hardware (no virtualization)?

there are often issues with hardware transcoding under virtualization.

I don’t know why emby/jellyfin may seem to perform better.

But it seems obvious that all 3 are having some kind of problems.

The first recommendation would be to simply run on bare metal, if for nothing else than to rule out the esxi layer.

If you can’t or won’t do that, then you are going to be at the mercy of either your own research into getting it to work over vmware, or the help of any others that have (or haven’t).

your e5-2650v2 CPUs are quite old, especially in the context of 4k.

fwiw, I have some old dell r610s with Intel(R) Xeon(R) CPU X5570 @ 2.93GHz and they can do about 2x 4k>1080 transcodes on the cpus, on bare metal debian.

when I still had esxi on them, they transcoded for crap.

Right - these processors are Ivy Bridge generation, so not new, but still plenty fast enough to handle a 4k transcode. But I’m not trying to use the processors - I’m trying to use a brand new Quadro P2000.

I’ll go ahead and try bare metal to see if it makes any difference, but I don’t think it will. I have a Sandy Bridge i7 2600 which I just pulled out of production to pair with the P2000 as a test.

I’d also suggest that you may have misread my thoughts on Jellyfin. It works, quite well in a 4K to 1080p transcode situation. The GPU is actually being used which is clearly the difference between my Plex and Jellyfin tests. 16% and smooth in Jellyfin, 6% and buffers in Plex.

That to me says the Plex ffmpeg implementation needs optimization for 4K HEVC to work better for HW acceleration with nVidia cards.

I will repeat these tests on bare metal though and see if I can do any better there.
