HDR Tone Mapping CPU Usage

Server Version#: 1.21.0.3616-d87012962 (Docker)
Server: Dell PowerEdge T320
CPU: Intel Xeon E5-2470v2 (10 physical cores, 20 virtual cores)
RAM: 32 GB

I was very excited to see the HDR tone mapping support that was just added. I did some experimentation with it on my server and noticed some peculiarities.

HDR Tone mapping ON: Transcoder appears to ONLY use physical cores. It does not ever seem to use hyperthreading. CPU usage never goes above 999% (10 cores x 100% = 1000%).

HDR Tone mapping OFF: Transcoder uses all physical cores and virtual cores (i.e. I see it appearing to use hyperthreading, CPU usage hovers around 1200%).

This behavior on my server makes the difference between being able to transcode 4K in real time vs. not being able to handle it. Is this the expected behavior with HDR tone mapping enabled? If so, why the CPU usage difference with tone mapping enabled vs. disabled?
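
To make the percentages above concrete: tools like top report per-process CPU usage summed over logical CPUs, so 100% means one fully busy logical CPU. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the 999% and 1200% readings are the ones I reported above.

```python
def cpu_ceiling_percent(cpu_count: int) -> int:
    """Maximum per-process CPU% when 100% means one fully busy CPU."""
    return cpu_count * 100


def busy_logical_cpus(percent: float) -> float:
    """Convert a top-style CPU% reading into an equivalent count of busy logical CPUs."""
    return percent / 100.0


PHYSICAL_CORES = 10  # Xeon E5-2470v2
LOGICAL_CPUS = 20    # with hyperthreading enabled

physical_ceiling = cpu_ceiling_percent(PHYSICAL_CORES)
logical_ceiling = cpu_ceiling_percent(LOGICAL_CPUS)

for label, reading in [("tone mapping ON", 999), ("tone mapping OFF", 1200)]:
    exceeds = reading > physical_ceiling
    print(f"{label}: ~{busy_logical_cpus(reading):.1f} logical CPUs busy, "
          f"exceeds physical-core ceiling: {exceeds}")
```

A reading that never crosses the physical-core ceiling is consistent with the transcoder running at most one busy thread per physical core, though it does not prove that hyperthreading is unused.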

Reporting the same thing. Initially I couldn’t figure out why I wasn’t getting hardware transcoding. I’m on an i3 10100.

Are you getting HW transcoding to work with HDR tone mapping on?

No, not at all. Hardware acceleration is working for everything EXCEPT tone mapping.

Tone Mapping uses the CPU and OpenCL.

You will see the added CPU load as the CPU copies all the data around for every frame.

This isn’t like zero-copy hardware transcoding.

Because the CPU/GPU (ASIC) doesn’t have the native ability to tone map (HDR->SDR), OpenCL is used on the regular CPU/GPU. That puts the load on the CPU, which is what everyone is seeing.
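
As a rough illustration of why per-frame copies add up, here is some back-of-the-envelope arithmetic in Python. The frame layout (10-bit 4:2:0 video stored as 16-bit samples) and the 24 fps frame rate are assumptions for illustration only, not figures from Plex, and the names are hypothetical.

```python
WIDTH, HEIGHT = 3840, 2160   # 4K UHD frame
BYTES_PER_SAMPLE = 2         # assumption: 10-bit samples stored in 16 bits
SAMPLES_PER_PIXEL = 1.5      # 4:2:0 chroma subsampling: 1 luma + 0.5 chroma samples
FPS = 24                     # assumption: typical film frame rate


def frame_bytes(width: int, height: int) -> int:
    """Size in bytes of one uncompressed frame under the assumptions above."""
    return int(width * height * SAMPLES_PER_PIXEL * BYTES_PER_SAMPLE)


def copy_rate_bytes_per_second(width: int, height: int, fps: int) -> int:
    """Bytes moved per second if every frame is copied once in one direction."""
    return frame_bytes(width, height) * fps


per_frame = frame_bytes(WIDTH, HEIGHT)
per_second = copy_rate_bytes_per_second(WIDTH, HEIGHT, FPS)

print(f"one frame: {per_frame / 1e6:.1f} MB")
print(f"one copy direction at {FPS} fps: {per_second / 1e6:.1f} MB/s")
```

Each extra copy of the frame (for example to the OpenCL device and back) multiplies that figure, which is the kind of overhead zero-copy pipelines avoid.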

My server does not have any GPU acceleration. It relies only on software transcoding. That being said, I’m pointing out the difference in CPU usage between the two scenarios where HDR tone mapping is enabled vs. when it is disabled.

I am having the same issue. I took some logs, here are my details:

[Transcoder] [Parsed_hwmap_2 @ 0x1af5240] Failed to created derived device context: -19.
[Transcoder] [Parsed_hwmap_2 @ 0x1af5240] Failed to configure output pad on Parsed_hwmap_2
[Transcoder] Failed to inject frame into filter network: No such device
Jobs: '/usr/lib/plexmediaserver/Plex Transcoder' exit code for process 8484 is 1 (failure)
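
For anyone reading these lines: the trailing -19 can be decoded, assuming it is a negated POSIX errno, which is how FFmpeg-style code usually reports OS errors. The "No such device" text in the third log line is consistent with that reading. Here is a minimal Python sketch; the helper name is hypothetical.

```python
import errno
import os
import re

LOG_LINE = ("[Transcoder] [Parsed_hwmap_2 @ 0x1af5240] "
            "Failed to created derived device context: -19.")


def decode_negated_errno(line: str):
    """Return (symbolic name, OS message) for a trailing ': -N' in a log line, or None."""
    match = re.search(r":\s*-(\d+)\.?\s*$", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to names such as 'ENODEV' on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


decoded = decode_negated_errno(LOG_LINE)
print(decoded)
```

On Linux, errno 19 is ENODEV, so under that assumption the failure means the device the filter tried to derive (here, for the hwmap step) was not available to the transcoder process.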

May I have full logs please? I can’t help solve from snippets.

@ChuckPa I exchanged some info about this issue with your cohort DaveBinM on reddit. I sent some logs and confirmed the lspci info.

Intel Ice Lake and newer CPUs with Quick Sync are supposed to have updates to QSV that include HDR10 tone mapping. My CPU is Comet Lake, so that doesn’t help me, but it does suggest those newer QSV versions might handle it all through HW.

That also suggests said VAAPI driver is installed, which is something I will go check on.

However, as I was briefed, this is not HW tone mapping.

Essentially this is “Tone Mapping for the rest of us”.

Having just asked the team:

  1. This initial implementation is OpenCL based as it provides the widest device coverage.

  2. They are considering adding specific ASIC-based tone mapping in the future.

For Docker users: based on another thread, I did this, and it allowed tone mapping to use the GPU on my 9900K:
sudo docker exec plex bash -c "apt update && apt -y install cmake pkg-config python ocl-icd-dev libegl1-mesa-dev ocl-icd-opencl-dev libdrm-dev libxfixes-dev libxext-dev llvm-7-dev clang-7 libclang-7-dev libtinfo-dev libedit-dev zlib1g-dev build-essential git"
sudo docker exec plex git clone --branch comet-lake https://github.com/rcombs/beignet.git
sudo docker exec plex bash -c "mkdir /beignet/build/ && cd /beignet/build && cmake -DLLVM_INSTALL_DIR=/usr/lib/llvm-7/bin … && make -j8 && make install"
sudo docker restart plex

Latest v 1.21.1.3830 appears to resolve the issue.

Hi, the PLEX website as well as your response confirm that tone mapping is not (yet) hardware accelerated in Windows. I do have a question though. I’m running a 4790K (@ 4.4GHz) in combination with an nVidia P1000. For testing purposes I’m transcoding 4K HDR HEVC (30 Mbps) to 1080p H.264 (8 Mbps). With tone mapping on, PLEX CPU usage is around 35 percent and the CPU never gets maxed out. Nevertheless there is intermittent buffering. Transcoding appears to be done by the P1000. Network bandwidth is around 400 Mbps, so no bottleneck there either. Is there a specific limitation to the CPU doing the tone mapping, for instance is it a single-threaded implementation? Cheers in advance.

@ProMace

The reason the CPU doesn’t “max out” is because of how multi-threading works.

  1. Communication from CPU → Nvidia GPU is about 2% on that CPU
  2. Video transcoding is being done on the GPU
  3. Tone mapping is being done on the Nvidia on Linux
  4. Subtitles would still be done by the CPU - Some portion of a core
  5. Audio - multi-threaded - can be as high as 20% from what I’ve seen.

To your questions:

  • CPU tone mapping is multi-threaded. I see 300% (3 cores) when tone mapping in SW.

  • Subtitle burning. Another portion of a thread.

Pain points?

  1. If you’re running PMS in Docker inside a VM (that’s a lot of layers), it depends on the hypervisor.

  2. Don’t try to run a Linux VM on Windows to gain HW capabilities – won’t happen because Windows is still at the bare machine level and going to pinch off everything you need to do the job right.

Hi, thanks for your elaborate response! To further clarify my scenario:

  1. Running Windows 10 Pro on a physical machine, not a VM.
  2. No (burned) subtitles involved.
  3. The verbose log shows that transcoding speed can’t keep up with the movie. I’ve pasted part of it below.
  4. Transcoding to 4 Mbps instead of 8 works fine, but image quality is obviously worse.
  5. Using SoftPerfect RAM Disk for the temporary transcode files.

So there is no virtualization overhead involved, I’m just running PLEX on a physical Windows machine. I can of course work around it by having a non-HDR (and lower bit rate) version on the side, but it would be a nice-to-have to get it to work for a single stream occasionally. The remaining question I have is: should the 4790K be able to handle tone mapping in software under the conditions I mentioned and based on your explanation of how the workloads are distributed?

VERBOSE LOG:

May 10, 2021 23:47:24.354 [7444] VERBOSE - [Transcode] We want 60 seconds ahead, last returned was 1342.718000 and max is 1342.718000.
May 10, 2021 23:47:24.354 [7444] VERBOSE - [Transcode] It took 0.0 sec to serialize a list with 0 elements.
May 10, 2021 23:47:24.354 [13856] DEBUG - Completed: [127.0.0.1:53211] 206 PUT /video/:/transcode/session/042952e1-a980-4ace-9ce9-ca7d3ad29831/b5ad77db-27d7-48f0-8787-6aee47a018ad/progress?progress=14.8&size=-22&remaining=11000&vdec_packets=4559&vdec_hw_ok=4549&speed=0.8&vdec_hw_status=1 (6 live) 0ms 355 bytes (pipelined: 557) (range: bytes=0-)
May 10, 2021 23:47:24.426 [11660] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:24.548 [13856] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:24.669 [11660] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:24.797 [13856] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:24.868 [13856] VERBOSE - Auth: We found auth token (xxxxxxxxxxxxxxxxxxxx), enabling token-based authentication.
May 10, 2021 23:47:24.868 [13856] VERBOSE - Auth: Came in with the master token, authorization succeeded.
May 10, 2021 23:47:24.868 [0728] DEBUG - Request: [127.0.0.1:53211 (Loopback)] PUT /video/:/transcode/session/042952e1-a980-4ace-9ce9-ca7d3ad29831/b5ad77db-27d7-48f0-8787-6aee47a018ad/progress?progress=14.8&size=-22&remaining=9364&vdec_packets=4567&vdec_hw_ok=4557&speed=0.9&vdec_hw_status=1 (6 live) Signed-in Token (ProMace) (range: bytes=0-)
May 10, 2021 23:47:24.868 [0728] VERBOSE - * Accept => /
May 10, 2021 23:47:24.868 [0728] VERBOSE - * Connection => keep-alive
May 10, 2021 23:47:24.868 [0728] VERBOSE - * Host => 127.0.0.1:32400
May 10, 2021 23:47:24.868 [0728] VERBOSE - * Icy-MetaData => 1
May 10, 2021 23:47:24.868 [0728] VERBOSE - * Range => bytes=0-
May 10, 2021 23:47:24.868 [0728] VERBOSE - * User-Agent => Lavf/58.27.104
May 10, 2021 23:47:24.868 [0728] VERBOSE - * X-Plex-Http-Pipeline => infinite
May 10, 2021 23:47:24.868 [0728] VERBOSE - * X-Plex-Token => xxxxxxxxxxxxxxxxxxxx
May 10, 2021 23:47:24.868 [0728] VERBOSE - * progress => 14.8
May 10, 2021 23:47:24.868 [0728] VERBOSE - * size => -22
May 10, 2021 23:47:24.868 [0728] VERBOSE - * remaining => 9364
May 10, 2021 23:47:24.868 [0728] VERBOSE - * vdec_packets => 4567
May 10, 2021 23:47:24.868 [0728] VERBOSE - * vdec_hw_ok => 4557
May 10, 2021 23:47:24.868 [0728] VERBOSE - * speed => 0.9
May 10, 2021 23:47:24.868 [0728] VERBOSE - * vdec_hw_status => 1
May 10, 2021 23:47:24.868 [0728] VERBOSE - [Transcode] We want 60 seconds ahead, last returned was 1342.718000 and max is 1342.718000.
May 10, 2021 23:47:24.868 [0728] VERBOSE - [Transcode] It took 0.0 sec to serialize a list with 0 elements.
May 10, 2021 23:47:24.868 [13856] DEBUG - Completed: [127.0.0.1:53211] 206 PUT /video/:/transcode/session/042952e1-a980-4ace-9ce9-ca7d3ad29831/b5ad77db-27d7-48f0-8787-6aee47a018ad/progress?progress=14.8&size=-22&remaining=9364&vdec_packets=4567&vdec_hw_ok=4557&speed=0.9&vdec_hw_status=1 (6 live) 0ms 355 bytes (pipelined: 558) (range: bytes=0-)
May 10, 2021 23:47:24.916 [13856] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:25.029 [11660] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:25.133 [13856] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:25.247 [11660] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:25.347 [13856] VERBOSE - [TranscodeOutputStream] Waiting 100ms for more data…
May 10, 2021 23:47:25.416 [13856] VERBOSE - Auth: We found auth token (xxxxxxxxxxxxxxxxxxxx), enabling token-based authentication.
May 10, 2021 23:47:25.416 [13856] VERBOSE - Auth: Came in with the master token, authorization succeeded.
May 10, 2021 23:47:25.416 [7444] DEBUG - Request: [127.0.0.1:53211 (Loopback)] PUT /video/:/transcode/session/042952e1-a980-4ace-9ce9-ca7d3ad29831/b5ad77db-27d7-48f0-8787-6aee47a018ad/progress?progress=14.8&size=-22&remaining=12629&vdec_packets=4576&vdec_hw_ok=4566&speed=0.5&vdec_hw_status=1 (6 live) Signed-in Token (ProMace) (range: bytes=0-)
May 10, 2021 23:47:25.416 [7444] VERBOSE - * Accept => /
May 10, 2021 23:47:25.416 [7444] VERBOSE - * Connection => keep-alive
May 10, 2021 23:47:25.416 [7444] VERBOSE - * Host => 127.0.0.1:32400
May 10, 2021 23:47:25.416 [7444] VERBOSE - * Icy-MetaData => 1
May 10, 2021 23:47:25.416 [7444] VERBOSE - * Range => bytes=0-
May 10, 2021 23:47:25.416 [7444] VERBOSE - * User-Agent => Lavf/58.27.104
May 10, 2021 23:47:25.416 [7444] VERBOSE - * X-Plex-Http-Pipeline => infinite
May 10, 2021 23:47:25.416 [7444] VERBOSE - * X-Plex-Token => xxxxxxxxxxxxxxxxxxxx
May 10, 2021 23:47:25.416 [7444] VERBOSE - * progress => 14.8
May 10, 2021 23:47:25.416 [7444] VERBOSE - * size => -22
May 10, 2021 23:47:25.416 [7444] VERBOSE - * remaining => 12629
May 10, 2021 23:47:25.416 [7444] VERBOSE - * vdec_packets => 4576
May 10, 2021 23:47:25.416 [7444] VERBOSE - * vdec_hw_ok => 4566
May 10, 2021 23:47:25.416 [7444] VERBOSE - * speed => 0.5
May 10, 2021 23:47:25.416 [7444] VERBOSE - * vdec_hw_status => 1
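
The relevant number in the log above is the speed parameter of the progress requests. Here is a minimal Python sketch that pulls it out of the three query strings, copied verbatim from the log. It assumes speed is a multiple of real time, so anything below 1.0 means the transcoder is falling behind playback; the function name and threshold are hypothetical.

```python
from urllib.parse import parse_qs

# Query strings copied from the three progress requests in the log above.
QUERIES = [
    "progress=14.8&size=-22&remaining=11000&vdec_packets=4559"
    "&vdec_hw_ok=4549&speed=0.8&vdec_hw_status=1",
    "progress=14.8&size=-22&remaining=9364&vdec_packets=4567"
    "&vdec_hw_ok=4557&speed=0.9&vdec_hw_status=1",
    "progress=14.8&size=-22&remaining=12629&vdec_packets=4576"
    "&vdec_hw_ok=4566&speed=0.5&vdec_hw_status=1",
]

REAL_TIME = 1.0  # assumption: speed is expressed as a multiple of playback speed


def transcode_speeds(queries):
    """Extract the 'speed' value from each progress query string."""
    return [float(parse_qs(query)["speed"][0]) for query in queries]


speeds = transcode_speeds(QUERIES)
keeps_up = all(speed >= REAL_TIME for speed in speeds)

print("speeds:", speeds)
print("keeps up with playback:", keeps_up)
```

All three samples are below 1.0, which matches the repeated "Waiting 100ms for more data" lines and the intermittent buffering.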

This is Linux. Not Windows.

I’m sorry but I can be of zero help with Windows.

Please create a separate thread and tag it “server-windows”; remove the ‘server-linux’ tag as you do.

Understood, my apologies and thanks for your help!

Cheers, ProMace
