Plex in Docker on VM (ESXi 8.0), using nvidia hardware dec/enc (brand new install)

jon_zdsk · April 24, 2023, 9:24pm

Server Version#: 4.104.2
Player Version#: iOS 8.17

Hey guys,

New install here. My setup is Plex installed in a Docker container.
Docker runs on a Linux VM (ESXi 8.0) that sits on my internal network. Docker has access to my Nvidia card for hardware enc/dec.

Things that are working:

Server runs, I can connect from outside of the network (remote)
Transcoding via HW appears to work, plex has access to my nvidia hardware.

Things that I’m having problems with:

Transcoding to my iPhone (iOS Plex app 8.17) with quality lowered (to force a transcode) is unstable. It starts right up, I get two minutes of video/audio, and then it tells me my server isn’t fast enough. In truth I’m barely touching the server’s capacity: with HW decoding I’m at ~6% CPU, so I’m not sure what happens. Eventually the stream dies.

I’m getting A LOT of info in the logs, too much to really parse through. If I go through the console and filter on Transcode, I seem to be stuck in a bit of a loop.

I’m seeing these two items pretty much on repeat, and I can never resume or restart the transcode.

Request: [127.0.0.1:51388 (Loopback)] PUT /video/:/transcode/session/2904F1AB-3C60-45E0-8FFD-6F14AB6DAE91/f323fb4f-489b-4c1e-9ac4-b8bf3cefc9ab/progress?progress=0.0&size=-22&remaining=91311&vdec_packets=33&vdec_hw_ok=32&speed=0.3&vdec_hw_status=1 (17 live) #44938 Signed-in Token (jo295) (range: bytes=0-)

Completed: [127.0.0.1:51388] 206 PUT /video/:/transcode/session/2904F1AB-3C60-45E0-8FFD-6F14AB6DAE91/f323fb4f-489b-4c1e-9ac4-b8bf3cefc9ab/progress?progress=0.0&size=-22&remaining=91311&vdec_packets=33&vdec_hw_ok=32&speed=0.3&vdec_hw_status=1 (17 live) #44938 0ms 355 bytes (pipelined: 41) (range: bytes=0-)
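
The query string on these progress requests carries the transcoder's own counters, so pulling a few fields out of the log can be quicker than reading the raw lines. A minimal sketch, assuming the server log text is fed on stdin; `progress_fields` is a hypothetical helper name, and reading `speed` as a multiple of real-time playback is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print decode/speed counters from transcoder
# progress requests read on stdin, one key=value pair per line.
progress_fields() {
  grep -o 'progress?[^ ]*' |   # keep the path tail plus its query string
    tr '?&' '\n\n' |           # split the query into one pair per line
    grep -E '^(speed|vdec_packets|vdec_hw_ok|vdec_hw_status)='
}
```

Fed either of the two lines above, this prints vdec_packets=33, vdec_hw_ok=32, speed=0.3 and vdec_hw_status=1. If the real-time assumption holds, a speed of 0.3 would fit the "server isn't fast enough" message even at low CPU usage.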

Interested in the best place to start troubleshooting.

It should be known that I have a 6-7 year old install of Plex that I have been attempting to migrate to this new box. I had so many problems with transcoding that I felt I needed to start from a fresh canvas. My intention is to migrate my existing install/database over, but until I can prove that this configuration works from a clean install, I don’t know if there is a point.

ChuckPa · April 24, 2023, 11:16pm

You’ve left out a great deal of information.

Actual CPU (Vendor and CPUSKU/Name)
OS you’re using in the ESXi VM

Playing to iOS devices is often DirectPlay (no transcoding).
They get the file as it exists and do all the work on-device.

Without knowing more … AND seeing your server DEBUG log files which capture this happening, it’s impossible to guess anything more than:

GPU passthrough incorrect
Drivers missing
Networking incomplete / insufficient somewhere (slow)

I will challenge you as to why you have a VM + Docker + PMS in that Docker.
Did you know the Docker environment is the same as the native Ubuntu/Debian runtime environment? (ESXi VM + Guest OS + Native server == best)
Also, docker is no longer (not for some time) for hardware transcoding and tone mapping

jon_zdsk · April 25, 2023, 12:16am

Thanks for the response. Info below.

Actual CPU - The ESXi host has 2x E5-2683 V4 chips, but I’ve allocated 8 cores to the VM that is running this.
The OS I’m using on the VM is Ubuntu 22.04.02

My iOS device will direct play files, but I specifically lowered my quality settings to validate transcoding. I do have remote users who will need to transcode, and I had this problem in a previous installation, so I wanted to confirm that transcoding works correctly.

GPU Passthrough incorrect - This could be the case. I’ve tested this as best I can, but I could be missing something. I know my VM has access to my GPU because nvidia-smi runs from the command line. I know my Docker container has access to my GPU because nvidia-smi runs from within the container. I’ve also run ffmpeg -v debug -init_hw_device cuda from the VM to validate that the encoder/decoder can be reached. When I was able to get the file to play/transcode, the dashboard within Plex showed that I was using the hardware decoder, with (hw) next to the decoding process. This was the only validation I was able to perform, though.
Drivers missing - I’m sure this is possible, but I wouldn’t know which drivers or where. I can dig deeper, but any hints would be helpful.
Networking Incomplete / Speed - Always possible. The line should be strong enough, and I’ve run multiple end to end tests, but I don’t think I can rule this out.
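
The passthrough checks described above can be collected into one repeatable script. This is a rough sketch only: the container name `plex` and the helper names `check` and `gpu_checks` are assumptions for illustration, and the render node path is the one that appears in the /dev/dri listings in this thread.

```shell
#!/usr/bin/env bash
# Run a command quietly and report whether it succeeded.
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   $label"
  else
    echo "FAIL $label"
  fi
}

# Hypothetical wrapper; "plex" is an assumed container name.
gpu_checks() {
  check "driver visible in VM"        nvidia-smi
  check "driver visible in container" docker exec plex nvidia-smi
  check "render node present in VM"   test -e /dev/dri/renderD128
}
```

Calling gpu_checks prints one ok/FAIL line per check. A passing nvidia-smi only shows that the driver is loaded and the device is visible; it does not exercise the encoder or decoder, so a real transcode test is still needed.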

Regarding the challenge as to why VM + Docker + PMS - This is a fair challenge. The short answer is that I have other things running on that VM, and Docker felt like a way to keep them segmented so that Plex would be easy to upgrade in the future without touching anything else. One thing that tends to happen when you containerize something is that you want to do it for everything, and maybe that doesn’t make sense. But I didn’t want to manage dependencies for Plex outside of a container when I have other things on that box. If we can narrow the issues down to this specific setup, I will spin up a dedicated machine for Plex.

I was not aware that Docker did not support hardware transcoding and tone mapping, as I was able to successfully test this to some extent through Docker, but maybe that is an unofficial deployment or not a supported method.

With regards to debug log, what is the preferred method to post that here? It’s a fairly hefty log. How far back do you want me to go? Do I post in plain text here or attach a zip?

ChuckPa · April 25, 2023, 12:35am

Confirming: The CPUs you use don’t have QSV.

The per-core speed of the CPUs is “meh”. I have an E5-2690 v4 (faster) and it can’t keep up with CPU power alone. It needs the Nvidia P2200.
— Additionally subtitle burning isn’t going to happen on the box. CPU per-thread speed isn’t fast enough to keep up with 4K subtitles. (I’ve tried)
To confirm you’ve got it all passed through correctly, with the drivers installed correctly, it’s this simple.

Last login: Sun Apr 23 16:08:36 2023 from 192.168.0.13
[chuck@glockner ~.1997]$ nvidia-smi
Mon Apr 24 20:31:20 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2200        On   | 00000000:07:00.0 Off |                  N/A |
| 47%   36C    P8     4W /  75W |      4MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[chuck@glockner ~.1998]$ ls -la /dev/dri
total 0
drwxr-xr-x  3 root root        120 Apr 22 15:26 ./
drwxr-xr-x 21 root root       5640 Apr 22 15:26 ../
drwxr-xr-x  2 root root        100 Apr 22 15:26 by-path/
crw-rw----  1 root render 226,   0 Apr 23 16:08 card0
crw-rw----  1 root render 226,   1 Apr 23 16:08 card1
crw-rw----  1 root render 226, 128 Apr 23 16:08 renderD128
[chuck@glockner ~.1999]$

You see the GPU enumerated in /dev/dri
Nvidia-SMI confirms the drivers are there.

The last thing you need to confirm is that the card supports the codecs you want to play

Logs:

Easily done:

Settings - Server - Troubleshooting
Mid-page – Download Logs
It will give you a ZIP file
Upload here.

Console method:

cd "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server"
tar cfz /tmp/PlexLogs-Myname.tar.gz   ./Logs

Attach /tmp/PlexLogs-Myname.tar.gz to the post

jon_zdsk · April 25, 2023, 12:57am

Confirmed, attempting to use Nvidia Tesla P4 for hw decoding/encoding, not QSV.
Have not done any testing with subtitle burning, but it isn’t really on my radar. With 8 cores dedicated to this VM and no hardware decoding, I was still able to transcode HDR 4K to 1080p with tone mapping. This made my CPU jump to ~600% in top, but I had the headroom on the host to scale. Obviously this is not my planned configuration; once I got HW decoding on the Nvidia card to work, the same transcode dropped to around 30% of my 8 cores (1 stream).
Can confirm, nvidia-smi works in both the container and the host OS. Mine looks slightly different as I’m passing through a vGPU instead of the dedicated card. Here is mine:

Tue Apr 25 00:49:03 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P4-8Q          On   | 00000000:02:03.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |      0MiB /  8192MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And /dev/dri below

total 0
drwxr-xr-x  3 root root        140 Apr 24 15:39 .
drwxr-xr-x 20 root root       4080 Apr 24 15:39 ..
drwxr-xr-x  2 root root        120 Apr 24 15:39 by-path
crw-rw----  1 root video  226,   0 Apr 24 15:39 card0
crw-rw----  1 root video  226,   1 Apr 24 15:39 card1
crw-rw----  1 root render 226, 128 Apr 24 15:39 renderD128
crw-rw----  1 root render 226, 129 Apr 24 15:39 renderD129

The card I’m using is a Tesla P4 which has pretty good codec support on the matrix, and definitely supports the few files I’ve attempted.

Logs, attached.

Plex Media Server Logs_2023-04-25_00-45-06.zip (3.1 MB)

Thanks for the responses, very helpful.

ChuckPa · April 25, 2023, 1:02am

NOW, a weird bug my ESXi v7 does:

When you have two GPUs.
Enumeration can be backwards
Only way to know is test both.

Stop plex
Edit Preferences.xml
Add HardwareDevicePath="/dev/dri/renderD12x" (x = 8 or 9)
Save
Start
Test
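
The edit step above can be sketched as a small function. This assumes Preferences.xml keeps its settings as attributes on a single Preferences element and that GNU sed (with -i) is available; `set_hw_device` is a hypothetical helper name, not a Plex tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set HardwareDevicePath in a Preferences.xml file.
# Usage: set_hw_device <path-to-Preferences.xml> <device-path>
set_hw_device() {
  local prefs=$1 dev=$2
  cp -p "$prefs" "$prefs.bak"   # keep a backup before editing
  if grep -q 'HardwareDevicePath=' "$prefs"; then
    # Attribute already present: replace its value.
    sed -i "s|HardwareDevicePath=\"[^\"]*\"|HardwareDevicePath=\"$dev\"|" "$prefs"
  else
    # Attribute missing: add it right after the element name.
    sed -i "s|<Preferences |<Preferences HardwareDevicePath=\"$dev\" |" "$prefs"
  fi
}
```

With Plex stopped, a call such as set_hw_device /path/to/Preferences.xml /dev/dri/renderD129 switches the device, and a second call with renderD128 switches it back. The path is a placeholder: in a Docker setup the file sits on the mounted config volume. Run it as a user allowed to write the file, and check afterwards that the file's ownership is unchanged.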

jon_zdsk · April 25, 2023, 1:17am

Was able to test this real quick,

It looked the same with both settings. I feel like it’s stuck in some kind of loop where it keeps calling/initializing the hardware drivers. I captured the logs after both attempts, but the second capture still contains the details from the first attempt because I didn’t clear the logs in between. (You’ll see the restart of Plex.)

First attempt:

Plex Media Server Logs_2023-04-25_01-12-43.zip (2.7 MB)

Second attempt:

Plex Media Server Logs_2023-04-25_01-14-42.zip (2.3 MB)

ChuckPa · April 25, 2023, 1:51am

Try playing with the Web player.

This reduces it to a really small playback solution for the Nvidia

Let’s do H.264 first (it’s SDR and 1080p)

Grab those logs please

jon_zdsk · April 25, 2023, 3:56pm

This is what it looks like converting 1080p H.264 to SD H.264 via Plex Web (Chrome).

I did remove the HardwareDevicePath from my Preferences because it was causing problems.

Latest log here

Plex Media Server Logs_2023-04-25_15-56-17.zip (2.4 MB)

jon_zdsk · April 26, 2023, 5:55pm

In an effort to nail this down a little better, I’ve spun up a new machine and installed Plex directly from the .deb package instead of going through Docker.

Everything else in my configuration is the same.

I started with a new clean database, and added some videos on local storage.

I ran a test converting 1080p H.264 to SD via hardware. I was able to watch 20 minutes of this movie without any issues, and I have no reason to believe that the rest of the movie wouldn’t run stable.

I ran a test converting 4K HDR HEVC to SD via hardware. This ran for about 6-8 minutes pretty flawlessly with very little CPU usage. Then it died completely, and I’m unable to resume or restart that stream at all (it either errors out or spins until timeout).

This is the log captured right after that stream crashed.

Plex Media Server Logs_2023-04-26_17-41-34.zip (2.7 MB)

It’s also worth noting that after my 4K stream crashed, even the original 1080p->SD test no longer works with the same video and the same parameters/client/settings. It’s unclear whether it crashed my Nvidia drivers or something and left me in a state where nothing can initialize going forward. Hoping this is fairly obvious in the logs.

jon_zdsk · April 26, 2023, 6:06pm

Further to the above, a restart of Plex does not fix the problem mentioned. The only thing that allows me to go back to transcoding 1080p files again is a hard reboot of the machine.
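
Since only a reboot restores transcoding, the kernel log is one place to look for driver-level faults. A minimal sketch, assuming the NVIDIA kernel driver reports such faults as lines containing "NVRM: Xid"; `xid_lines` is a hypothetical helper that simply filters whatever log text it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only NVIDIA Xid fault lines from log text on stdin.
xid_lines() {
  grep 'NVRM: Xid'
}
```

It can be fed from the running kernel log, for example sudo dmesg | xid_lines, ideally before the reboot wipes that state. An empty result does not rule out a driver problem; it only means no Xid line was logged in the text that was searched.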

ChuckPa · April 26, 2023, 6:19pm

Jon,

Turn off IPv6. All your traffic is IPv4.
In the Cache directory (under Plex Media Server), please delete ‘cert-v2.p12’ and restart PMS.
Restart the player too (full close/terminate and reopen)
Then please recreate the problem. I ask this because your logs are filled with thousands of:

Apr 26, 2023 17:12:40.078 [140160157809464] WARN - [CERT] TLS connection from [::ffff:192.168.1.1]:37564 came in with unrecognized plex.direct SNI name ‘216-49-129-198.e745d67074ef40a7b11344cfc7b43162.plex.direct’; using installed plex.direct cert

I can’t yet tell if you have certificate expiration problems or a foreign certificate trying to get in the way & blocking.

Apr 26, 2023 17:11:01.447 [140160130095928] WARN - [HttpClient/HCl#3e] HTTP error requesting GET https://216-49-129-198.b401bbfa0425468abcf5f62aa4548fa9.plex.direct:32400 (60, SSL peer certificate or SSH remote key was not OK) (SSL: no alternative certificate subject name matches target host name ‘216-49-129-198.b401bbfa0425468abcf5f62aa4548fa9.plex.direct’)
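
To see how much of a log is taken up by warnings like these, the WARN lines can be tallied by their bracketed tag. A rough sketch, assuming the line layout shown in the two excerpts above (timestamp, thread id, level, then a tag in square brackets); `warn_tags` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count WARN lines per bracketed tag from log text on stdin.
warn_tags() {
  sed -n 's/.* WARN - \[\([A-Za-z]*\).*/\1/p' |  # keep the tag name only
    sort | uniq -c | sort -rn |                   # count, most frequent first
    awk '{print $1, $2}'                          # normalise to "count tag"
}
```

Run over a server log file, this prints one "count tag" line per tag, so a CERT count in the thousands stands out immediately. Request-specific suffixes such as the part after the slash in HttpClient/HCl#3e are dropped so that those lines group together.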

jon_zdsk · April 26, 2023, 7:41pm

Chuck,

Thanks,

I’ve disabled IPv6 (from the network page) and I’ve deleted the mentioned cert.

After restarting PMS, I’m still seeing a lot of CERT/TLS spam in the logs (haven’t even tried to play/recreate anything)

Latest logs here.

Plex Media Server Logs_2023-04-26_19-39-37.zip (1.5 MB)

ChuckPa · April 26, 2023, 8:29pm

@jon_zdsk

Much better.

The only remaining thing to do with this is to FORCE restart the player.

Don’t use a FQDN to access your PMS from that 216.x.x.x IP address.
Go through the normal https://app.plex.tv (or the Plex apps)

After you update your certificate, you can then use the FQDN again.

For reference in case you need it:

jon_zdsk · April 29, 2023, 3:11pm

Chuck,

Just circling back to this, I believe I did the above steps to force the regeneration of the certificate. I re-ran my test, and my stream crashed at about ~8mins again. Latest logs here.

One thing I was thinking was that maybe I was running out of space on the root drive, as I’m using the default path for transcoder temp files (which I think is /tmp), but I monitored the drive throughout the stream and it did not fill up.

Latest logs, captured right after my stream crashed at 8 mins:
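
A free-space check like the one described can be logged with timestamps so it can be lined up against the crash time. A minimal sketch, assuming the transcoder temp directory really is /tmp as guessed above; `free_kib` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the free space, in KiB, of the filesystem holding a path.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 {print $4}'
}
```

A loop such as while sleep 10; do echo "$(date +%T) $(free_kib /tmp)"; done, left running during playback, gives a simple time series. If the number stays roughly flat up to the failure, disk space is unlikely to be the cause.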

Plex Media Server Logs_2023-04-29_15-08-37.zip (2.0 MB)
