Server Version#: 4.104.2
Player Version#: iOS 8.17
Hey guys,
New install here. My setup: Plex is installed in a Docker container.
Docker runs on a Linux VM (ESXi 8.0) that sits on my internal network. Docker has access to my Nvidia card for hardware enc/dec.
Things that are working:
Server runs, I can connect from outside of the network (remote)
Transcoding via HW appears to work, plex has access to my nvidia hardware.
Things that I’m having problems with:
Transcoding to the Plex app on my iPhone (iOS app 8.17), with quality lowered to force a transcode, is unstable. Playback starts right up and I get about two minutes of video/audio, then it tells me my server isn’t fast enough. In reality I’m barely touching the server’s capacity: with HW decoding I’m looking at ~6% CPU. I’m not sure what happens, but the stream eventually dies.
I’m getting A LOT of info in the logs, too much to really parse through. If I go to the console and filter on Transcode, I seem to be stuck in a bit of a loop.
I’m seeing these two items pretty much on repeat, and I can never resume or start over the encode again.
Interested in the best place to start troubleshooting.
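For anyone who wants to narrow a log like this down from the shell instead of the console, here is a rough sketch. It assumes the log lines look like the ones quoted later in this thread (timestamp, thread id in brackets, level, then " - " and the message); the function name `top_repeats` and its defaults are made up for illustration.

```shell
# Sketch only: rank the most-repeated log messages that match a filter.
# Usage: top_repeats LOGFILE [FILTER] [COUNT]
top_repeats() {
  log_file=$1
  filter=${2:-transcode}   # case-insensitive, like filtering the console on "Transcode"
  count=${3:-5}            # how many distinct messages to print

  # The sed step drops the leading "timestamp [thread] LEVEL - " prefix so that
  # identical messages group together regardless of when they were logged.
  grep -i -- "$filter" "$log_file" |
    sed -E 's/^[^]]*\] [A-Z]+ - //' |
    sort | uniq -c | sort -rn | head -n "$count"
}
```

Pointing it at the main server log (usually "Plex Media Server.log" in the Logs directory) should make a repeating pair of messages stand out at the top of the output, each with its count.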
It should be known that I have a 6-7 year old install of Plex that I have been attempting to migrate to this new box, but I had so many problems with transcoding that I felt I needed to start from a fresh canvas. My intention is to migrate my existing install/database over, but until I can prove that this configuration works from a clean install, I don’t even know if there is a point.
I will challenge you as to why you have a VM + Docker + PMS in that Docker.
Did you know the Docker environment is the same as the native Ubuntu/Debian runtime environment? (ESXi VM + guest OS + native server == best)
Also, docker is no longer (not for some time) for hardware transcoding and tone mapping
Actual CPU - The ESXi host has 2x E5-2683 V4 chips, but I’ve allocated 8 cores to the VM that is running this.
The OS I’m using on the VM is Ubuntu 22.04.02
My iOS device will direct play files, but I specifically reduced my quality settings in an effort to validate transcoding (I do have remote users who will need to transcode, and I was having this problem in a previous installation, so I wanted to validate that transcoding works correctly).
GPU Passthrough incorrect - This could be the case. I’ve tested this as best I can, but I could be missing something. I know my VM has access to my GPU because nvidia-smi runs from the command line. I know my Docker container has access to my GPU because nvidia-smi runs from within the container. I’ve also run ffmpeg -v debug -init_hw_device cuda from the VM to validate that the encoder/decoder is reachable. When I was able to get the file to play/transcode, the Plex dashboard did show the hardware decoder in use, with (hw) next to the decoding process. This was the only validation I was able to perform, though.
Drivers missing - I’m sure this is possible, but I wouldn’t know which drivers or where. I can dig deeper, but any hints would be helpful.
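The two nvidia-smi checks described above could be wrapped up roughly like this. It is only a sketch: the container name `plex` is a placeholder (the real name isn’t given in this thread), and the helper names `have` and `check_gpu` are invented for the example.

```shell
# Sketch only: confirm the NVIDIA driver answers on the VM and inside the container.
# "plex" is a placeholder container name, not taken from this thread.

# True if the named command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

check_gpu() {
  container=${1:-plex}

  have nvidia-smi || { echo "nvidia-smi not found on the VM" >&2; return 1; }
  echo "== VM =="
  nvidia-smi -L || return 1        # -L lists the GPUs the driver can see

  have docker || { echo "docker not found" >&2; return 1; }
  echo "== container: $container =="
  docker exec "$container" nvidia-smi -L
}
```

Calling `check_gpu my-container-name` should print the same GPU line twice if passthrough to both layers is working. It only shows the driver is reachable; it says nothing about whether a transcode will stay stable.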
Networking Incomplete / Speed - Always possible. The line should be strong enough, and I’ve run multiple end to end tests, but I don’t think I can rule this out.
Regarding the challenge as to why VM + Docker + PMS - This is a fair challenge. The short answer is that I have other things running on that VM, and Docker felt like a way to keep them segmented so that Plex would be easy to upgrade in the future without disturbing anything else. I think one of the things that tends to happen when you containerize something is that you kind of want to do that for everything, and maybe that doesn’t make sense. But I didn’t want to manage dependencies for Plex outside of a container environment when I have other things on that box. If we can narrow the issue down to this specific setup, I will spin up a dedicated machine for Plex.
I was not aware that Docker did not support hardware transcoding and tone mapping, as I was able to successfully test this to some extent through Docker, but maybe that is an unofficial deployment or not a supported method.
With regards to debug log, what is the preferred method to post that here? It’s a fairly hefty log. How far back do you want me to go? Do I post in plain text here or attach a zip?
The per-core speed of the CPUs is “meh”. I have an E5-2690 v4 (faster) and it can’t keep up with CPU power alone. It needs the Nvidia P2200.
Additionally, subtitle burning isn’t going to happen on that box. CPU per-thread speed isn’t fast enough to keep up with 4K subtitles. (I’ve tried.)
To confirm you’ve got it all passed through correctly, with the drivers installed correctly, it’s this simple.
Last login: Sun Apr 23 16:08:36 2023 from 192.168.0.13
[chuck@glockner ~.1997]$ nvidia-smi
Mon Apr 24 20:31:20 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2200        On   | 00000000:07:00.0 Off |                  N/A |
| 47%   36C    P8     4W /  75W |      4MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[chuck@glockner ~.1998]$ ls -la /dev/dri
total 0
drwxr-xr-x  3 root root        120 Apr 22 15:26 ./
drwxr-xr-x 21 root root       5640 Apr 22 15:26 ../
drwxr-xr-x  2 root root        100 Apr 22 15:26 by-path/
crw-rw----  1 root render 226,   0 Apr 23 16:08 card0
crw-rw----  1 root render 226,   1 Apr 23 16:08 card1
crw-rw----  1 root render 226, 128 Apr 23 16:08 renderD128
[chuck@glockner ~.1999]$
You see the GPU enumerated in /dev/dri
Nvidia-SMI confirms the drivers are there.
The last thing you need to confirm is that the card supports the codecs you want to play.
Logs:
Easily done:
Settings - Server - Troubleshooting
Mid-page – Download Logs
It will give you a ZIP file
Upload here.
Console method:
cd "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server"
tar cfz /tmp/PlexLogs-Myname.tar.gz ./Logs
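If the Plex data directory lives somewhere else (inside a container volume, for example), the same console method can be parameterized. This is just a sketch: `bundle_logs` is a made-up helper name, and you need to supply the directory that actually contains the Logs folder on your system.

```shell
# Sketch only: archive the Logs folder found under a given Plex data directory.
# Usage: bundle_logs PMS_DIR OUT_FILE
bundle_logs() {
  pms_dir=$1     # directory that contains the Logs folder
  out_file=$2    # archive to create, e.g. /tmp/PlexLogs-Myname.tar.gz

  [ -d "$pms_dir/Logs" ] || { echo "no Logs directory under $pms_dir" >&2; return 1; }

  # -C switches into pms_dir first, so the archive holds a relative "Logs/" path.
  tar -czf "$out_file" -C "$pms_dir" Logs &&
    tar -tzf "$out_file" >/dev/null   # re-read the archive to confirm it is intact
}
```

For the native install shown above, that would be `bundle_logs "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server" /tmp/PlexLogs-Myname.tar.gz`.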
Confirmed, attempting to use Nvidia Tesla P4 for hw decoding/encoding, not QSV.
I have not done any testing with subtitle burning, but it isn’t really on my radar. With 8 cores dedicated to this VM and no hardware decoding, I was still able to transcode HDR 4K into 1080p with tone mapping. This made my CPU jump to ~600% in top, but I had the headroom on the host to scale. Obviously this is not my planned configuration; once I got HW decoding on the Nvidia card to work, the same test dropped to around 30% of my 8 cores (1 stream).
Can confirm, nvidia-smi works in both the container and the host OS. Mine looks slightly different as I’m passing through a vGPU instead of the dedicated card. Here is mine:
Tue Apr 25 00:49:03 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P4-8Q          On   | 00000000:02:03.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |      0MiB /  8192MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
It looked the same with both settings. I feel like it’s stuck in some kind of loop where it keeps calling/initializing the hardware drivers. I captured the logs after both attempts, but the 2nd capture still has the details from the first attempt, because I didn’t cut them apart in between. (You’ll see the restart of Plex.)
In an effort to try to nail this down a little better, I’ve spun up a new machine, and installed plex directly via .deb deployment instead of going through docker.
Everything else in my configuration is the same.
I started with a new clean database, and added some videos on local storage.
I ran a test converting 1080p h264 to SD via hardware. I was able to watch 20 minutes of this movie without any issues, and I have no reason to believe that the rest of the movie wouldn’t run stable.
I ran a test converting 4K HDR HEVC to SD via hardware. This ran for about 6-8 minutes pretty much flawlessly with very little CPU usage. Then it died completely, and I’m unable to resume or restart that stream at all (I just get an error now, or it spins until timeout).
This is the log captured right after that stream crashed.
It’s also worth noting that after my 4K stream crashed, even the original 1080p->SD test no longer works: same video, same parameters/client/settings. So it’s unclear whether it crashed my Nvidia drivers or something and left me in a state where nothing can initialize going forward. I’m hoping this is fairly obvious in the logs.
Further to the above, a restart of Plex does not fix the problem; the only thing that allows me to transcode 1080p files again is a hard reboot of the machine.
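Since only a reboot clears it, one thing that might be worth a look (a suggestion, not a diagnosis) is the kernel log. The NVIDIA driver reports GPU faults there as lines containing "NVRM: Xid". A minimal sketch, with `xid_events` as a made-up helper name:

```shell
# Sketch only: pull NVIDIA driver fault reports ("Xid" events) out of kernel log text.
# Reads stdin, so it works with live dmesg output or a saved log file.
xid_events() {
  grep -F 'NVRM: Xid'
}
```

Right after a failed stream, `sudo dmesg | xid_events` would print any such lines. No output (and a non-zero exit status from grep) simply means the driver did not log an Xid event, which would point the search elsewhere.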
In the Cache directory (under Plex Media Server) , please delete ‘cert-v2.p12’ and restart PMS.
Restart the player too (full close/terminate and reopen)
Then please reproduce the problem. I ask this because your logs are filled with thousands of:
Apr 26, 2023 17:12:40.078 [140160157809464] WARN - [CERT] TLS connection from [::ffff:192.168.1.1]:37564 came in with unrecognized plex.direct SNI name ‘216-49-129-198.e745d67074ef40a7b11344cfc7b43162.plex.direct’; using installed plex.direct cert
I can’t yet tell if you have certificate expiration problems or a foreign certificate trying to get in the way & blocking.
Apr 26, 2023 17:11:01.447 [140160130095928] WARN - [HttpClient/HCl#3e] HTTP error requesting GET https://216-49-129-198.b401bbfa0425468abcf5f62aa4548fa9.plex.direct:32400 (60, SSL peer certificate or SSH remote key was not OK) (SSL: no alternative certificate subject name matches target host name ‘216-49-129-198.b401bbfa0425468abcf5f62aa4548fa9.plex.direct’)
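To see whether the certificate reset actually made a difference, it may help to count these warnings in a log before and after. A small sketch; `count_cert_warnings` is a made-up name, and the search string is copied from the [CERT] warning quoted above.

```shell
# Sketch only: count the plex.direct SNI warnings in one log file.
# Usage: count_cert_warnings LOGFILE
count_cert_warnings() {
  # -F: match a fixed string, -c: print the number of matching lines
  grep -Fc 'unrecognized plex.direct SNI name' "$1"
}
```

Run it against "Plex Media Server.log" from the old and new log bundles; a count that drops to 0 (or near it) after the restart would suggest the certificate noise is gone, whatever else is still wrong.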
Just circling back to this: I believe I did the above steps to force regeneration of the certificate. I re-ran my test, and my stream crashed at about 8 minutes again. Latest logs here.
One thing I was thinking was that maybe I was running out of space on the root drive, as I’m using the default path for transcoder temp files (which I think is /tmp), but I monitored the drive throughout the stream and it did not fill up.
Latest logs, captured right after my stream crashed at 8 minutes.
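For reference, this kind of monitoring can be scripted so nothing is missed between manual checks. A sketch only: the helper names are made up, and /tmp is the path assumed above, so substitute whatever directory the transcoder is really writing to.

```shell
# Sketch only: report how full the filesystem holding a directory is.
usage_pct() {
  # df -P prints one POSIX-format line per filesystem; column 5 is capacity, e.g. "42%"
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a timestamped usage figure every few seconds until interrupted (Ctrl-C).
watch_usage() {
  dir=${1:-/tmp}
  interval=${2:-5}
  while :; do
    printf '%s %s%%\n' "$(date '+%H:%M:%S')" "$(usage_pct "$dir")"
    sleep "$interval"
  done
}
```

Running `watch_usage /tmp 5 | tee /tmp/usage.log` in a second terminal during playback leaves a record that can be lined up against the time the stream died.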