And if you use that, try streaming from your browser, and change quality a dozen times, does it work every time? I’m seeing ~25% failure in my VMs, and ~60% on my server.
I’m not looking to stress test it yet.
We have a very basic problem: failure to start HW transcoding. Let’s solve that first.
I need to plot the datapoints. I have an idea where this is going but must be scientific about it.
That’s fair. If you think you’re seeing a pattern, fantastic! I’ll take a little optimism right now. In terms of testing:
Distro: OpenMediaVault 6
Graphics Card: NVIDIA Tesla P4
Driver: 515.86.01-1
Test 1: Fail
Test 2: Fail
Test 3: Fail
Test 4: Fail
Test 5: N/A
Of note, Test 5 did not use the hardware decoders, as my card doesn’t support VP9 10-bit decoding. Curious, as that’s the only one that worked.
Also, Test 2 produced the decoder surfaces error… which is really odd, because I’m using driver 515.86.01-1 as confirmed by nvidia-smi and have set nvdecExtraFrames all the way to 16 (I tried 1, 2, 4, and 8 first - how high can I set this value?). I mis-clicked one time and accidentally set Test 1 to “Convert Automatically” which spat out a decoder surfaces error as well - after the tests I was able to reproduce this.
The other tests that failed simply did… nothing. No errors, just a periodic 404 as my client requested data.
I repeated the tests 5 times because I was seeing variability earlier, but they behaved the same every time. If you told me you picked this footage specifically to test a theory, I’d believe it.
If your card doesn’t do HW transcoding for a test, please mark it “N/A”.
I will amend the instructions.
With a VM on my desktop:
Distro: Debian 11
Graphics Card: GTX 1080Ti
Driver: 515.86.01-1
Test 1: Fail
Test 2: Fail
Test 3: Fail
Test 4: Pass
Test 5: Pass
Again repeated 5x, this time with nvdecExtraFrames=2 as instructed. Tests 1 and 2 produced “No decoder surfaces left”.
I’m running these by starting the stream (which defaults to Direct Stream for H264 and a “Maximum” conversion for HEVC and VP9) and then, after a few frames play, changing quality.
The same VM, changed drivers:
Distro: Debian 11
Graphics Card: GTX 1080Ti
Driver: 5.10.84-1
Test 1: Fail
Test 2: Fail
Test 3: Fail
Test 4: Partial Fail
Test 5: Pass
Both Tests 1 and 2 again displayed “No decoder surfaces left”; however, this time Test 1 began playing via CPU transcoding twice. Test 4 failed on the second attempt with the usual nothingness. I returned to it after running Test 5 five times and was able to reproduce the failure only twice in 30+ more attempts (I stopped counting); one of those times it fell back to CPU transcoding and started playing. It’s this variability that’s killing me, although I could live with the failure rate I just saw. It makes me wonder whether I simply didn’t run Test 4 enough times with driver 515 to see the failure appear.
Off to bed for me, but more testing tomorrow…
Here are the results that you asked for… plus a little more.
Ubuntu 22.04.1
Kernel: 5.15.0-58-generic
Nvidia Driver: 525.60.13
PMS 1.31.1.6638
nvdecExtraFrames=2
GPU: Nvidia Tesla P4
Playback Quality: 20mbps 1080p
- Test 1 (Jellyfish 30mbps h264): Initially plays at max quality; after changing to 1080px20mbps: Fail (GPU process memory increases, then decreases, process dies)
- Test 2 (Jellyfish 30mbps HEVC): Initially plays at max quality; after changing to 1080px20mbps: Fail (GPU process memory increases, then decreases, process dies)
- Test 3 (Jellyfish 120mbps 4K h264): Initially plays and transcodes at max quality; after changing to 1080px20mbps: Fail (GPU process memory increases, then decreases, process dies)
- Test 4 (Jellyfish 120mbps 4K HEVC 10bit): Initially plays and transcodes at max quality; after changing to 1080px20mbps: Fail (GPU process memory increases, then decreases, process dies)
- Test 5 (World in HDR Vorbis): Converts using GPU/CPU? Fails completely when switching to 1080px20mbps (Error code: s1001 (Network))
- Test 6 (World in HDR): Converts using GPU/CPU? Fails completely when switching to 1080px20mbps (Error code: s1001 (Network))
- Test 7 (LG Colors of Journey 4K): Converts using GPU/CPU? Fails completely when switching to 1080px20mbps (Error code: s1001 (Network))
- Test 8 (Costa Rica 4K): Pass with Max and Pass with 1080px20mbps? It played at both Max and 1080px20mbps once I switched. However, neither CPU nor GPU usage spiked, and there was no GPU process.
Plex Media Server Logs_2023-01-31_10-46-49.zip (1.3 MB)
As a note, when I was running NVIDIA driver 470 with PMS plexmediaserver_1.28.0.5999-97678ded3_amd64, GPU transcoding worked on Roku devices and my Android phone without issue.
I also tried my Android phone, and it failed to transcode every test video. This was working fine previously.
Plex Server version 1.30.2.6563 on Windows 10.
Plex Web version is 4.87.2.
NVIDIA GeForce GTX 1650, with Windows driver version 31.0.15.2824
- Test 1: Pass. Flawless playback, no big GPU spike, normal load.
- Test 2: Fail. Several-second delay; it started playing, froze, then played again. GPU decode load began with a big spike, then nothing for a few seconds, then normal load.
- Test 3: Fail. Small spike in GPU load, then no load at all; no playback.
- Test 4: Pass??? Several-second delay with no GPU activity, then it started playing with no issues, with an initial large GPU load that quickly settled to a normal GPU load.
- Test 5: Pass. Started playing normally right away, normal GPU load, no spikes.
Folks,
While I appreciate all the extra descriptions, please be terse.
If you have a failure in any test but can make it work by increasing nvdecExtraFrames, please annotate that in your results.
e.g.:
Test 2: FAIL (nvdecExtraFrames=2)
Test 2: PASS (nvdecExtraFrames=4)
This makes it stand out for me so I can work it into the results I’m compiling.
PS: Please don’t try to “stress test” it by constantly changing playback bitrates. Doing so invalidates the data I’m collecting.
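For anyone compiling these reports, result lines in the annotated format can be pulled into structured records mechanically. This is a sketch of a hypothetical helper (`parse_result` and `Result` are made-up names, not part of any Plex tooling). It accepts both spacing variants used in this thread, such as `Test 2 : FAIL (nvdecExtraFrames=2)` and `Test 2: (PASS - nvdecExtraFrames=4)`. Anything it doesn't recognize, such as “Partial Fail”, comes back as unparsed instead of being guessed at.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches lines like "Test 2: FAIL (nvdecExtraFrames=2)", "Test 2: (PASS - nvdecExtraFrames=4)"
# or "Test 5: N/A". The nvdecExtraFrames annotation is optional.
RESULT_RE = re.compile(
    r"Test\s*(?P<test>\d+)\s*:\s*\(?\s*(?P<status>PASS|FAIL|N/?A)\b"
    r"(?:.*?nvdecExtraFrames\s*=\s*(?P<frames>\d+))?",
    re.IGNORECASE,
)


@dataclass
class Result:
    test: int
    status: str                  # normalised to "PASS", "FAIL" or "N/A"
    extra_frames: Optional[int]  # None when the line carries no annotation


def parse_result(line: str) -> Optional[Result]:
    """Parse one annotated result line; return None if it doesn't fit the format."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    status = m.group("status").upper()
    if status in ("NA", "N/A"):
        status = "N/A"
    frames = m.group("frames")
    return Result(int(m.group("test")), status, int(frames) if frames else None)
```

Lines that return `None` are worth reading by hand, since they tend to be the ones carrying extra context.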
@ChuckPa How do I set nvdecExtraFrames on a Windows machine?
Do you have a Preferences…xml? I don’t think you do. If you do, set it there.
My understanding has been that this is a Linux-only thread because of how Windows manages hardware transcoding.
AFAIK, HW transcoding runs through DirectX on Windows. On Linux we talk to the hardware API directly.
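On the Linux installs in this thread, the setting can also be written with a script instead of by hand. This is a minimal sketch. It assumes `nvdecExtraFrames` is stored as an attribute on the root `<Preferences>` element like other PMS settings, and that the file lives at the usual Debian/Ubuntu package path shown in the comment (Docker and NAS installs differ). `set_preference` is a hypothetical helper name. Stop PMS before running it, and keep a copy of the original file.

```python
import xml.etree.ElementTree as ET

# Typical path on a Debian/Ubuntu package install (assumption; adjust for your setup):
# /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml


def set_preference(path: str, name: str, value: str) -> None:
    """Set or overwrite one attribute on the root <Preferences> element.

    Run this only while PMS is stopped, so the server doesn't overwrite the edit.
    """
    tree = ET.parse(path)
    root = tree.getroot()
    if root.tag != "Preferences":
        raise ValueError(f"unexpected root element <{root.tag}> in {path}")
    root.set(name, value)  # adds the attribute, or replaces its current value
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, `set_preference(path, "nvdecExtraFrames", "4")`, then start PMS again before retesting.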
Logs for these tests start around 11:50am 12:27pm
Nvidia Driver: 525.60.13
PMS 1.31.1.6638
nvdecExtraFrames=2
Kernel: 5.15.0-58-generic
GPU: Nvidia Tesla P4
Playback Quality: 20mbps 1080p
nvdecExtraFrames=2
Test 1: Fail (CPU encoding)
Test 2: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 3: Fail
Test 4: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 5: Fail (Error code: s1001 (Network))
nvdecExtraFrames=4
Test 1: Fail (CPU encoding)
Test 2: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 3: Fail
Test 4: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 5: Fail (Decodes on CPU, Encodes on GPU, completely fails at 1080px20mbps: Error code: s1001 (Network))
nvdecExtraFrames=6
Test 1: Fail (CPU encoding)
Test 2: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 3: Fail
Test 4: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 5: Fail (Decodes on CPU, Encodes on GPU, completely fails at 1080px20mbps: Error code: s1001 (Network))
nvdecExtraFrames=8
Test 1: Fail (CPU encoding)
Test 2: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 3: Fail
Test 4: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 5: Fail (Decodes on CPU, Encodes on GPU, completely fails at 1080px20mbps: Error code: s1001 (Network))
nvdecExtraFrames=16
Test 1: Fail (CPU encoding)
Test 2: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 3: Fail
Test 4: Fail (Was transcoding on GPU at max convert, failed to transcode 1080px20mbps)
Test 5: Fail (Decodes on CPU, Encodes on GPU, completely fails at 1080px20mbps: Error code: s1001 (Network))
For the Test 5 when switching to 1080px20mbps, I am receiving a few errors:
- [Req#7126/Transcode/3iobezh2q8rw407bjpbp1m1y/6736ac51-7e25-4cc8-a5fd-a85765fd75e3] [vp9 @ 0x7f27b69cb700] Failed setup for format cuda: hwaccel initialisation returned error.
- [Req#7113/Transcode/3iobezh2q8rw407bjpbp1m1y/6736ac51-7e25-4cc8-a5fd-a85765fd75e3] [vp9 @ 0x7f27b69cb700] Hardware is lacking required capabilities
These errors are consistent with the GPU not supporting VP9 10-bit decoding, which is why it decodes on the CPU in the first place.
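When going through logs like these, it can help to count the hardware-decode failure messages rather than eyeball them. This is a small sketch. The marker strings are copied from the log lines above and from the “No decoder surfaces left” message reported earlier in the thread. `scan_log` is a hypothetical helper, and the exact wording may differ between PMS versions.

```python
from typing import Dict, Iterable

# Marker substrings taken from log messages reported in this thread
# (assumption: the wording is stable across the PMS versions being compared).
HWACCEL_MARKERS: Dict[str, str] = {
    "cuda_setup_failed": "Failed setup for format cuda",
    "missing_capability": "Hardware is lacking required capabilities",
    "no_decoder_surfaces": "No decoder surfaces left",
}


def scan_log(lines: Iterable[str]) -> Dict[str, int]:
    """Count the log lines containing each known hardware-decode failure marker."""
    counts = {key: 0 for key in HWACCEL_MARKERS}
    for line in lines:
        for key, marker in HWACCEL_MARKERS.items():
            if marker in line:
                counts[key] += 1
    return counts
```

An open file works as the input, e.g. `scan_log(open("Plex Media Server.log", encoding="utf-8", errors="replace"))` on the server log from the test window.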
@ChuckPa I think this issue does involve Windows as well. See plex-media-server-on-windows-with-nvidia-geforce-gtx-1650-will-not-play-some-hevc-media
More Windows info: VLC Media Player plays all 5 test files flawlessly, all with normal GTX 1650 GPU decode load. This HEVC playback issue manifests only in Plex Server’s web client. I am inclined to agree with @Adarnof that this is a web client issue, which could explain why the problem appears across such disparate environments (Linux, Windows).
It could be two separate issues (1080p transcoding surface errors + web client transcoding failure). Alternatively, something about initializing the transcode session for a web stream could be triggering the same issues the test videos reproduce every time.
For fun I frankensteined my Tesla P4 into my desktop (I hear external GPUs are all the rage) so I can play with it in a VM, and the results are… confusing.
Distro: OMV 6
Kernel: 5.10.0-21
Graphics Card: Tesla P4
Driver: 515.86.01
Test 1: Fail
Test 2: Fail
Test 3: Fail
Test 4: Pass
Test 5: N/A
Distro: OMV 6
Kernel: 6.0.0-1
Graphics Card: Tesla P4
Driver: 515.86.01
Test 1: Fail
Test 2: Fail
Test 3: Pass
Test 4: Fail
Test 5: N/A
Distro: OMV 6
Kernel: 5.10.0-1
Graphics Card: 1080Ti
Driver: 515.86.01
Test 1: Fail
Test 2: Fail
Test 3: Fail
Test 4: Pass
Test 5: Pass
Distro: OMV 6
Kernel: 6.0.0-1
Graphics Card: 1080Ti
Driver: 515.86.01
Test 1: Fail
Test 2: Fail
Test 3: Fail
Test 4: Fail
Test 5: Pass
I did not repeat any tests for the above results to avoid “stress testing” the transcoder, so I’ll again caution that I could be seeing random failures mixed in. But I also didn’t believe these results, so I repeated Tests 3 and 4 a few more times afterward, and they sometimes worked and sometimes failed. For now I’m calling this behavior random, as I really don’t see a pattern to it.
Just for fun I tried these tests with the NVIDIA Shield Android app as the client, since it was behaving differently when I started this thread. I performed these tests with the same configuration as my last test above, without even rebooting:
Client: NVIDIA Shield app
Distro: OMV 6
Kernel: 6.0.0-1
Graphics Card: 1080Ti
Driver: 515.86.01
Test 1: Fail
Test 2: Fail
Test 3: Pass
Test 4: Pass
Test 5: N/A (can’t request 1080p 20Mbps in the app with this video)
Sorry for the verbosity again. I’m not sure I’m properly isolating variables, so I like to add context. Now that I can switch distros, kernels, drivers, and GPUs with ease, are there any specific configurations you want me to check, @ChuckPa?
VLC is not applicable here as a comparison.
It’s a thick player: it reads the raw file, decodes everything internally, then writes to the local display panel. PMS is a server, which must process the data and then send it to the player.
An Apples-Apples comparison would be:
VLC -vs- Plex Nvidia Shield app.
Both can read the raw media.
Nvidia Shield app outputs the HDMI directly to the TV.
Since you Frankenstein’d it in the VM:
- Stop PMS
- Rename “Library” → “Library.keep”
- Install PMS 1.29.2.6364 (I’ll give you a link if needed)
- Create a new “PMS-1-29-2” test server
- Add only the “Other Videos” section with the test files in it
- Retest
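The rename step is the one that protects the existing server, so it's worth doing carefully. This is a minimal sketch. It assumes PMS is already stopped and that `app_support` points at the directory containing `Library`. On a stock Debian/Ubuntu package install that is typically `/var/lib/plexmediaserver`; container and NAS installs differ. `park_library` is a hypothetical name. It refuses to overwrite an existing `Library.keep`, so a second run can't clobber the original data.

```python
from pathlib import Path


def park_library(app_support: Path, suffix: str = ".keep") -> Path:
    """Rename <app_support>/Library to Library.keep without ever overwriting a backup.

    Assumes PMS has already been stopped.
    """
    src = app_support / "Library"
    dst = app_support / ("Library" + suffix)
    if not src.is_dir():
        raise FileNotFoundError(f"no Library directory at {src}")
    if dst.exists():
        # A previous run already parked the original library; don't touch it.
        raise FileExistsError(f"{dst} already exists; not overwriting")
    src.rename(dst)
    return dst
```

To return to the original server afterwards, stop PMS, move the test `Library` aside, and rename `Library.keep` back to `Library`.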
Distro: OMV 6
Kernel: 5.10.0-21
GPU: Tesla P4
Driver: 515.86.01
PMS: 1.29.2.6364
Test 1: Pass
Test 2: Pass
Test 3: Pass
Test 4: Pass
Test 5: N/A
Wow. This looks like a good combination. I haven’t re-tried any tests to see if random failures appear, but at least the 1080p transcodes are working.
EDIT: stress-testing it against orders, I had 3 failures in 50 attempts with the 4K H264 test footage and 4 in 50 with the 4K HEVC. Not perfect, but certainly acceptable: this is me trying to break it, and I doubt I’d ever be impacted in normal use if that rate holds (how often do I request transcoding changes mid-stream, let alone several?).
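With only 50 attempts per file, the observed rates carry a lot of uncertainty, which matters when comparing configurations. This sketch uses the Wilson score interval, a standard binomial confidence interval that nobody in this thread used, on the 3/50 and 4/50 counts above. At 95% confidence, 3/50 works out to a true failure rate of roughly 2% to 16%, and 4/50 to roughly 3% to 19%. `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96):
    """Wilson score interval for a failure rate; z=1.96 gives roughly 95% confidence."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp against tiny floating-point excursions outside [0, 1].
    return max(0.0, centre - half), min(1.0, centre + half)


for label, fails in (("4K H264", 3), ("4K HEVC", 4)):
    lo, hi = wilson_interval(fails, 50)
    print(f"{label}: {fails}/50 = {fails / 50:.0%} failed, 95% CI {lo:.1%} to {hi:.1%}")
```

If two configurations have overlapping intervals, the runs can't tell them apart yet. The interval narrows roughly in proportion to 1/√n, so quartering its width takes about 16 times as many attempts.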
Thank you. That puts the knife in the wall right where I expected it.
If you have the time, could you try the 525.x.x drivers?
If your system is like mine, 1.29.2.6364 + 525.60.13 will work just as flawlessly.
Yup, passes the tests with driver 525.85.12.
EDIT: also passes on first attempt with kernel 6.0.0 + driver 525