Hardware transcoding issue

@ChuckPa
The downgrade to 1.29.x doesn't work because of database issues as well (the database migration never went through when trying to go back to 1.29.x). For now I am moving to CPU transcoding, as it seems to do a decent enough job without running into issues with some media.

What I am trying to get at is that saying it's a bug in CUDA 12 should mean that going back to CUDA 11.7 would resolve the issue, which it does not (at least when transcoding 1080p 8-bit HEVC → 1080p H.264). When the issue is present in both 11.7 and 12, listing explanation #1 as a "CUDA 12 and a specific card issue" doesn't seem like a realistic explanation.

Regarding CUDA 11.7 and 8-bit HEVC: I tested it before recommending it. If it didn’t work, I wouldn’t have done so.

As I said, I am only the support team. I’m trying to collect definitive data for Engineering.
That’s all I can do.

Given I cannot reproduce your problems and you can’t reproduce my results, here is what I have:

  1. Ubuntu Server 20.04.5 LTS (no GUI)
  2. Nvidia 515.86.01 and 525.60.13 (from Ubuntu distribution)
  3. PMS
  4. Cockpit (for running the server & NAS)
  5. LXC / LXD for running 3 containers. (no docker)

I don’t have other packages.

What’s the difference between your machine and mine?

@ChuckPa
I'm running a Red Hat-based distro (Fedora) on bare metal, and you are running a Debian-based one (Ubuntu). Mine is also a headless server (no X Windows or DM installed) that runs Plex, Apache httpd, and SFTP. I have dockerd (running Pi-hole). For NVIDIA I am installing using the NVIDIA-supplied binary. Other packages I have installed are ffmpeg and the various codecs from the rpmfusion repos (installed because this machine was my normal, non-Plex transcoding/ripping machine for quite a while).

I remember fedora. I had used it from the FC-5 days until IBM took over and it started breaking for me every few months.

Having FFMPEG and rpmfusion installed is of no consequence and totally OK

Plex keeps all its codecs in “Plex Media Server/Codecs”. This keeps it fully isolated from the host.

Understanding how difficult a downgrade would be, might I suggest & request

  1. Stop PMS
  2. Rename ‘Library’ → ‘Library.KEEP’
  3. Install 1.29.2.6364 from the link below
  4. Spin up a new TEST server, giving it a Friendly Name which is different than your existing server name.
  5. Add the smallest media directory you can (“just enough” to test with)
  6. Let it spin up and settle
  7. Test
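
Steps 1 and 2 above could be scripted roughly as follows. This is only a sketch: the `plexmediaserver` systemd unit name and the `/var/lib/plexmediaserver` base directory are assumptions for a stock Linux package install, and `stop_pms` / `park_library` are hypothetical helper names, not anything Plex ships.

```python
import subprocess
from pathlib import Path

# Assumed default for a packaged Linux install; adjust for your system.
PLEX_BASE = Path("/var/lib/plexmediaserver")


def stop_pms(service="plexmediaserver"):
    """Step 1: stop PMS. The systemd unit name is an assumption."""
    subprocess.run(["systemctl", "stop", service], check=True)


def park_library(base=PLEX_BASE):
    """Step 2: rename 'Library' to 'Library.KEEP' so the existing data is left untouched."""
    src = Path(base) / "Library"
    dst = Path(base) / "Library.KEEP"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} not found")
    if dst.exists():
        # Never clobber an earlier backup.
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")
    src.rename(dst)
    return dst
```

Call `stop_pms()` first, then `park_library()`, both with enough privileges to manage the service and the directory. Going back afterwards should just be the reverse rename with PMS stopped.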

In case this helps, here’s my repro.
Content:

  • Codec HEVC
  • Bitrate 2727 kbps
  • Bit Depth 8
  • Chroma Location left
  • Chroma Subsampling 4:2:0
  • Coded Height 1080
  • Coded Width 1920
  • Color Primaries bt709
  • Color Range tv
  • Color Space bt709
  • Color Trc bt709
  • Frame Rate 23.976 fps
  • Height 1080
  • Level 4.0
  • Profile main
  • Ref Frames 1
  • Width 1920

System:
Ubuntu 22.04 VM on Proxmox with a Nvidia T400 (PCIe passthrough)
Driver: latest nvidia-driver-525 from the official ubuntu repo
Plex: 1.30.2.6563

Symptom:
Video fails to start when hw transcoding is enabled.

Error from log:
Feb 06, 2023 23:41:29.581 [0x7f3c2d676b38] ERROR - [Req#87b5/Transcode/obimdiradcsycrcaqnco6dmn/b225f38e-0b36-4ae7-b312-27a7534b4648] [hevc @ 0x7fe09024e700] No decoder surfaces left
Feb 06, 2023 23:41:29.582 [0x7f3c2c803b38] ERROR - [Req#87b6/Transcode/obimdiradcsycrcaqnco6dmn/b225f38e-0b36-4ae7-b312-27a7534b4648] [hevc @ 0x7fe09024e700] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed → CUDA_ERROR_INVALID_VALUE: invalid argument
Feb 06, 2023 23:41:29.582 [0x7f3c2e891b38] ERROR - [Req#87b7/Transcode/obimdiradcsycrcaqnco6dmn/b225f38e-0b36-4ae7-b312-27a7534b4648] [hevc @ 0x7fe09024e700] hardware accelerator failed to decode picture
Feb 06, 2023 23:41:29.582 [0x7f3c2d06db38] ERROR - [Req#87b8/Transcode/obimdiradcsycrcaqnco6dmn/b225f38e-0b36-4ae7-b312-27a7534b4648] Error while decoding stream #0:0: Generic error in an external library
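
For anyone comparing their own logs against this repro, here is a minimal Python sketch that tallies the error strings quoted in this thread. The signature list is copied from the log lines posted here; `count_signatures` is a hypothetical helper name, and the sketch makes no claim about what causes the errors.

```python
from collections import Counter

# Substrings taken from the PMS / ffmpeg log lines posted in this thread.
SIGNATURES = (
    "No decoder surfaces left",
    "CUDA_ERROR_INVALID_VALUE",
    "hardware accelerator failed to decode picture",
    "Could not find ref with POC",
    "Generic error in an external library",
)


def count_signatures(lines):
    """Count how many log lines contain each known error signature."""
    counts = Counter()
    for line in lines:
        for sig in SIGNATURES:
            if sig in line:
                counts[sig] += 1
    return counts
```

Feeding it the lines of "Plex Media Server.log" (for example via `open(path, errors="replace")`) gives a quick way to check whether two systems are showing the same failure pattern.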

@ChuckPa

Alright, so I've done a bit more testing with just ffmpeg after I saw a post on another media server forum reporting similar issues. It appears it may not be a Plex issue but an issue within ffmpeg and CUDA. I have now been able to replicate the issue with my local ffmpeg on two separate machines with different NVIDIA cards, different ffmpeg versions, and different CUDA versions (11.7 and 12).

I can replicate the issue in ffmpeg directly by passing -hwaccel (auto|cuda|nvdec) and -hwaccel_output_format cuda. Passing other combinations results in the hw transcode happening without issue.

Example commands below.
Truncated Plex command:

/usr/lib/plexmediaserver/Plex Transcoder -codec:0 hevc -hwaccel:0 nvdec -hwaccel_fallback_threshold:0 10 -threads:0 1 -hwaccel_output_format:0 cuda -hwaccel_device:0 cuda -analyzeduration 20000000 -probesize 20000000 -i /srv/ftp/movies/test/jellyfish-30-mbps-hd-hevc.mkv -filter_complex [0:0]hwupload[0];[0]scale_cuda=w=1920:h=1080:format=nv12[1] -map [1] -codec:0 h264_nvenc -b:0 20000k -preset:0 hq

This generates the errors that were discussed.

Taking what I saw there, I created a very basic ffmpeg command that generates the same errors:

ffmpeg -hwaccel nvdec -hwaccel_output_format cuda -i /srv/ftp/movies/test/jellyfish-30-mbps-hd-hevc.mkv -c:v h264_nvenc test.mkv 

It fails with the same errors as seen in the PMS logs. Below is truncated output that ffmpeg spat out (the Invalid DTS message is something I didn't see in the PMS logs, but the other messages are the same):

[matroska @ 0x5589ef3c7ec0] Invalid DTS: 27928 PTS: 27861 in output stream 0:0, replacing by guess
Error while decoding stream #0:0: Generic error in an external library
    Last message repeated 1 times
[hevc @ 0x5589f0144940] No decoder surfaces left
[hevc @ 0x5589f0144940] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc @ 0x5589f0144940] hardware accelerator failed to decode picture
Error while decoding stream #0:0: Generic error in an external library
[hevc @ 0x5589f0207200] No decoder surfaces left
[hevc @ 0x5589f0207200] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc @ 0x5589f0207200] hardware accelerator failed to decode picture
Error while decoding stream #0:0: Generic error in an external library
[hevc @ 0x5589f02c9ac0] No decoder surfaces left
[hevc @ 0x5589f02c9ac0] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc @ 0x5589f02c9ac0] hardware accelerator failed to decode picture
[hevc @ 0x5589f038c380] Could not find ref with POC 891
[hevc @ 0x5589f038c380] No decoder surfaces left
[hevc @ 0x5589f038c380] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc @ 0x5589f038c380] hardware accelerator failed to decode picture
[hevc @ 0x5589f044ec40] Could not find ref with POC 889
[hevc @ 0x5589f044ec40] No decoder surfaces left
[hevc @ 0x5589f044ec40] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc @ 0x5589f044ec40] hardware accelerator failed to decode picture
Error while decoding stream #0:0: Generic error in an external library
    Last message repeated 4 times
[matroska @ 0x5589ef3c7ec0] Invalid DTS: 28962 PTS: 28929 in output stream 0:0, replacing by guess
[matroska @ 0x5589ef3c7ec0] Invalid DTS: 29496 PTS: 29463 in output stream 0:0, replacing by guess

Modifying that command to replace the explicit cuda with auto or nvdec solves the problem, as does simply removing the explicit -hwaccel_output_format argument:

ffmpeg -hwaccel nvdec -hwaccel_output_format auto -i /srv/ftp/movies/test/jellyfish-30-mbps-hd-hevc.mkv -c:v h264_nvenc test.mkv
ffmpeg -hwaccel nvdec -hwaccel_output_format nvdec -i /srv/ftp/movies/test/jellyfish-30-mbps-hd-hevc.mkv -c:v h264_nvenc test.mkv
ffmpeg -hwaccel nvdec -i /srv/ftp/movies/test/jellyfish-30-mbps-hd-hevc.mkv -c:v h264_nvenc test.mkv

All of these show the same kind of performance on the hardware that Plex runs on, resulting in a 14x conversion.
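
To make the comparison repeatable on other machines, the four variants from this post (the failing cuda one and the three working ones) can be generated and run from a small Python sketch. The ffmpeg flags are the ones used in the commands in this post, apart from `-y` (overwrite the output file), which is added here for convenience. `build_commands` and `run_variant` are hypothetical helper names.

```python
import subprocess


def build_commands(src, dst="test.mkv", ffmpeg="ffmpeg"):
    """Return {label: argv} for the failing ("cuda") and working variants."""
    variants = {
        "cuda": ["-hwaccel_output_format", "cuda"],
        "auto": ["-hwaccel_output_format", "auto"],
        "nvdec": ["-hwaccel_output_format", "nvdec"],
        "unset": [],  # no -hwaccel_output_format at all
    }
    return {
        label: [ffmpeg, "-y", "-hwaccel", "nvdec", *extra,
                "-i", src, "-c:v", "h264_nvenc", dst]
        for label, extra in variants.items()
    }


def run_variant(argv):
    """Run one command; return True if its log shows the decoder-surface error."""
    proc = subprocess.run(argv, capture_output=True, text=True, errors="replace")
    # ffmpeg writes its log messages to stderr.
    return "No decoder surfaces left" in proc.stderr
```

Looping over `build_commands("/path/to/sample.mkv").items()` and calling `run_variant` on each argv gives a per-variant pass/fail that is easy to paste into a reply. Whether the results carry over to the Plex Transcoder build is a separate question, since that build differs from a stock ffmpeg.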

Can you guys make -hwaccel_output_format a configurable value, or simply let ffmpeg choose by using auto or by leaving it out of the args you pass to the transcoder?

So based on the other options you guys are running, I see that the change would not be very easy. I would still ask whether it would be possible to have a setting to leverage NVDEC versus CUDA as the decoder. While it's not directly leveraging the CUDA API, it can leverage CUDA cores to do the work and is an API that spun off from CUDA. It still appears to be something in ffmpeg, though, and since I'm not sure what changes you guys make when compiling yours, I haven't been able to get a one-for-one example command to trigger it with the builds I am using locally.

While I can get 8-bit HEVC to decode using CUDA without generating the error in some circumstances, depending on various filter_complex settings, I can't seem to narrow it down to a specific thing that may work with the Plex build.

Apologies… I see you've been working. Greatly appreciated. I'm very much under the weather. Hoping for tomorrow.

Mine is pretty “similar” : Redhat based (Almalinux), headless server and running Plex & NGINX. The only difference is that mine is a VM with GPU passthrough but I don’t think it makes a difference from the OS POV. I’m wondering if it has something to do with headless servers as in my case, the card used for the server console is the virtual one.

Wow, thanks for the time invested! I’m not that advanced in Linux to test that so deep. :blush::ok_hand:

Thank you for all the hard work, Chuck.

A couple of questions if possible please:

  • Is this a permanent fix, or a workaround to a break in the Nvidia drivers?
  • Do you happen to know when the fixed version will be released?

Many thanks in advance!

The working theory here (at least with me)

  1. Nvidia’s not going to break their driver AND leave it ‘broken’ long. HOWEVER, they have been rolling out version updates at a quick pace so :person_shrugging: ??

Ref:

Version: 525.89.02
Release Date: 2023.2.8
Operating System: Linux 64-bit
Language: English (US)
]$ nvidia-smi; tail -10 /var/lib/plexmediaserver/Library/Application\ Support/Plex\ Media\ Server/Logs/Plex\ Media\ Server.log
Fri Feb 10 16:29:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   43C    P0    46W / 130W |    232MiB /  6144MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    549492      C   ...diaserver/Plex Transcoder      228MiB |
+-----------------------------------------------------------------------------+
Feb 10, 2023 16:29:16.072 [0x7f8b5c31ab38] ERROR - [Req#beb/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] [hevc @ 0x7f4da08b5240] Could not find ref with POC 897
Feb 10, 2023 16:29:16.072 [0x7f8b5de0db38] ERROR - [Req#bec/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] [hevc @ 0x7f4da08b5240] No decoder surfaces left
Feb 10, 2023 16:29:16.072 [0x7f8b5c31ab38] ERROR - [Req#bed/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] [hevc @ 0x7f4da08b5240] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
Feb 10, 2023 16:29:16.072 [0x7f8b5de0db38] ERROR - [Req#bee/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] [hevc @ 0x7f4da08b5240] hardware accelerator failed to decode picture
Feb 10, 2023 16:29:16.073 [0x7f8b5c31ab38] ERROR - [Req#bef/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] Error while decoding stream #0:0: Generic error in an external library
Feb 10, 2023 16:29:16.073 [0x7f8b5de0db38] ERROR - [Req#bf0/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] [hevc @ 0x7f4da08b5240] No decoder surfaces left
Feb 10, 2023 16:29:16.073 [0x7f8b5c31ab38] ERROR - [Req#bf1/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] [hevc @ 0x7f4da08b5240] decoder->cvdl->cuvidDecodePicture(decoder->decoder, &ctx->pic_params) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
Feb 10, 2023 16:29:16.073 [0x7f8b5de0db38] ERROR - [Req#bf2/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] [hevc @ 0x7f4da08b5240] hardware accelerator failed to decode picture
Feb 10, 2023 16:29:16.074 [0x7f8b5c31ab38] ERROR - [Req#bf3/Transcode/4a1pcc3np2opc31szbqzglgy/01dd33db-78b9-46a7-bc23-02355df847b2] Error while decoding stream #0:0: Generic error in an external library
Feb 10, 2023 16:29:17.105 [0x7f8b60485b38] ERROR - Unknown metadata type: 

So that tells me there isn’t enough decode memory.

The setting is nvdecExtraFrames="x". The default is “1”. What happens to playback, other than GPU card memory usage increasing, as you increase it (1, 2, 4, 8, etc.)?
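
For stepping through those values, here is a minimal Python sketch for setting the value in Preferences.xml by hand. It assumes the setting is stored like other PMS preferences, as an attribute on the single `Preferences` element (the nvdecExtraFrames="x" notation suggests this, but it is not confirmed here). `set_pref` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def set_pref(xml_text, name, value):
    """Return Preferences.xml content with attribute `name` set to `value`.

    Assumes all preferences are attributes on one root <Preferences> element.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "Preferences":
        raise ValueError(f"unexpected root element: {root.tag}")
    root.set(name, str(value))
    # Note: the XML declaration line is not re-emitted by tostring().
    return ET.tostring(root, encoding="unicode")
```

Read the file, pass its text through `set_pref(text, "nvdecExtraFrames", 2)`, and write the result back, then repeat with 4, 8, and so on. It is probably safest to do this with PMS stopped and a backup copy of the file, on the assumption that a running server may rewrite it.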

@ChuckPa is that setting valid in 1.31.0.6654, or do I need to go back, install 1.31.1.6617 again, and step through the testing again?

That setting is valid in all versions since build 6617 (until the issue is fully resolved).

10 minutes max, but I have an 8*2-core CPU.

@mwf369 & @plex_famverhaegen.be

The amount of time required to complete any database up/down migration (the “503 - Maintenance” message) is directly dependent on disk I/O and the number of records in the DB.

@mwf369 Fair comment from @ChuckPa. I did optimise the DB and clean bundles, AND the DB is on an SSD, not a regular disk (and it's not a small DB). I might have oversized my new NAS setup :slight_smile:

It is partly hardware and partly CPU, whereas using the CUDA API ensures it's all hardware based. I have no problem using ffmpeg directly with specific options, and I can trigger the same thing that Plex shows by choosing other specific options. But I cannot just change how Plex calls their build of ffmpeg for transcoding. Having an option to leverage NVDEC versus CUDA would allow those of us running into this issue with 1080p 8-bit HEVC files to still use hardware transcoding, instead of falling back to 100% CPU-based transcoding to make sure our users don't run into the issue.

But yes, utilizing CUDA ensures full hardware decoding; NVDEC will use CUDA cores for parts but doesn't ensure it's all done on the hardware… or at least that's how I understand NVDEC.

@ChuckPa Is it expected that there is no nvdecExtraFrames option visible when checking the /:/prefs XML ? Tested 1.31.1.6617, 1.31.1.6638 and 1.31.1.6641 so far.

Doing a PUT to /:/prefs?nvdecExtraFrames=1 returns “cannot set preference value for unknown preference nvdecExtraFrames” on all 3 versions.

I know I can edit the XML, but I wanted to verify it’s taking effect…
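
The "is it listed in /:/prefs" check can be scripted with a small Python sketch like the one below. It assumes the response is a `MediaContainer` holding `Setting` elements with `id` and `value` attributes; that layout is an assumption, not something shown in this thread. `listed_settings` and `is_listed` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def listed_settings(prefs_xml):
    """Return {id: value} for every <Setting> element in a /:/prefs response."""
    root = ET.fromstring(prefs_xml)
    return {s.get("id"): s.get("value") for s in root.iter("Setting")}


def is_listed(prefs_xml, name):
    """True if a preference with this id appears in the response."""
    return name in listed_settings(prefs_xml)
```

Being absent from that list would not by itself show that a value in Preferences.xml is ignored. Watching GPU memory usage in nvidia-smi as the value goes up, as suggested earlier in the thread, may be a more direct signal.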