HW Transcoding fails when selecting anything other than "Convert Maximum": No enc/dec-oder found suddenly

Server Version#: 1.18.4.2171
Player Version#: Chrome WebApp & PMP 1.5.0.951

TL;DR:
HW acceleration is set up in PMS instance in Win10 VM. When viewing using Chrome webapp, “convert automatically” works, selecting a quality level manually stops playback (no error) and PMS GPU utilization drops to 0%. When viewing with Plex Media Player, original playback works, selecting any transcode quality causes “Unknown Error -17”. PMP logs show “Failed to recognize file format.” and PMS logs show “hardware transcoding: enabled, but no hardware decode accelerator found” even though it was just working when using the webapp. Not limited to particular file resolution or codec.

Not so “TL;DR”:

Hello hello hello! First off, thanks for even entertaining looking at this. Second, I may just be adding to a long list of issues people are having with hardware transcoding.

So, in the process of upgrading my homelab, I’ve built an R620 into a Proxmox box with a Quadro P2200. I’ve got a Win10 Pro (1909) VM with the GPU passed through, fully updated, running PMS 1.18.4.2171 (fresh, no import of previous database or anything). I’ve got this box connected to an R720XD running FreeNAS, connected directly via a 10Gb DAC. I’ve scraped in a test folder filled with movies to do some testing, I have the latest Quadro drivers (441.66), and I have “HW acceleration when available” and “Use hardware encoding” on. All of my testing below is done locally (same LAN).

The curiousness is that when I load a file (h.264 or HEVC/h.265, 4K or 1080p), it starts to play fine. Given that I’m testing with the Chrome WebApp, H.265 4k files transcode to h.264. Using “Convert Automatically” works fine. When I try to manually select a quality setting, the load on my server spikes to 100% for the briefest of seconds, drops to 0%, and never starts to play again. I tested this with multiple files: two 4K HEVC movies and a 1080p h.264 file. What is very odd is that nvenc and nvdec are selected as the enc/dec-oder when using “convert automatically” in the webapp but, after I select a quality level, I get the following lines:

Jan 16, 2020 23:09:27.246 [1456] DEBUG - TPU: hardware transcoding: enabled, but no hardware decode accelerator found
Jan 16, 2020 23:09:27.246 [1456] DEBUG - [Universal] Using local file path instead of URL: <movie with a guy in a metal suit>.mkv
Jan 16, 2020 23:09:27.262 [1456] DEBUG - TPU: hardware transcoding: final decoder: , final encoder: 
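For anyone comparing the working and failing attempts, a small script can pull these lines out of a PMS log so the two are easy to line up by timestamp. This is only a rough sketch: the patterns are copied from the three lines quoted above, and the assumption that a successful selection prints names after "final decoder:" and "final encoder:" (for example nvdec and nvenc) is a guess, not something confirmed from the logs. The function names are made up for illustration.

```python
import re

# Timestamp layout copied from the PMS log lines quoted above.
_TS = r"(?P<ts>\w{3} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2}\.\d{3})"

# Matches the "final decoder: ..., final encoder: ..." line.
# Assumption: on success the two fields hold names; on failure they are empty.
FINAL_RE = re.compile(
    _TS + r" .*hardware transcoding: final decoder:\s*(?P<dec>[^,]*),"
    r"\s*final encoder:\s*(?P<enc>.*)$"
)

# Text of the warning line quoted above.
NO_ACCEL = "no hardware decode accelerator found"


def scan_pms_log(lines):
    """Return (selections, missing_count) for an iterable of log lines.

    selections is a list of (timestamp, decoder, encoder) tuples;
    missing_count is how often the "no accelerator" warning appeared.
    """
    selections = []
    missing = 0
    for raw in lines:
        line = raw.rstrip()
        if NO_ACCEL in line:
            missing += 1
        m = FINAL_RE.match(line)
        if m:
            selections.append(
                (m.group("ts"), m.group("dec").strip(), m.group("enc").strip())
            )
    return selections, missing


def software_fallbacks(selections):
    """Timestamps where both the decoder and encoder fields came back empty."""
    return [ts for ts, dec, enc in selections if not dec and not enc]
```

To use it, open whichever log file was extracted from the zip (for example with `open(path, encoding="utf-8", errors="replace")`) and pass the file object to `scan_pms_log`.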

Other than that, I don’t see any other blatant issues, but I’m not well-versed in reading PMS logs. The full logs referenced here are titled “Plex Media Server Logs_2020-01-16_23-11-43”.
Plex Media Server Logs_2020-01-16_23-11-43.zip (364.4 KB)

Then, I decided to test this on PMP on the same computer I was using for the Plex webapp (Win10, i5 6600k, GTX1070). Original quality 4k plays fine. Then, only for some files, converting 4k to any other selected quality is either choppy (freezes for a half second every 3 or 4 seconds) or throws an “unknown error -17” message. The PMP logs show the following:

2020-01-17T00:20:37.280 [ ERROR ] [MPVEngine] unrecognized file format. 
2020-01-17T00:20:37.280 [ DEBUG ] [QHotkeyinput] Playback state is now 'Stopped' 
2020-01-17T00:20:37.280 [ DEBUG ] cplayer: finished playback, unrecognized file format (reason 4) 
2020-01-17T00:20:37.280 [ ERROR ] cplayer: Failed to recognize file format. 
2020-01-17T00:20:37.280 [ INFO  ] cplayer:  
2020-01-17T00:20:37.280 [ WARN  ] [Web] [Player] A critical error occurred: -17 An unknown error occurred (-17) 

PMP logs referenced here are titled “PMP 17-Jan 1221.log”.
PMP 17-Jan 1221.log (1.2 MB)

I decided to restart PMS, clear the logs, and retest again with PMP to clean up the logs a bit and see what shows. I did this multiple times with different test files and I’ll try to outline them here:

  1. 4k HDR HEVC file
    a. Original Quality played first, plays fine.
    b. 1080p 20mbps transcode selected: error -17
    c. PMS log: Plex Media Server Logs_2020-01-17_00-54-27.zip (249.1 KB)

  2. Different 4k HDR HEVC file
    a. Original Quality played first, plays fine.
    b. 1080p 20mbps transcode selected: plays fine for a few seconds, then crashes out of playback.
    c. PMS log: Plex Media Server Logs_2020-01-17_00-58-24.zip (257.6 KB)

  3. 1080p H.264 file
    a. Original Quality played first, plays fine.
    b. 1080p 8mbps transcode selected: error -17
    c. PMS log: Plex Media Server Logs_2020-01-17_01-02-49.zip (272.1 KB)

  4. First 4k HDR HEVC file, hardware transcoding/encoding all off
    a. Original Quality played first, plays fine.
    b. 1080p 20mbps transcode selected: playback is choppy but never throws an error nor stops. Overall CPU usage on PMS server hovers around 50%. I should state that this Win10 VM running PMS is allotted 12 cores of a pair of e5-2630L V2s and 32 GB of memory.
    c. Plex Media Server Logs_2020-01-17_01-12-47.zip (250.6 KB)

  5. Second 4k HDR HEVC file, hardware transcoding/encoding all off (CPU only)
    a. Original Quality played first, plays fine.
    b. 1080p 20mbps transcode selected: playback is smooth, no issues. Overall CPU usage on PMS server pegs at 100%.
    c. Plex Media Server Logs_2020-01-17_01-22-37.zip (5.8 MB)

I will state that this is a new type of set up for me as my current “production” setup is an unRAID docker with only CPU transcoding so maybe I’m missing something obvious. I’ll happily do more testing if needed. Plus, I can provide a MediaInfo print out of each of the files used, if that may be useful and/or relevant, I’m just out of time and it’s already far too late for a work night.

Any recommendations on what’s causing this, known issue or not, or things to look for in logs to better analyze, would be greatly appreciated. And I am open to recommendations of switching VM OS, I just picked Windows as I know it best and has full driver support out of the box.

Cheers

EDIT: slight rewriting for clarity, addition of TL,DR.

For the Windows VM how did you verify the GPU was passed thru?

Hardware transcoding doesn’t work if you set Plex to run as a service or logged in via RDP.

The GPU shows up under display adapters, it was recognized by the graphics driver installer, and it shows in Task Manager. It also shows usage when transcoding with “Convert Automatically” selected.

I’m not sure how to set it up as a service, and unless the default installer sets it up as a service automatically, that is not the case. As for the RDP comment, I’m not sure what you mean.

Just to reiterate, hardware transcoding does work when viewing in the Chrome Webapp, but only when set to “Convert automatically”. I confirm this by watching the utilization level of the GPU in Task Manager in the VM. When I select a quality level manually, playback stops and GPU utilization drops to 0%. When using Plex Media Player, I just get the error -17 and a “file format not recognized” in the PMP logs, coinciding with an inability of PMS to actually utilize nvenc/nvdec.

Bump, hoping someone has an idea.

How are you starting Plex in this VM?

The drivers do a monitor check for an HDCP display device, which is why most people have to put a dummy plug on the video card. Since you are running Proxmox, have you considered switching to a Debian 10 VM? It would be much cleaner.

It starts automatically when the VM loads (ie: when windows starts).

It appears I don’t need the HDCP adapter: the GPU, not the CPU, does the transcoding initially (confirmed by viewing Task Manager in the VM), but it then fails or causes a “file format issue” when I manually select a quality in the webapp or in PMP.

I am considering switching to a different VM OS but I want to better understand why this ISN’T working before I abandon Win10. I just don’t understand why it’s partially working.

Please be specific on how Plex starts up.

Do you have it set to start as a service? Hardware transcoding is unreliable with this method.
Do you have a scheduled process to login as the Plex user and start Plex or auto login?

Plex Media Player 2.48 and Plex for Windows 1.5 are two different programs, are you using both?

PMS is not running as a service. I installed it using the default Plex installer (exe) and have not set it up as a service. EDIT: The only windows account on the VM does not have a password set and thus logs in automatically. Once logged in, PMS is set to start automatically using the in-program setting, not a scheduled task or service.

Yes, I used either PMP or the Chrome web portal to test the functionality of transcoding from the PMS instance I am testing. The ways in which each “viewer” fails are detailed in my initial post.

EDIT: Added detail to “how does PMS start” paragraph.
EDIT2: spelling error.

Bump. Still looking for help on this.

Is there a monitor connected to the gpu?

If not connect one.

Or get a dummy hdmi plug.

Just to rule it out, I got a dummy DisplayPort plug (the P2200 doesn’t have HDMI, only DP). No change:

  • WebApp w/4k test file:
    • Convert maximum works
    • Any manually selected lower quality fails and GPU utilization drops to 0%
  • Plex Media Player w/4k test file:
    • Original playback works fine
    • Any conversion level gives “Error -17”

EDIT: Removed the note about network. Stuttering is occurring when playing the file locally. Seems that, when HW transcoding, it’s not doing it in real-time, just slightly slower, so eventually it catches up and has to keep buffering.

EDIT2: Seems like I might be having something at least similar to the issue described in this particular comment in this thread, where sometimes HW decoding/encoding is found and then it isn’t. Is the wait for the transcoder update, as described in that linked thread, still ongoing? Or is that update in reference to the update that brought in NVDEC for Windows? If this is still a “hang in there” issue, I just want to know that before I start blaming hardware.

Edited previous reply.

Bump.

I didn’t see anything about using RDP, are you using it to access the windows vm?

If so, you might try not using it since it screws up hw transcoding.

RDP = remote windows desktop

Specifically RDP, no. I have been using AnyDesk to connect to the VM as the QEMU display is disabled. Though I will try disabling RDP altogether to see if that helps.

Edit: it was off already.

looked through your first log, not sure what to make of it,

there is this error

Jan 17, 2020 00:54:09.718 [6676] DEBUG - Codecs: testing h264_qsv (encoder)
Jan 17, 2020 00:54:09.718 [6676] DEBUG - Codecs: hardware transcoding: testing API qsv
Jan 17, 2020 00:54:09.749 [6676] ERROR - [FFMPEG] - Error initializing an MFX session: -3.
Jan 17, 2020 00:54:09.749 [6676] DEBUG - Codecs: hardware transcoding: opening hw device failed - probably not supported by this system, error: Unknown error occurred
Jan 17, 2020 00:54:09.749 [6676] DEBUG - Codecs: testing h264_nvenc (encoder)
Jan 17, 2020 00:54:09.749 [6676] DEBUG - Codecs: hardware transcoding: testing API nvenc
Jan 17, 2020 00:54:09.749 [7656] DEBUG - Failed to stream media, client probably disconnected after 249020416 bytes: 10054 - An existing connection was forcibly closed by the remote host

then what looks like testing for hw decode/encoder

attempts to stream

Jan 17, 2020 00:54:15.655 [9388] DEBUG - Streaming Resource: Reached Decision id=246 codes=(General=1001,Direct play not available; Conversion OK. Direct Play=3000,App cannot direct play this item. Direct play is disabled. Transcode=1001,Direct play not available; Conversion OK.) media=(id=280 part=(id=281 decision=transcode container=mkv protocol=http streams=(Video=(id=10009 decision=transcode bitrate=18097 encoder=h264_nvenc width=1920 height=1080) Audio=(id=10010 decision=transcode bitrate=908 encoder=libopus channels=8 rate=48000))))
Jan 17, 2020 00:54:15.655 [7656] DEBUG - Completed: [10.10.11.64:61447] 200 GET /video/:/transcode/universal/decision?hasMDE=1&path=%2Flibrary%2Fmetadata%2F246&mediaIndex=0&partIndex=0&protocol=http&fastSeek=1&directPlay=0&directStream=0&subtitleSize=100&audioBoost=100&location=lan&maxVideoBitrate=20000&directStreamAudio=0&session=1mbcs8ysdv5i11crdk5pz842&offset=922&subtitles=auto&copyts=1&Accept-Language=en (14 live) TLS GZIP 3790ms 5114 bytes (pipelined: 1)
Jan 17, 2020 00:54:15.812 [6872] DEBUG - Job was already killed, not killing again.
Jan 17, 2020 00:54:15.812 [6872] DEBUG - Stopping transcode session 1mbcs8ysdv5i11crdk5pz842
Jan 17, 2020 00:54:15.812 [4268] DEBUG - Jobs: 'C:\Program Files (x86)\Plex\Plex Media Server\Plex Transcoder.exe' exit code for process 1984 is -1059143458 (intentional termination)
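The decision request in that excerpt is easier to read once the query string is split into its parameters. A quick sketch using only the Python standard library; the request string is copied from the "Completed:" line above, and `decision_params` is just an illustrative helper name. Reading `maxVideoBitrate=20000` as the 20 Mbps quality chosen in the client is an inference (it lines up with the 1080p 20mbps test), not something the log states.

```python
from urllib.parse import parse_qs

# Request target copied verbatim from the "Completed:" log line above.
REQUEST = (
    "/video/:/transcode/universal/decision?hasMDE=1"
    "&path=%2Flibrary%2Fmetadata%2F246&mediaIndex=0&partIndex=0"
    "&protocol=http&fastSeek=1&directPlay=0&directStream=0"
    "&subtitleSize=100&audioBoost=100&location=lan&maxVideoBitrate=20000"
    "&directStreamAudio=0&session=1mbcs8ysdv5i11crdk5pz842&offset=922"
    "&subtitles=auto&copyts=1&Accept-Language=en"
)


def decision_params(request_target):
    """Split the query string into a flat dict of percent-decoded values."""
    query = request_target.partition("?")[2]
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in parse_qs(query).items()}


if __name__ == "__main__":
    for key, value in sorted(decision_params(REQUEST).items()):
        print(f"{key} = {value}")
```

Comparing this dict between a “Convert Automatically” request and a manually selected quality should show exactly which parameters the client changes.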

then whether it was terminated by user/client/server…
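One small aside on that exit code: Windows exit codes are 32-bit values, and the log prints this one as a signed number. Reinterpreting it as unsigned hex makes it easier to recognize. A minimal sketch (the helper name is made up):

```python
def as_unsigned32_hex(code):
    """Reinterpret a signed 32-bit exit code as unsigned hexadecimal."""
    return f"0x{code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    # Exit code copied from the "Jobs:" log line above.
    print(as_unsigned32_hex(-1059143458))  # prints 0xC0DEC0DE
```

A value that spells out a word in hex looks like a deliberate marker rather than a crash status, which fits the log's own "intentional termination" note, though that reading is a guess.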

can you open a cmd window and type

nvidia-smi

then alt-prntscreen and paste into a reply?
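If a text dump is easier to paste than a screenshot, nvidia-smi can also print selected fields as CSV. The sketch below wraps that in Python; it assumes nvidia-smi is on the PATH of the VM, and the function names are only for illustration. The field list is an arbitrary pick from nvidia-smi's `--query-gpu` options.

```python
import csv
import subprocess

# Fields requested from nvidia-smi; an arbitrary selection for this sketch.
FIELDS = ["name", "driver_version", "utilization.gpu", "memory.used"]


def parse_gpu_csv(text):
    """Turn nvidia-smi's headerless CSV output into one dict per GPU."""
    rows = csv.reader(text.splitlines(), skipinitialspace=True)
    return [
        dict(zip(FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row
    ]


def query_gpus():
    """Run nvidia-smi (assumed to be on PATH) and parse its output."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Running it once while “Convert Automatically” is playing and once after picking a quality manually would give two snapshots to compare.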

should look something like this (I am not using plex on this machine)

below is nvidia-smi on my plex server running linux (3 remote 720>720/sd transcoders).

image


if you are comfortable with linux, for doo-doo and laffs, try to spin up a vm or container with minimal debian, install a recent linux nvidia driver from @ here

you don’t need the patch with a quadro, but is a good reference for known working drivers.

make sure that nvidia-smi looks something like mine above, install a test plex server, with 1 or 2 problem files, see if it works.

this will at least help you identify if it is specific to your windows vm.

if it fails the same or similarly in linux, then you probably have some kind of either virtualization problem, or a hardware problem.


for what it’s worth, my linux server above is proxmox but I have pms running baremetal on proxmox itself (no virtualization or container), which is something you could try instead of a virtual debian box.

Yeah, the logs get stranger if, in some of them, you notice that detecting the hw encoders works but then doesn’t, corresponding to when I manually select a quality.

As for the SMI printout, here is what I get:
image

I am currently creating a whole new, fresh Win10 Pro VM instance just to see what happens. But I’ll try out Debian next.

That info rom error looks scary.

Yeahhhhhhh. I’m trying to dig into it now. All I’m finding is vague information of the sort “something with the hardware is $*@#&ed” which has reignited fears that a bit of stupidity in the install of the P2200 led me to damage the hardware somehow.

Well, time to try the new VM(s). I will report back when I have done the testing.

Eeeew. I know that feeling. Perhaps ruling out hardware should be the next step then.