Sync and transcode stops working after first transcode when stream limit set to 1 to match HW

Server Version#: 1.30.0.6486
Player Version#: iOS 8.13

I am seeing an issue with NVIDIA hardware transcoding and sync/download of videos. When I restart Plex, the first video is transcoded correctly and downloads. The second video fails to start.
Every 2.0s: nvidia-smi -i 0 -q -d UTILIZATION hal: Tue Dec 20 12:39:21 2022

This is the first video. If I stop the download (cancel from the device) and re-request a sync,

==============NVSMI LOG==============

Timestamp                                 : Tue Dec 20 12:39:21 2022
Driver Version                            : 525.60.11
CUDA Version                              : 12.0

Attached GPUs                             : 2
GPU 00000000:0F:00.0
    Utilization
        Gpu                               : 11 %
        Memory                            : 13 %
        Encoder                           : 7 %
        Decoder                           : 34 %
    GPU Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 18 %
        Min                               : 10 %
        Avg                               : 14 %
    Memory Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 22 %
        Min                               : 13 %
        Avg                               : 18 %
    ENC Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 12 %
        Min                               : 6 %
        Avg                               : 9 %
    DEC Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 59 %
        Min                               : 33 %
        Avg                               : 48 %

this is what happens.

==============NVSMI LOG==============

Timestamp                                 : Tue Dec 20 12:41:00 2022
Driver Version                            : 525.60.11
CUDA Version                              : 12.0

Attached GPUs                             : 2
GPU 00000000:0F:00.0
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 1 %
    GPU Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 1 %
        Min                               : 0 %
        Avg                               : 0 %
    Memory Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 1 %
        Min                               : 0 %
        Avg                               : 0 %
    ENC Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 0 %
        Min                               : 0 %
        Avg                               : 0 %
    DEC Utilization Samples
        Duration                          : 11.71 sec
        Number of Samples                 : 71
        Max                               : 3 %
        Min                               : 0 %
        Avg                               : 0 %
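To compare the two reports without reading them by eye, the instantaneous "Utilization" block can be pulled out of the pasted text. This is a minimal sketch that only parses text in the layout shown above; the function names and the idle threshold of 2 % are assumptions made for illustration, not anything nvidia-smi or Plex defines.

```python
import re

# Matches one line of the "Utilization" block, e.g. "Encoder    : 7 %".
FIELD = re.compile(r"(Gpu|Memory|Encoder|Decoder)\s*:\s*(\d+)\s*%")

def parse_utilization(report: str) -> dict:
    """Return the instantaneous 'Utilization' percentages from one report."""
    values = {}
    in_block = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Utilization":
            in_block = True
            continue
        if in_block:
            match = FIELD.match(stripped)
            if not match:
                break  # the first non-matching line ends the block
            values[match.group(1).lower()] = int(match.group(2))
    return values

def looks_idle(values: dict, threshold: int = 2) -> bool:
    """Assumed rule: idle when encoder and decoder are both below threshold."""
    return values.get("encoder", 0) < threshold and values.get("decoder", 0) < threshold

# Excerpts copied from the two reports in the post.
first = """\
    Utilization
        Gpu                               : 11 %
        Memory                            : 13 %
        Encoder                           : 7 %
        Decoder                           : 34 %
    GPU Utilization Samples
"""
second = """\
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 1 %
    GPU Utilization Samples
"""

print(parse_utilization(first), looks_idle(parse_utilization(first)))
print(parse_utilization(second), looks_idle(parse_utilization(second)))
```

With the figures from the post, the first report counts as busy and the second as idle under this threshold.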

I see there is an existing session GUID:

Dec 20, 2022 12:40:37.203 [0x7fca88dc4b38] DEBUG - [Req#ec7/Transcode] Found session GUID of 3f501e3a6d1a7fda3f449c5896fe987714d94dfa in session start.
Dec 20, 2022 12:40:37.203 [0x7fca88dc4b38] DEBUG - [Req#ec7/Transcode] Using existing transcode session.
Dec 20, 2022 12:40:37.203 [0x7fca88dc4b38] DEBUG - [Req#ec7/Transcode] Activity: registered new activity a3252e0d-fed6-4ffc-a67b-f9e8726e3df0 - ""
Dec 20, 2022 12:40:37.203 [0x7fca88dc4b38] DEBUG - [Req#ec7/Transcode] Activity: updated activity a3252e0d-fed6-4ffc-a67b-f9e8726e3df0 - completed -1.0% - Media download by manojav
Dec 20, 2022 12:40:37.204 [0x7fca88dc4b38] DEBUG - Content-Length is -1 (of total: -1).
Dec 20, 2022 12:40:37.205 [0x7fca86481b38] DEBUG - [TranscodeOutputStream] Input processing thread started at offset 0 for -1 bytes.
Dec 20, 2022 12:40:37.437 [0x7fca893cdb38] DEBUG - [Req#ee9/Transcode/3f501e3a6d1a7fda3f449c5896fe987714d94dfa/6842f575-ed69-44c6-bd71-081495944081] Session 3f501e3a6d1a7fda3f449c5896fe987714d94dfa (0) is throttling
Dec 20, 2022 12:40:37.437 [0x7fca8798db38] DEBUG - [Req#f29/Transcode/3f501e3a6d1a7fda3f449c5896fe987714d94dfa/6842f575-ed69-44c6-bd71-081495944081] Throttle - Going into sloth mode.
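For longer logs, the throttling messages can be collected per session instead of scanned by hand. The sketch below assumes every line follows the layout of the excerpt above (timestamp, thread id in brackets, level, message) and that the session GUID is the 40-hex-digit path segment after "Transcode/"; both regular expressions are inferred from these few lines only, and the function name is hypothetical.

```python
import re

# Line layout inferred from the excerpt: "<timestamp> [<thread>] <LEVEL> - <message>".
LINE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>0x[0-9a-f]+)\] (?P<level>\w+) - (?P<msg>.*)$"
)
# Session GUID as it appears inside the request tag (assumed to be 40 hex digits).
SESSION = re.compile(r"Transcode/(?P<guid>[0-9a-f]{40})/")

def throttle_events(log_text: str) -> list:
    """Return (timestamp, session GUID or None) for each throttle message."""
    events = []
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        msg = match.group("msg")
        if "is throttling" in msg or "sloth mode" in msg:
            session = SESSION.search(msg)
            events.append((match.group("ts"), session.group("guid") if session else None))
    return events

# Three lines copied from the excerpt in the post.
SAMPLE = """\
Dec 20, 2022 12:40:37.203 [0x7fca88dc4b38] DEBUG - [Req#ec7/Transcode] Using existing transcode session.
Dec 20, 2022 12:40:37.437 [0x7fca893cdb38] DEBUG - [Req#ee9/Transcode/3f501e3a6d1a7fda3f449c5896fe987714d94dfa/6842f575-ed69-44c6-bd71-081495944081] Session 3f501e3a6d1a7fda3f449c5896fe987714d94dfa (0) is throttling
Dec 20, 2022 12:40:37.437 [0x7fca8798db38] DEBUG - [Req#f29/Transcode/3f501e3a6d1a7fda3f449c5896fe987714d94dfa/6842f575-ed69-44c6-bd71-081495944081] Throttle - Going into sloth mode.
"""

for ts, guid in throttle_events(SAMPLE):
    print(ts, guid)
```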

However, when the first transcoded sync finishes or is cancelled, there are no sessions left in the transcode folder.

If I restart Plex, everything works for the very first transcode.

However, when I watch movies that are transcoded, things work fine, even as I stop and start movies. This is only an issue on sync.

I should add that sync with transcode doesn’t work even with hardware encoding turned off, until the Plex server is restarted. Then again it works for the first video synced and then stops.

I suspect it’s related to this

Any idea when this will be fixed?

Full debug logs please which capture this ? (Can’t see what’s happening without them)

Also, which card please ?

I’ve seen several report problems with the 525.60. drivers

They are the new CUDA 12.0 protocol – which seems to have some issues.

I currently use 510.85.02, which is more than enough for even AV1 decode, and has proven to be extremely stable with PMS.

Lastly, given you state “Stops working after first transcode”, I would check:

  1. The drivers
  2. update-initramfs -u followed by a restart (I’ve seen this step get missed by automatic driver installers recently).

DELETED ATTACHMENT WILL ADD TO PM

Driver Version                            : 525.60.11
CUDA Version                              : 12.0
  1. I will confirm this and make sure to update the initramfs, but it looks like it took, and I saw the compile and the update to initramfs.

However setting the limit to Unlimited seems to have addressed the issue.

It almost feels as if, once the first encode/transcode is complete, the counter isn’t reset. As a result the second one cannot start: it still thinks there is one in progress, and it runs in sloth mode.
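The suspicion can be made concrete with a toy model. This is not PMS code and says nothing about how PMS actually counts sessions; `StreamLimiter`, `release_on_finish` and `run_two_syncs` are hypothetical names that only illustrate what a counter that is never decremented would look like from the outside.

```python
class StreamLimiter:
    """Hypothetical counter of active transcodes (an illustration, not PMS code)."""

    def __init__(self, limit, release_on_finish=True):
        self.limit = limit                      # None stands for "Unlimited"
        self.release_on_finish = release_on_finish
        self.active = 0

    def try_start(self):
        """Claim a slot; refuse when the limit is already reached."""
        if self.limit is not None and self.active >= self.limit:
            return False
        self.active += 1
        return True

    def finish(self):
        """Release a slot, unless we are modelling the suspected missing reset."""
        if self.release_on_finish and self.active > 0:
            self.active -= 1

def run_two_syncs(limiter):
    """Start two syncs one after the other and report which ones got a slot."""
    results = []
    for _ in range(2):
        started = limiter.try_start()
        results.append(started)
        if started:
            limiter.finish()
    return results

print(run_two_syncs(StreamLimiter(limit=1)))                              # counter reset on finish
print(run_two_syncs(StreamLimiter(limit=1, release_on_finish=False)))     # suspected behaviour
print(run_two_syncs(StreamLimiter(limit=None, release_on_finish=False)))  # "Unlimited" workaround
```

In this model a limit of 1 with no release lets only the first sync through, while "Unlimited" hides the missing release, which matches the pattern described in the post. Whether that is the real cause is not established here.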

I will downgrade to 515.85.02. I was running that version before I upgraded to 525, so I can go back with no issues.

Sorry for the piecemeal answers!

Card is GTX 1050 Ti.

There are no issues with hardware-based stream transcoding for viewing on web etc. Those work great. It’s just during sync/download that setting it to a 1/2 stream max (which is close to what my card can do) causes the sessions not to get reset.

Going back to 515 is a great step.

515 is still the 11.x protocol family which PMS has been proven reliable with.

Nvidia drivers 525 are supposed to be 11.x compatible but this first release of it doesn’t seem to be all that compatible.

Cool. Running 515 now. But still, without setting the

[image]

things don’t really work: only the very first transcode that requires HW works, and then everything else is in sloth mode.

Sloth mode = It has reached the high water mark (Output blocks) for that stream but the stream isn’t done yet.

“Getting back to work” = It’s reached the low water mark and starts filling the output buffer again.
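The two definitions above describe a hysteresis loop: stop filling at the high water mark, resume at the low one. A minimal sketch of that idea follows; the buffer sizes and rates are made up, and the real transcoder's thresholds and units are not given in this thread.

```python
def simulate_throttle(high, low, produce, consume, steps):
    """Toy model of a buffer that is filled by the transcoder and drained by the client."""
    buffered = 0
    throttled = False
    log = []
    for _ in range(steps):
        if not throttled:
            buffered += produce                 # transcoder fills the output buffer
        buffered = max(0, buffered - consume)   # client drains it
        if not throttled and buffered >= high:
            throttled = True
            log.append("sloth")                 # high water mark reached
        elif throttled and buffered <= low:
            throttled = False
            log.append("work")                  # low water mark reached, resume
    return log

print(simulate_throttle(high=10, low=4, produce=5, consume=2, steps=8))
# ['sloth', 'work']
```

In this model the throttle only engages when the producer outruns the consumer, so a transcoder that is throttled for long stretches has a full buffer and sits idle while it waits.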

What’s the OS underneath this container, and did you update initramfs after downgrading the Nvidia drivers?

In the log you provided above,

[chuck@lizum Downloads.2009]$ cat  Plex\ Media\ Server.log | col -b | grep TPU
Dec 20, 2022 12:38:52.865 [0x7fca88dc4b38] DEBUG - [Req#3cc/Transcode] TPU: hardware transcoding: using hardware decode accelerator nvdec
Dec 20, 2022 12:38:52.865 [0x7fca88dc4b38] DEBUG - [Req#3cc/Transcode] TPU: hardware transcoding: zero-copy support present
Dec 20, 2022 12:38:52.865 [0x7fca88dc4b38] DEBUG - [Req#3cc/Transcode] TPU: hardware transcoding: using zero-copy transcoding
Dec 20, 2022 12:38:52.866 [0x7fca88dc4b38] DEBUG - [Req#3cc/Transcode] TPU: hardware transcoding: final decoder: nvdec, final encoder: nvenc
Dec 20, 2022 12:40:36.712 [0x7fca8798db38] DEBUG - [Req#ebd/Transcode] TPU: hardware transcoding: using hardware decode accelerator nvdec
Dec 20, 2022 12:40:36.712 [0x7fca8798db38] DEBUG - [Req#ebd/Transcode] TPU: hardware transcoding: zero-copy support present
Dec 20, 2022 12:40:36.712 [0x7fca8798db38] DEBUG - [Req#ebd/Transcode] TPU: hardware transcoding: using zero-copy transcoding
Dec 20, 2022 12:40:36.712 [0x7fca8798db38] DEBUG - [Req#ebd/Transcode] TPU: hardware transcoding: final decoder: nvdec, final encoder: nvenc
[chuck@lizum Downloads.2010]$ 

You can see it working every time. Not sure where you’re not seeing it working?
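The same check as the grep above can be done in a few lines of Python, which makes it easy to list which request picked which decoder and encoder. The pattern below is inferred from the "final decoder / final encoder" lines in this excerpt only, and the function name is hypothetical. Note that these lines show which codecs each request selected; on their own they do not show how busy the GPU was afterwards.

```python
import re

# Pattern inferred from the "TPU" lines quoted in the thread.
FINAL = re.compile(
    r"\[(?P<req>Req#[0-9a-f]+)/Transcode\] TPU: hardware transcoding: "
    r"final decoder: (?P<dec>\w+), final encoder: (?P<enc>\w+)"
)

def hardware_transcodes(log_text: str) -> list:
    """Return (request tag, decoder, encoder) for every 'final decoder' line."""
    return [(m["req"], m["dec"], m["enc"]) for m in FINAL.finditer(log_text)]

# Two lines copied from the grep output in the thread.
SAMPLE = """\
Dec 20, 2022 12:38:52.866 [0x7fca88dc4b38] DEBUG - [Req#3cc/Transcode] TPU: hardware transcoding: final decoder: nvdec, final encoder: nvenc
Dec 20, 2022 12:40:36.712 [0x7fca8798db38] DEBUG - [Req#ebd/Transcode] TPU: hardware transcoding: final decoder: nvdec, final encoder: nvenc
"""

for req, dec, enc in hardware_transcodes(SAMPLE):
    print(req, dec, enc)
```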

May I have a fresh set of logs ZIP by PMS please?

At the moment I have it working and I am downloading/syncing some movies. Once I set the streams back to 1 or 2 I can reproduce it, and I will send you the zipped-up log. The current workaround is to set the limit to unlimited.

See the initial post and the nvidia-smi output showing that it’s hardly doing anything. I have some other cameras being decoded on that card, which take up about 500 MB of VRAM.

The OS underneath is Ubuntu 22.04, and yes, I did do the initramfs update and reboot after the downgrade to 515.

Also, why does it go into sloth mode while doing a sync/download? Should it not process everything as fast as it can write to disk? I am using /dev/shm as the transcode location to get rid of any I/O-related slowness as well. I have 12G free to start, and an average movie (720p) is about 2.3G, so there are no issues with running out of room there.
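The headroom claim is simple arithmetic and can be checked directly. The sketch below assumes, for illustration only, that each synced output sits in the transcode directory in full until it is delivered; the helper names are hypothetical, and `shutil.disk_usage` is the standard-library call for reading free space.

```python
import shutil

def free_gib(path="/dev/shm"):
    """Free space at path in GiB, as reported by the filesystem."""
    return shutil.disk_usage(path).free / 2**30

def outputs_that_fit(free, per_output):
    """How many whole outputs of size per_output fit into free (same units)."""
    return int(free // per_output)

# Figures from the post: 12G free, about 2.3G per 720p movie.
print(outputs_that_fit(12, 2.3))
# 5
```

On the machine in question, `outputs_that_fit(free_gib(), 2.3)` would give the live figure for the transcode location.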

Sync/download is a “background task”. They’ve decided that Sync/Download is less important than active streaming.

Whether to run full speed Sync/Download or not has been argued on both sides of the coin.

Yes, when a stream is going on I completely agree, but when the transcoder is not doing anything but syncing streams it seems odd. That is all.

Odd ? YES

But can they predict the system will stay idle ? NO.

and that’s why they run it in the background

Once memory has been allocated in the GPU for a transcode, you can’t let go without killing the job.

It is what it is; like you said, we can argue both sides.

But the crux of the issue is this:

  1. Set my simultaneous streams to 1.
  2. Restart PMS.
  3. Sync two movies on my iOS device. The first movie starts to sync using nvenc and finishes as fast as it can, I guess.
  4. Sync finishes on the first movie and starts on the second movie, but now in sloth mode, and it takes forever. nvidia-smi shows no load on the GPU.
  5. Any other movies I queue up after that also do not load the GPU and proceed in sloth mode.

The workaround is to set the limit to unlimited. Then two movies start together and are processed, and all further movies are synced using the GPU, as evidenced by nvidia-smi showing work being done.

I will attach full logs shortly.

Thank you for stating it this way. I was NOT understanding that. Your log file showed me 4 transcodes all using NVDEC and NVENC.

Your use of the term “sloth mode” caused the confusion.

What you describe is, to me, a problem.

Do you have a log zip / can you recreate this easily so I can submit to Engineering?

Yes. I can re-create this at will. I will pick a smaller TV episode but regardless of the source type the behavior is the same.

All future transcodes after the first start with nvenc/nvdec like you say, but they all print out sloth mode, almost like the first one isn’t marked as finished.

Printing ‘sloth mode’ means the buffer is full.

IF the GPU still has active sessions (nvidia-smi shows Plex Transcoder) then the GPU is being used.

If not, I need to see “sloth mode” in your logs because it’s clearly an overloaded term and confusing the ********** out of me