PMS hardware encoding works at first, then breaks after a while

Server Version#: 1.26.0.5715
Player Version#: Plex Web 4.76.1
Docker Version: 20.10.14, build a224086
OS Version: Ubuntu 22.04
Nvidia Driver: 510.60.02

Docker Compose File

services:
  plex:
    container_name: plex
    image: plexinc/pms-docker:latest
    restart: unless-stopped
    environment:
      - TZ=America/Los_Angeles
      - PLEX_CLAIM=<removed>
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    runtime: nvidia
    ports:
      - 32400:32400
    volumes:
      - /home/<user>/plex:/config
      - /home/<user>/plex/transcode/:/transcode
      - /mnt/pool/:/data

Greetings-

I recently installed an RTX 3060 in my Plex server to use GPU hardware-accelerated encoding. My issue is that it works when I spin up my Docker container, then after some time (most recently within 12 hours) it stops working. When I first start the container I can exec into it, run nvidia-smi, and get the following output (the same output as on the host):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:43:00.0  On |                  N/A |
|  0%   36C    P8    18W / 170W |      3MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

But once it breaks, running nvidia-smi in the container returns the following instead (the host still shows the output above):

Failed to initialize NVML: Unknown Error

Poking around my Plex logs, I see this error repeating:

May 05, 2022 12:41:03.515 [0x7f77bd4f0b38] ERROR - [Transcode] [FFMPEG] - cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed
May 05, 2022 12:41:03.515 [0x7f77bd4f0b38] ERROR - [Transcode] [FFMPEG] -  -> CUDA_ERROR_NOT_PERMITTED: operation not permitted

What I find most strange is that HW encoding works for a little while and then stops. I would expect it either to fail outright or to keep working. I am not sure what changes over time. Maybe it has to do with DVR or the daily butler tasks (throwing stuff against the wall here to see what sticks)?
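One way to pin down when the container loses the GPU would be a Compose healthcheck that runs nvidia-smi inside the container on a schedule. This is only a sketch: it assumes nvidia-smi exits non-zero when NVML fails to initialize, the interval and retry values are arbitrary, and defining a healthcheck here replaces any healthcheck the image already ships with.

```yaml
services:
  plex:
    # ...existing settings from the compose file above stay unchanged...
    healthcheck:
      # Assumption: nvidia-smi returns a non-zero exit code once
      # "Failed to initialize NVML" starts appearing.
      test: ["CMD", "nvidia-smi"]
      interval: 5m      # arbitrary; how often to probe the GPU
      timeout: 30s      # arbitrary; give up on a hung probe
      retries: 2        # mark unhealthy after two consecutive failures
      start_period: 1m  # grace period while the container starts
```

With this in place, `docker inspect --format '{{json .State.Health}}' plex` should show the health status along with recent probe results and timestamps, which can then be lined up against host activity. Note that `restart: unless-stopped` does not restart a container just because it turns unhealthy, so this only reports the failure.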

I am willing to post more logs if that would help.

My current plan is to try swapping over to the linuxserver.io Plex image and see if that works. I have also found other posts on Reddit and here about getting HW encoding working, with various changes to try: edits to some NVIDIA files on the host system, docker compose changes, etc. But those fixes are usually for when HW transcoding doesn’t work at all. In fact, I haven’t found anyone else with this exact issue (works at first, then stops working after some time), hence this post.

If anyone has any ideas I would very much appreciate any input. Thanks!

New docker-compose file is as follows:

version: '3.8'

services:
  plex:
    container_name: plex
    image: lscr.io/linuxserver/plex:latest
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
      - PLEX_CLAIM=<removed>
      - VERSION=docker
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    runtime: nvidia
    ports:
      - 32400:32400
    volumes:
      - /home/<user>/plex:/config
      - /mnt/pool/:/data

HW encoding is working again, but I am not sure if that is just because I spun up a new container (which temporarily fixed it before). I will let it run for the next day or so and see what happens.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:43:00.0  On |                  N/A |
|  0%   47C    P2    43W / 170W |    228MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2511070      C   ...diaserver/Plex Transcoder      224MiB |
+-----------------------------------------------------------------------------+

Just reporting in that with my new docker-compose file I still have the issue described in the original post. Still trying to nail down what is causing this problem.

I added privileged: true to my docker-compose file and hardware transcoding has been working correctly for over 3 days straight now. Will continue to monitor it.
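For anyone landing here later, a minimal sketch of where that line goes, assuming it sits at the service level of the linuxserver.io compose file posted earlier in this thread (the exact placement is an assumption; everything else is copied from that file):

```yaml
version: '3.8'

services:
  plex:
    container_name: plex
    image: lscr.io/linuxserver/plex:latest
    restart: unless-stopped
    runtime: nvidia
    privileged: true   # the one added line
    # environment, ports and volumes unchanged from the earlier file
```

Keep in mind that privileged: true gives the container access to all host devices, which is much broader than what the GPU alone needs.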
