Hardware transcode fails to start after the container has been running for a while

Server Version#: 1.41.8.9834
Player Version#: 1.109.0.329-ea562b95

I am running Plex in Docker on the 6.11 kernel with the NVIDIA 570 drivers. When I start the container, hardware transcoding works fine. But after it has been running for some time, starting a transcode on any device produces the CUDA errors shown in the logs below. I'm not sure whether it has to do with my Docker setup, so here is the compose file I am using. Any help resolving this would be greatly appreciated.

services:
  plex:
    cpus: "43.9"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    container_name: plex
    image: plexinc/pms-docker:latest
    restart: always
    runtime: nvidia
    gpus: all
    ports:
      - 32400:32400/tcp
      - 3005:3005/tcp
      - 8324:8324/tcp
      - 32469:32469/tcp
#      - 1900:1900/udp
      - 32410:32410/udp
      - 32412:32412/udp
      - 32413:32413/udp
      - 32414:32414/udp
    environment:
      - TZ=America/New_York
      - PLEX_CLAIM=<REDACTED>
      - ADVERTISE_IP=<REDACTED>
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    hostname: <REDACTED>
    volumes:
      - path/to/Plex/Library:/config
      - path/to/Plex/Transcode:/transcode
      - path/to/Plex/tmp:/tmp
      - path/to/Data:/data
      - /etc/localtime:/etc/localtime:ro
    network_mode: bridge

PMS console Logs

Jun 12, 2025 00:09:43.763 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] - cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed
Jun 12, 2025 00:09:43.763 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] -  -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
Jun 12, 2025 00:09:43.763 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] - 

Jun 12, 2025 00:09:43.770 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] - cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed
Jun 12, 2025 00:09:43.770 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] -  -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
Jun 12, 2025 00:09:43.770 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] - 

Jun 12, 2025 00:09:43.784 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] - cu->cuCtxCreate(&hwctx->cuda_ctx, desired_flags, hwctx->internal->cuda_device) failed
Jun 12, 2025 00:09:43.784 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] -  -> CUDA_ERROR_NOT_PERMITTED: operation not permitted
Jun 12, 2025 00:09:43.784 [137358148021048] Error — [Req#a9bc7/Transcode] [FFMPEG] 
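
Since the failure only shows up after the container has been up for a while, it can help to find the first time the error appears and compare that with what happened on the host around then. The following is a rough sketch, assuming the log lines keep the format shown above; the function name `first_failure` is made up for illustration, and the timestamp parsing assumes English month abbreviations.

```python
import sys
from datetime import datetime

# Error string taken from the log excerpt above.
MARKER = "CUDA_ERROR_NOT_PERMITTED"


def first_failure(lines):
    """Return the timestamp of the first line containing MARKER, or None."""
    for line in lines:
        if MARKER in line:
            # The timestamp is everything before the first " [" (the thread id).
            stamp = line.split(" [", 1)[0]
            return datetime.strptime(stamp, "%b %d, %Y %H:%M:%S.%f")
    return None


if __name__ == "__main__":
    # Pipe a PMS log into the script on standard input.
    when = first_failure(sys.stdin)
    print(when if when else "no CUDA_ERROR_NOT_PERMITTED lines found")
```

The script reads the log from standard input, so it does not assume any particular log file location.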

The solution that worked for me was to add the NVIDIA devices to the container manually. This is a known NVIDIA issue where certain containers lose permission to access the GPU devices, due to cgroups or similar.

services:
  plex:
    #cpus: "43.9"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    container_name: plex
    image: plexinc/pms-docker:latest
    restart: always
    runtime: nvidia
    gpus: all
    ports:
      - 32400:32400/tcp
      - 3005:3005/tcp
      - 8324:8324/tcp
      - 32469:32469/tcp
#      - 1900:1900/udp
      - 32410:32410/udp
      - 32412:32412/udp
      - 32413:32413/udp
      - 32414:32414/udp
    environment:
      - TZ=America/New_York
      - PLEX_TUNER_DISABLE=true
      - PLEX_CLAIM=<REDACTED>
      - ADVERTISE_IP=<REDACTED>
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    hostname: <REDACTED>
    volumes:
      - path/to/Plex/Library:/config
      - path/to/Plex/Transcode:/transcode
      - path/to/Plex/tmp:/tmp
      - path/to/Data:/data
      - /etc/localtime:/etc/localtime:ro
    network_mode: bridge
    devices:
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools
      - /dev/nvidia-modeset
      - /dev/nvidiactl
      - /dev/nvidia0
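
The device list above is specific to my host (one GPU, so only /dev/nvidia0). As a rough sketch, assuming the same device node names, a small script like the one below can check which of those nodes actually exist on the host and print a matching `devices:` section. The function name `compose_devices` and the `exists` parameter are made up for illustration; they are not part of Docker or the NVIDIA tooling.

```python
from pathlib import Path

# Device nodes from the compose file above. /dev/nvidia0 assumes a single GPU;
# hosts with more GPUs would also have /dev/nvidia1 and so on.
NVIDIA_NODES = [
    "/dev/nvidia-uvm",
    "/dev/nvidia-uvm-tools",
    "/dev/nvidia-modeset",
    "/dev/nvidiactl",
    "/dev/nvidia0",
]


def compose_devices(nodes, exists=lambda p: Path(p).exists()):
    """Return (yaml_lines, missing) for the given device node paths.

    yaml_lines is a `devices:` section indented for a compose service;
    missing lists the nodes that were not found on this host.
    """
    present = [n for n in nodes if exists(n)]
    missing = [n for n in nodes if not exists(n)]
    lines = ["    devices:"] + [f"      - {n}" for n in present]
    return lines, missing


if __name__ == "__main__":
    lines, missing = compose_devices(NVIDIA_NODES)
    print("\n".join(lines))
    for node in missing:
        print(f"# not found on host: {node}")
```

Run it on the Docker host, not inside the container. Any node reported as missing should be left out of the compose file.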
