Stuck in P-State P0 after transcode finished on NVIDIA

Hi,
I noticed that after running transcodes on Plex with hardware acceleration (GPU) enabled, the P-state of my NVIDIA GTX 1070 gets stuck in P0.
Because of this it draws a lot of power (more than while transcoding in P2…), and the fan spins up unnecessarily as well…

If I stop the Plex Media Server, the P-state immediately goes back to P8, as it should when idle.
Can you please change the transcoder so that the GPU returns to P8 once transcoding has finished?

plex-nvidia-pstate-p0.txt (3.7 KB)

Server Version#: 1.15.1.791
Player Version#: 3.91.0

Processes using the GPU while transcoding:

$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root      12516 F.... nvidia-persiste
                     plex      16779 F.... Plex Media Serv
                     plex      18191 F...m Plex Transcoder
/dev/nvidiactl:      root      12516 F.... nvidia-persiste
                     plex      16779 F.... Plex Media Serv
                     plex      18191 F...m Plex Transcoder
/dev/nvidia-modeset: root      12516 F.... nvidia-persiste
/dev/nvidia-uvm:     plex      16779 F.... Plex Media Serv
                     plex      18191 F.... Plex Transcoder

Processes after transcoding has finished:

$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root      12516 F.... nvidia-persiste
                     plex      16779 F.... Plex Media Serv
/dev/nvidiactl:      root      12516 F.... nvidia-persiste
                     plex      16779 F.... Plex Media Serv
/dev/nvidia-modeset: root      12516 F.... nvidia-persiste
/dev/nvidia-uvm:     plex      16779 F.... Plex Media Serv

Somehow Plex Media Server doesn’t let the GPU go…

Would you please be able to add some additional information here?

Which graphics card is in use? GTX 1070?
Distro and version, please?

If you have any logs of it crashing / hanging the system (Settings - Server - Troubleshooting - Download Logs), captured right after this happens, please attach the ZIP.

It will greatly help the team see what’s happening. It does look like a bug to them but they need to see where it is.

I’m running on Ubuntu 16.04 LTS with mediatree kernel for my Hauppauge WinTV-DualHD (plex logs at the end of the post)

$ lsb_release -a && uname -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:        16.04
Codename:       xenial
Linux 4.4.0-142201902141402-generic #0+mediatree+hauppauge-Ubuntu SMP Thu Feb 14 21:23:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Here is the output of nvidia-smi; it should contain all the information about the GPU:

Thu Mar 14 10:39:47 2019 CET
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 00000000:01:00.0 Off |                  N/A |
|  0%   47C    P8    14W / 151W |      1MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

/*** Transcode begin ***/

Thu Mar 14 10:40:20 2019 CET
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 00000000:01:00.0 Off |                  N/A |
|  0%   48C    P2    37W / 151W |    113MiB /  8118MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8099      C   /usr/lib/plexmediaserver/Plex Transcoder     101MiB |
+-----------------------------------------------------------------------------+

/*** Transcode end ***/

Thu Mar 14 10:41:45 2019 CET
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 00000000:01:00.0 Off |                  N/A |
|  0%   53C    P0    38W / 151W |     11MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
 

The relevant log files:
Transcode-Bug-Pstate-Stuck-P0.zip (8.6 KB)
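
To log the P-state over time instead of pasting full nvidia-smi tables, a small Python sketch along these lines may help. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; `parse_gpu_line`, `looks_stuck` and `read_gpus` are hypothetical helper names, and the "stuck" check is only a rough heuristic (P0 with 0% utilisation), since a single sample can be misleading.

```python
import subprocess

QUERY = "pstate,power.draw,utilization.gpu"


def parse_gpu_line(line):
    """Parse one CSV line such as 'P0, 38.21, 0' into (pstate, watts, util_percent)."""
    pstate, power, util = [field.strip() for field in line.split(",")]

    def number(value):
        # Some cards report power as "[N/A]" or "[Not Supported]"; map that to None.
        try:
            return float(value)
        except ValueError:
            return None

    return pstate, number(power), number(util)


def looks_stuck(pstate, util):
    """Heuristic only: the GPU reports P0 while doing no work."""
    return pstate == "P0" and util == 0


def read_gpus():
    """Query every GPU once; returns a list of (pstate, watts, util_percent)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling `read_gpus()` every few seconds before, during and after a transcode should reproduce the P8, P2, P0 sequence shown in the tables above.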

Bump

Can’t help further. Every way I go at this leads me to a kernel / driver interaction.

From a systems perspective, no app should be able to hang the driver by using its documented API (the NVIDIA GPU drivers), which further supports the view that this is a GPU / kernel issue.

I suggest asking on their forums.

Bug is still here …

Why is the main Plex Media Server PID attached to the GPU, and not only the Transcoder child PID?

Bump @ChuckPa

How nVidia works:

  1. Transcoder queries the nVidia driver API (if installed)
  2. On finding the API, the driver queries for devices
  3. Success/failure returned to PMS
  4. Upon success, PMS now sends transcode requests to the nVidia API.
  5. When the session is over, transcoder tells the nVidia API to disconnect / end session.
  6. Transcoder exits.

If the API (driver) doesn’t disconnect and return to the transcoder, it will hang.
If this were a transcoder problem, everyone would see it, wouldn’t they?
Therefore, it has to be a local issue: a driver / card / OS incompatibility or bug.

But the transcoder is killed.

It is the parent process (which contains all the Plex, DLNA, etc. processes) that remains in the memory of the GPU.

The transcoding PID is indeed killed.

Well I don’t think it’s related only to my specific setup.
I have upgraded Ubuntu from 16.04 to 18.04 since then.
I tried the regular kernel and the HWE kernel as well.
I also installed 3 different drivers from NVIDIA.
The issue is always the same.

PMS itself (DLNA, etc.) does not load itself onto the GPU. This isn’t CUDA or OpenCL.
The hardware transcoding codec is removed by the driver when processing terminates.

Very strange… other ffmpeg-based transcoders are fine (Emby, Jellyfin, the ffmpeg command line, etc.).

The big difference is that starting a Plex transcode creates 2 PIDs, Plex Media Serv and Plex Transcoder, and at the end only Plex Transcoder is killed.

The question is: why Plex Media Serv?

EDIT: Some people have the same bug on Reddit

I sent a post to NVIDIA, but in my opinion… it’s a Plex issue

I can confirm that Plex does this on my box, while ffmpeg and jellyfin (and emby) do not.

So the bug is not in the NVIDIA drivers…

There is something Plex does in its transcoding management that differs from the others and causes this bug, and it’s a surprising issue in fact…

I found some identical bug reports for Windows on the forums, Reddit, and the web

This test proves the problem can be worked around on the Plex Server side by freeing some GPU resources/handles when no GPU processing is active. So a workaround is definitely within the Plex team’s grasp. Even if the root cause of this power mismanagement is inside the NVIDIA driver, the Plex team can make an effort and escalate this bug to NVIDIA support, because you can map out the broad picture and describe how to reproduce the problem in terms of NVIDIA API invocations.

@ChuckPa :slight_smile:

Hmm, am seeing similar.

plex with transcoder session running

[1]+  Stopped                 watch -d nvidia-smi
root@proximo:~# fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        plex      19286 F...m Plex Media Serv
                     plex      21292 F...m Plex Transcoder
/dev/nvidiactl:      plex      19286 F.... Plex Media Serv
                     plex      21292 F...m Plex Transcoder
/dev/nvidia-uvm:     plex      19286 F.... Plex Media Serv
                     plex      21292 F...m Plex Transcoder

plex running, idle no transcoder use

[1]+  Stopped                 watch -d nvidia-smi
root@proximo:~# fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        plex      19286 F...m Plex Media Serv
/dev/nvidiactl:      plex      19286 F.... Plex Media Serv
/dev/nvidia-uvm:     plex      19286 F.... Plex Media Serv

plex shut down

root@proximo:~# systemctl stop plexmediaserver
root@proximo:~# fuser -v /dev/nvidia*
root@proximo:~#
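
If fuser is not installed, roughly the same check can be done by walking `/proc` directly. This is only a sketch, assuming a Linux `/proc` layout; `nvidia_fd_holders` is a name invented here. It sees open file descriptors only, so the memory mappings that fuser marks with `m` are not covered, and it needs root to inspect other users' processes.

```python
import os


def nvidia_fd_holders(proc_root="/proc", prefix="/dev/nvidia"):
    """Return {pid: sorted device paths} for processes with open fds on NVIDIA device nodes."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process already exited, or permission denied
        devices = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir and readlink
            if target.startswith(prefix):
                devices.add(target)
        if devices:
            result[int(entry)] = sorted(devices)
    return result
```

With Plex idle, the expectation from the listings above is that the Plex Media Serv PID still shows up here, and that the result contains no Plex PID once the service is stopped.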

According to nvidia-smi, my card does not idle down even with Plex not running.

Thu Jun 27 19:05:33 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:03:00.0 Off |                  N/A |
| 50%   44C    P0    N/A /  N/A |      0MiB /  3909MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

disabled plex startup service

systemctl disable plexmediaserver

rebooted server

root@proximo:~# fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        nvidia-persistenced   2239 F.... nvidia-persiste
/dev/nvidiactl:      nvidia-persistenced   2239 F.... nvidia-persiste
/dev/nvidia-modeset: nvidia-persistenced   2239 F.... nvidia-persiste
root@proximo:~# uptime
 19:14:11 up 4 min,  2 users,  load average: 0.91, 0.95, 0.44

nvidia-smi showing P8, 7 W

Thu Jun 27 19:14:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    On   | 00000000:03:00.0  On |                  N/A |
| 50%   41C    P8     7W /  75W |      0MiB /  3909MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

before & after plex start

after a plex transcoder session has started and ended, card @ P0

something is definitely wack somewhere.

also, to be clear this is without nvdec patch (nvenc only)

Bump @ChuckPa :slight_smile:

Bump

I’m not the transcoder guy.
We all need to wait until they reply.

Why only this machine? If it were a major issue, everyone who uses it would be speaking up.

It’s probably everyone’s (with a similar setup)

It’s not a major issue, but it obviously uses more power unnecessarily.

It is also not an obvious problem. How many people constantly monitor nvidia-smi, or are even aware of power levels? (I wasn’t until coming across these threads.)

Sorry Chuck, I thought you were a Plex employee :wink:

It is not only this machine, it’s every Linux Plex server with NVIDIA hardware :smile: (see the tests in the Reddit topic, plus almost 5 people in my circle of friends)

I agree, it’s not obvious and not a HIGH priority, but Plex as a company can’t say there is nothing wrong.

Power consumption x3, plus fans at 45/50% and 40°C for nothing ==> hardware wear… that’s not “nothing”

Few people report it because not many people look into these kinds of things

A lot of premium members bought the pass for transcoding, which has an issue. It is not blocking, but it does exist
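
To put a rough number on the wasted power, here is a small back-of-the-envelope sketch in Python. It uses the 14 W (P8, idle) and 38 W (P0, stuck) readings from the GTX 1070 nvidia-smi output earlier in the thread, and assumes the card stays stuck in P0 around the clock, so it is an upper bound rather than a measurement. `extra_energy_kwh` is just an illustrative helper.

```python
def extra_energy_kwh(idle_watts, stuck_watts, hours):
    """Extra energy drawn by staying in the higher power state, in kWh."""
    return (stuck_watts - idle_watts) * hours / 1000.0


HOURS_PER_YEAR = 24 * 365

# Readings taken from the GTX 1070 tables in this thread: 14 W idle in P8, 38 W stuck in P0.
per_year = extra_energy_kwh(14, 38, HOURS_PER_YEAR)
ratio = 38 / 14

print(f"{per_year:.1f} kWh per year")   # 210.2 kWh per year
print(f"{ratio:.1f}x the idle draw")    # 2.7x the idle draw
```

Other cards will give different figures; the GTX 1650 above idles at 7 W, and its P0 power draw is not reported.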