Kernel crash when using NVIDIA hardware transcoding (please help!)

Server Version#: 1.18.3.2111
Player Version#: any
Hi there,

I’m running Plex Media Server on my HP MicroServer Gen 8 with 16GB of RAM.
It’s running Fedora 31 (x86_64), and I’ve installed a GeForce GTX 1050 Ti for hardware transcoding.
My NVIDIA driver version: 440.36
I’m also using nvidia-patch to allow more than two concurrent NVENC sessions.
In addition, I’m blacklisting nouveau and enabling nvidia-drm modesetting on the kernel command line:
rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1
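To confirm after a reboot that these arguments actually reached the kernel and that nouveau stayed unloaded, here is a minimal sketch. `check_nvidia_cmdline` is a hypothetical helper. By default it reads `/proc/cmdline` and `/proc/modules`, and the two optional file arguments exist only so it can be tried against sample text.

```shell
#!/bin/sh
# Hypothetical helper: check that the nouveau blacklist and nvidia-drm
# modeset arguments are on the running kernel's command line, and that
# the nouveau module is not loaded.
# $1: file holding the kernel command line (default: /proc/cmdline)
# $2: file listing loaded modules        (default: /proc/modules)
check_nvidia_cmdline() {
    cmdline=$(cat "${1:-/proc/cmdline}")
    modules_file=${2:-/proc/modules}
    status=0
    for want in rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1; do
        # Pad with spaces so only whole arguments match, not substrings
        case " $cmdline " in
            *" $want "*) ;;
            *) echo "missing kernel argument: $want"; status=1 ;;
        esac
    done
    # /proc/modules has one loaded module per line, module name first
    if grep -q '^nouveau ' "$modules_file"; then
        echo "nouveau module is loaded"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "ok"
    return "$status"
}
```

Run `check_nvidia_cmdline` with no arguments on the server. It prints `ok` when everything is in place. On Fedora, arguments like these are usually added persistently with `grubby --update-kernel=ALL --args="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1"`.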

I’ve renamed “/usr/lib/plexmediaserver/Plex Transcoder” to “/usr/lib/plexmediaserver/Plex Transcoder2”.

Then I created a new executable file at “/usr/lib/plexmediaserver/Plex Transcoder” containing:

#!/bin/sh
exec /usr/lib/plexmediaserver/Plex\ Transcoder2 -hwaccel nvdec "$@"

Now it seems to be working according to the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.36       Driver Version: 440.36       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:07:00.0 Off |                  N/A |
|  0%   47C    P0    N/A /  75W |    286MiB /  4039MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    193760      C   /usr/lib/plexmediaserver/Plex Transcoder2    273MiB |
+-----------------------------------------------------------------------------+
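For scripting or repeated checks, nvidia-smi also has a CSV query mode, which avoids parsing the table above. Here is a minimal sketch. The query fields `pid`, `process_name` and `used_memory` are listed by `nvidia-smi --help-query-compute-apps`. The helper name `plex_gpu_processes` is hypothetical, and it only filters that CSV for the transcoder.

```shell
#!/bin/sh
# Hypothetical helper: filter the output of
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
# (one "pid, process_name, used_memory" line per GPU process) down to the
# Plex transcoder processes, printing "pid: memory" for each.
plex_gpu_processes() {
    # A multi-character field separator is treated as a regex by awk
    awk -F', ' '$2 ~ /Plex Transcoder/ { print $1 ": " $3 }'
}
```

Usage: `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader | plex_gpu_processes`. For the snapshot above, this should print something like `193760: 273 MiB`.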

But every now and then it triggers a kernel crash:

[40809.424195] Plex Media Serv: page allocation failure: order:0, mode:0x10dc0(GFP_KERNEL|__GFP_NORETRY|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[40809.424200] CPU: 6 PID: 6506 Comm: Plex Media Serv Tainted: P           OE     5.3.13-300.fc31.x86_64 #1
[40809.424201] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
[40809.424202] Call Trace:
[40809.424210]  dump_stack+0x66/0x90
[40809.424212]  warn_alloc.cold+0x7b/0xfb
[40809.424216]  __alloc_pages_slowpath+0xdc4/0xe00
[40809.424219]  __alloc_pages_nodemask+0x2ee/0x340
[40809.424238]  uvm_mem_alloc+0x245/0x3b0 [nvidia_uvm]
[40809.424251]  uvm_va_range_create_semaphore_pool+0x176/0x290 [nvidia_uvm]
[40809.424262]  uvm_api_alloc_semaphore_pool+0xf6/0x1a0 [nvidia_uvm]
[40809.424270]  uvm_ioctl+0xedc/0x1360 [nvidia_uvm]
[40809.424473]  ? _nv008350rm+0x1d/0x30 [nvidia]
[40809.424475]  ? ns_capable_common+0x2e/0x50
[40809.424642]  ? _nv008375rm+0x60/0x80 [nvidia]
[40809.424739]  ? os_is_administrator+0xf/0x20 [nvidia]
[40809.424906]  ? _nv007504rm+0xd0/0x130 [nvidia]
[40809.425025]  ? os_acquire_spinlock+0xe/0x20 [nvidia]
[40809.425263]  ? _nv033270rm+0xc/0x20 [nvidia]
[40809.425423]  ? _nv036742rm+0xac/0x170 [nvidia]
[40809.425426]  ? update_load_avg+0x76/0x600
[40809.425457]  uvm_unlocked_ioctl+0x31/0x60 [nvidia_uvm]
[40809.425471]  uvm_unlocked_ioctl_entry+0x89/0xb0 [nvidia_uvm]
[40809.425475]  do_vfs_ioctl+0x405/0x660
[40809.425478]  ksys_ioctl+0x5e/0x90
[40809.425480]  __x64_sys_ioctl+0x16/0x20
[40809.425484]  do_syscall_64+0x5f/0x1a0
[40809.425488]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[40809.425490] RIP: 0033:0x7ff93ebb234b
[40809.425493] Code: 0f 1e fa 48 8b 05 3d 9b 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 9b 0c 00 f7 d8 64 89 01 48
[40809.425494] RSP: 002b:00007ff8857f4ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[40809.425496] RAX: ffffffffffffffda RBX: 00007ff87c538f90 RCX: 00007ff93ebb234b
[40809.425497] RDX: 00007ff8857f5270 RSI: 0000000000000044 RDI: 0000000000000060
[40809.425497] RBP: 00007ff8857f5270 R08: 0000000000000001 R09: 00007ff8857f5270
[40809.425498] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000044
[40809.425499] R13: 0000000000000060 R14: 0000000205c00000 R15: 0000000000000000
[40809.425528] Mem-Info:
[40809.425535] active_anon:574979 inactive_anon:15075 isolated_anon:0
                active_file:1138313 inactive_file:1910555 isolated_file:0
                unevictable:0 dirty:9976 writeback:0 unstable:0
                slab_reclaimable:61051 slab_unreclaimable:64858
                mapped:243376 shmem:15263 pagetables:4412 bounce:0
                free:49018 free_pcp:1 free_cma:0
[40809.425539] Node 0 active_anon:2299916kB inactive_anon:60300kB active_file:4553252kB inactive_file:7642220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:973504kB dirty:39904kB writeback:0kB shmem:61052kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 120832kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[40809.425540] Node 0 DMA free:15884kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[40809.425544] lowmem_reserve[]: 0 3313 15915 15915 15915
[40809.425546] Node 0 DMA32 free:64132kB min:14056kB low:17568kB high:21080kB active_anon:247824kB inactive_anon:0kB active_file:1050624kB inactive_file:1918972kB unevictable:0kB writepending:13064kB present:3487632kB managed:3422096kB mlocked:0kB kernel_stack:544kB pagetables:716kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB
[40809.425549] lowmem_reserve[]: 0 0 12601 12601 12601
[40809.425551] Node 0 Normal free:116056kB min:116948kB low:130312kB high:143676kB active_anon:2052092kB inactive_anon:60300kB active_file:3502692kB inactive_file:5723636kB unevictable:0kB writepending:26840kB present:13238268kB managed:12912116kB mlocked:0kB kernel_stack:10032kB pagetables:16932kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[40809.425555] lowmem_reserve[]: 0 0 0 0 0
[40809.425557] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
[40809.425567] Node 0 DMA32: 69*4kB (UM) 114*8kB (UME) 74*16kB (UME) 61*32kB (ME) 68*64kB (UME) 30*128kB (ME) 17*256kB (UM) 8*512kB (UME) 11*1024kB (UME) 12*2048kB (UM) 2*4096kB (UM) = 64996kB
[40809.425576] Node 0 Normal: 1123*4kB (UMEH) 885*8kB (UMEH) 454*16kB (UMEH) 920*32kB (UME) 884*64kB (UME) 97*128kB (UME) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 117524kB
[40809.425585] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[40809.425585] 3062277 total pagecache pages
[40809.425587] 0 pages in swap cache
[40809.425588] Swap cache stats: add 0, delete 0, find 0/0
[40809.425588] Free swap  = 0kB
[40809.425589] Total swap = 0kB
[40809.425589] 4185467 pages RAM
[40809.425590] 0 pages HighMem/MovableOnly
[40809.425590] 97943 pages reserved
[40809.425591] 0 pages cma reserved
[40809.425591] 0 pages hwpoisoned
[40809.794447] Plex Media Serv: page allocation failure: order:0, mode:0x10dc0(GFP_KERNEL|__GFP_NORETRY|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[40809.794452] CPU: 6 PID: 6506 Comm: Plex Media Serv Tainted: P           OE     5.3.13-300.fc31.x86_64 #1
[40809.794453] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
[40809.794453] Call Trace:
[40809.794461]  dump_stack+0x66/0x90
[40809.794464]  warn_alloc.cold+0x7b/0xfb
[40809.794468]  __alloc_pages_slowpath+0xdc4/0xe00
[40809.794471]  __alloc_pages_nodemask+0x2ee/0x340
[40809.794492]  uvm_mem_alloc+0x245/0x3b0 [nvidia_uvm]
[40809.794504]  uvm_va_range_create_semaphore_pool+0x176/0x290 [nvidia_uvm]
[40809.794515]  uvm_api_alloc_semaphore_pool+0xf6/0x1a0 [nvidia_uvm]
[40809.794524]  uvm_ioctl+0xedc/0x1360 [nvidia_uvm]
[40809.794733]  ? _nv008350rm+0x1d/0x30 [nvidia]
[40809.794737]  ? ns_capable_common+0x2e/0x50
[40809.794907]  ? _nv008375rm+0x60/0x80 [nvidia]
[40809.795005]  ? os_is_administrator+0xf/0x20 [nvidia]
[40809.795173]  ? _nv007504rm+0xd0/0x130 [nvidia]
[40809.795270]  ? os_acquire_spinlock+0xe/0x20 [nvidia]
[40809.795440]  ? _nv033270rm+0xc/0x20 [nvidia]
[40809.795539]  ? _nv036742rm+0xac/0x170 [nvidia]
[40809.795541]  ? update_load_avg+0x76/0x600
[40809.795552]  uvm_unlocked_ioctl+0x31/0x60 [nvidia_uvm]
[40809.795560]  uvm_unlocked_ioctl_entry+0x89/0xb0 [nvidia_uvm]
[40809.795563]  do_vfs_ioctl+0x405/0x660
[40809.795564]  ksys_ioctl+0x5e/0x90
[40809.795565]  __x64_sys_ioctl+0x16/0x20
[40809.795567]  do_syscall_64+0x5f/0x1a0
[40809.795569]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[40809.795571] RIP: 0033:0x7ff93ebb234b
[40809.795574] Code: 0f 1e fa 48 8b 05 3d 9b 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 9b 0c 00 f7 d8 64 89 01 48
[40809.795575] RSP: 002b:00007ff8857f51a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[40809.795576] RAX: ffffffffffffffda RBX: 00007ff87c306670 RCX: 00007ff93ebb234b
[40809.795577] RDX: 00007ff8857f5530 RSI: 0000000000000044 RDI: 0000000000000060
[40809.795578] RBP: 00007ff8857f5530 R08: 0000000000000001 R09: 00007ff8857f5530
[40809.795578] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000044
[40809.795579] R13: 0000000000000060 R14: 0000000205c00000 R15: 0000000000000000

Any ideas? Is it running out of memory, or is my RAM bad/damaged?

Thanks in advance!

After looking more closely at the kernel crash, it does seem to be running out of memory.
The trace mentions the nvidia/nvidia_uvm kernel modules and the Plex Media Server process.

So I guess a combination of these is using up all the memory during hardware transcoding?
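Strictly speaking, these reports come from warn_alloc() (it appears in the call trace). A single-page (order:0) kernel allocation made through nvidia_uvm failed. The __GFP_NORETRY flag tells the allocator to try only light reclaim and then give up, rather than retrying or invoking the OOM killer. The zone lines show why it gave up: in the Normal zone, free:116056kB was already below min:116948kB. Nothing in the report itself points at faulty RAM.

The sketch below pulls free memory and the min watermark out of those zone lines. `zone_watermarks` is a hypothetical helper written for the format shown in this log, which may differ on other kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: for each zone line of a page allocation failure report,
# e.g. "Node 0 Normal free:116056kB min:116948kB low:... high:...",
# print the zone's free memory next to its min watermark.
# Note: it ignores lowmem_reserve, which the kernel also adds to the
# watermark when an allocation falls back to a lower zone.
zone_watermarks() {
    awk '{
        zone = ""; free = ""; min = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "Node" && i + 2 <= NF) zone = $(i + 2)
            if ($i ~ /^free:[0-9]+kB$/) { free = substr($i, 6); sub(/kB$/, "", free) }
            if ($i ~ /^min:[0-9]+kB$/)  { min = substr($i, 5); sub(/kB$/, "", min) }
        }
        if (zone != "" && free != "" && min != "") {
            state = (free + 0 < min + 0) ? "free < min" : "free >= min"
            printf "%s: free %d kB, min %d kB (%s)\n", zone, free, min, state
        }
    }'
}
```

Usage: `dmesg | zone_watermarks` (dmesg may need root). For the first report above, one of the lines it prints should be `Normal: free 116056 kB, min 116948 kB (free < min)`.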

Isn’t 16GB of RAM enough, then?
I expected that with both decoding and encoding on the NVIDIA GPU it would essentially be zero-copy, meaning the uncompressed video shouldn’t leave the GPU’s memory. Right?
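The Mem-Info counters in the first report are page counts, and a page is 4 KiB on x86_64. A little shell arithmetic therefore shows where the 16GB was going at the moment of the failure. Here is a minimal sketch; `pages_to_mib` is a hypothetical helper.

```shell
#!/bin/sh
# Mem-Info counters are page counts; on x86_64 a page is 4 KiB.
PAGE_KIB=4

# Hypothetical helper: sum page counts and print the total in whole MiB
# (integer arithmetic, so the result is rounded down).
pages_to_mib() {
    total=0
    for pages in "$@"; do
        total=$((total + pages))
    done
    echo $((total * PAGE_KIB / 1024))
}

pages_to_mib 4185467           # "pages RAM"                 -> 16349
pages_to_mib 1138313 1910555   # active_file + inactive_file -> 11909
pages_to_mib 574979 15075      # active_anon + inactive_anon -> 2304
pages_to_mib 49018             # free                        -> 191
```

By these numbers, about 11.6 GiB of the roughly 16 GB was file-backed page cache and about 2.25 GiB was anonymous memory. The report also shows no swap (`Total swap = 0kB`). Page cache is normally reclaimable, but a __GFP_NORETRY request only gets a light reclaim attempt before it fails.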

Found it…
Using htop I figured out the Plex Transcoder uses up 14GB of RAM when hardware transcoding…

193760 plex 20 0 **14.0G** 463M 286M R 8.5 2.9 6:06.35 /usr/lib/plexmediaserver/Plex Transcoder2 -hwaccel nvdec -codec:0 h264 -hwaccel:0 nvdec -hwaccel_fallback_threshold:0 10 -ss 178 -analyzeduration 20000000 -probesize 20000000 -i /media/data/media/movies/Gemini.Man.2019.1080p.WEB-DL.DD5.1.H264-CMRG/Gemini.Man.2019.1080p.WEB-DL.DD5.1.H264-CMRG.mkv -ss 178 -analyzeduration 20000000 -probesize 20000000 -i /media/data/plexmediaserver/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-q74fhc35chfs1fsniv7hmpmm-8da68be3-0d2e-4741-b7bd-a3dfc680d309/temp-0.srt -map_inlineass 1:s:0 -filter_complex [0:0]scale=w=1920:h=1080[0];[0]format=pix_fmts=yuv420p|nv12[1];[1]inlineass=font_scale=1.000000:font_path=/usr/lib/plexmediaserver/Resources/Fonts/DejaVuSans-Regular.ttf:fontconfig_file=/usr/lib/plexmediaserver/Resources/fonts.conf:language=nl[2] -map [2] -codec:0 h264_nvenc -b:0 7294k -maxrate:0 9726k -bufsize:0 19452k -forced-idr:0 1 -r:0 23.975999999999999 -force_key_frames:0 expr:gte(t,178+n_forced*1) -map 0:1 -metadata:s:1 language=eng -codec:1 copy -copypriorss:1 0 -segment_format mpegts -f ssegment -individual_header_trailer 0 -segment_time 1 -segment_start_number 178 -segment_copyts 1 -segment_time_delta 0.0625 -segment_list http://127.0.0.1:32400/video/:/transcode/session/q74fhc35chfs1fsniv7hmpmm/8da68be3-0d2e-4741-b7bd-a3dfc680d309/seglist -segment_list_type csv -segment_list_size 5 -segment_list_separate_stream_times 1 -segment_list_unfinished 1 -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 media-%05d.ts -map 1:s:0 -f null -codec ass nullfile -start_at_zero -copyts -y -init_hw_device cuda=cuda: -hwaccel_device cuda -filter_hw_device cuda -nostats -loglevel quiet -loglevel_plex error -progressurl http://127.0.0.1:32400/video/:/transcode/session/q74fhc35chfs1fsniv7hmpmm/8da68be3-0d2e-4741-b7bd-a3dfc680d309/progress

I can also see that it now adds -hwaccel nvdec by itself.
So I guess I can remove my hack.
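Undoing the hack amounts to moving the renamed binary back over the wrapper. Here is a minimal sketch, assuming the file names used above. `restore_transcoder` is a hypothetical helper. It refuses to act unless `Plex Transcoder2` exists and the current `Plex Transcoder` starts with `#!` (that is, it is the wrapper script). Run it as root while Plex is stopped.

```shell
#!/bin/sh
# Hypothetical helper: undo the wrapper by moving "Plex Transcoder2" back
# to "Plex Transcoder".
# $1: Plex install directory (default: /usr/lib/plexmediaserver)
restore_transcoder() {
    dir=${1:-/usr/lib/plexmediaserver}
    wrapper="$dir/Plex Transcoder"
    real="$dir/Plex Transcoder2"

    if [ ! -f "$real" ]; then
        echo "nothing to restore: $real not found" >&2
        return 1
    fi
    # Only overwrite the wrapper if it really is a script (first bytes "#!")
    if [ "$(head -c 2 "$wrapper")" != "#!" ]; then
        echo "refusing: $wrapper does not look like the wrapper script" >&2
        return 1
    fi
    # The real binary keeps its own permissions when renamed
    mv -f -- "$real" "$wrapper"
}
```

For example, assuming the service is named `plexmediaserver`: `systemctl stop plexmediaserver && restore_transcoder && systemctl start plexmediaserver`.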

Still, that doesn’t explain why a single transcode takes up 14GB.
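One likely explanation lies in htop's default column layout, where the fields after PID, USER, PRI and NI are VIRT, RES and SHR. Read that way, 14.0G is the transcoder's virtual address space, while its resident memory (RES) is 463M. That matches the MEM% of 2.9 on a machine with about 16 GB. CUDA processes commonly reserve large virtual address ranges that are not backed by RAM.

The sketch below reads both numbers directly from `/proc`. `proc_mem` is a hypothetical helper, and its optional second argument exists only so it can be tried against a saved status file.

```shell
#!/bin/sh
# Hypothetical helper: print virtual size (VmSize) and resident set (VmRSS)
# in whole MiB for a PID, read from /proc/<pid>/status (values there are in kB).
# $1: PID
# $2: optional status file to read instead of /proc/$1/status
proc_mem() {
    status_file=${2:-/proc/$1/status}
    awk '/^VmSize:/ { printf "VmSize %d MiB\n", $2 / 1024 }
         /^VmRSS:/  { printf "VmRSS %d MiB\n", $2 / 1024 }' "$status_file"
}
```

Usage: `proc_mem "$(pgrep -o -f 'Plex Transcoder')"` while a transcode is running. If the VIRT/RES reading above is right, VmSize should be in the GiB range while VmRSS stays in the hundreds of MiB.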
