Plex service crashing and locking up server (Ubuntu 18.04.1, PMS 1.14.1.5488)

Server Version#: Noticed during 1.14.1.5488
Player Version#: N/A

From what I’ve noticed, it crashes and locks up my server during a library scan. I have the scan interval set to every 15 minutes, and after about a day the server locks up. By that I mean I can’t SSH into it and have to hard restart it with the physical power button. The crashes have become more frequent over time: at first the server would run for a week, then it steadily started crashing almost daily.

I don’t have any data from before this server version. I moved my Plex data from Windows 10 to Ubuntu, and this was the version installed when that migration happened, so something from the move may have caused this. My database got corrupted during one of these crashes and I had to restore from a backup. I’m afraid my current database is corrupted again because of the crashes. Attached are logs from 1.14.1.5488. I have also updated to 1.15.0.647 and 1.15.0.659, and the same issue seems to happen. I just turned verbose logging off, so when I get another crash with the current server version I can upload more logs.

I noticed a similar post: PMS 1.14.1.5488 Crashing frequently on Windows 10

Plex Media Server Logs_2019-01-30_16-30-01.zip (3.1 MB)

I do not see the lockup when scanning.

I do see a lockup while using the nVidia encoder sending to an Android-based device.

The transcoder goes into sloth (partial sleep) mode waiting for the client to need more.
The next thing I see is the server being restarted.

Thanks, that is good to know. I will upload new logs when it happens again.

The server crashed and locked up after about 2 days of uptime. Logs are attached.

Plex Media Server Logs_2019-02-17_17-32-15.zip (3.3 MB)

There is something going on outside of PMS.

This shows it streaming normally, without the transcoder, and then the system locking up.

The NULL characters at the end are because the kernel locked up mid-write to the disk.

Feb 17, 2019 17:24:24.267 [0x7f2e58ffd700] DEBUG - Request: [127.0.0.1:41150 (Loopback)] GET /:/metadata/updateProgressMessage?message=Scanning%20Family%20Guy%2FSeason%2013 (30 live) GZIP Signed-in Token (freakytoad1)
Feb 17, 2019 17:24:24.267 [0x7f2f00c5a700] DEBUG - Completed: [127.0.0.1:41150] 200 GET /:/metadata/updateProgressMessage?message=Scanning%20Family%20Guy%2FSeason%2013 (30 live) GZIP 1ms 166 bytes
Feb 17, 2019 17:24:24.300 [0x7f2e6d7fa700] DEBUG - Request: [127.0.0.1:41152 (Loopback)] GET /:/metadata/updateProgressMessage?message=Scanning%20Family%20Guy%2FSeason%2012 (31 live) GZIP Signed-in Token (freakytoad1)
Feb 17, 2019 17:24:24.301 [0x7f2f00c5a700] DEBUG - Completed: [127.0.0.1:41152] 200 GET /:/metadata/updateProgressMessage?message=Scanning%20Family%20Guy%2FSeason%2012 (29 live) GZIP 1ms 166 bytes
Feb 17, 2019 17:24:24.786 [0x7f2f0145b700] DEBUG - Auth: authenticated user 10434655 as whitecastle200
Feb 17, 2019 17:24:24.786 [0x7f2e6cff9700] DEBUG - Request: [168.245.154.244:16680 (WAN)] GET /:/eventsource/notifications (29 live) TLS Signed-in Token (whitecastle200)
Feb 17, 2019 17:24:29.458 [0x7f2f00c5a700] DEBUG - handleStreamWrite code 32: Broken pipe
Feb 17, 2019 17:24:29.458 [0x7f2f00c5a700] DEBUG - NotificationStream: Removing because of error
Feb 17, 2019 17:24:34.245 [0x7f2f00c5a700] DEBUG - Completed: [10.0.0.14:61302] -2 GET /player/proxy/poll?deviceClass=pc&protocolVersion=1&protocolCapabilities=timeline%2Cplayback%2Cnavigation%2Cmirror%2Cplayqueues&timeout=1 (17 live) GZIP 20001ms 10 bytes (pipelined: 6)
Feb 17, 2019 17:24:42.670 [0x7f2efb7fe700] DEBUG - [CompanionProxy] player mxbey2agzzefb8ejhwhjeh41 was last refreshed 10 seconds ago
Feb 17, 2019 17:24:43.046 [0x7f2f0145b700] DEBUG - EventSource: Failure in IdleTimeout (0 - Success).
Feb 17, 2019 17:24:43.046 [0x7f2f0145b700] DEBUG - MyPlex: We appear to have lost Internet connectivity, resetting device URL cache.
Feb 17, 2019 17:24:43.046 [0x7f2f0145b700] ERROR - EventSource: Retrying in 15 seconds.
Feb 17, 2019 17:24:46.130 [0x7f2e6dffb700] DEBUG - NetworkServiceBrowser: SSDP departed after not being seen for 21.945803 seconds: 10.0.0.1 (FreeBSD router)
Feb 17, 2019 17:24:46.130 [0x7f2e6dffb700] DEBUG - NetworkServiceBrowser: SSDP departed after not being seen for 21.945833 seconds: 10.0.0.1 (WANDevice)
Feb 17, 2019 17:24:46.130 [0x7f2e6dffb700] DEBUG - NetworkServiceBrowser: SSDP departed after not being seen for 21.945845 seconds: 10.0.0.1 (WANConnectionDevice)
Feb 17, 2019 17:24:46.130 [0x7f2e6dffb700] DEBUG - NetworkServiceBrowser: SSDP departed after not being seen for 29.213143 seconds: 10.0.0.23 (SHIELD)
Feb 17, 2019 17:24:46.130 [0x7f2e6dffb700] DEBUG - NetworkServiceBrowser: SSDP departed after not being seen for 21.950641 seconds: 10.0.0.24 (RX-A780 94117C)
Feb 17, 2019 17:24:46.130 [0x7f2e6dffb700] DEBUG - NetworkServiceBrowser: SSDP departed after not being seen for 21.702541 seconds: 10.0.0.25 ([LG] webOS TV OLED65B7A)
Feb 17, 2019 17:24:46.130 [0x7f2e6dffb700] DEBUG - NetworkServiceBrowser: SSDP departed after not being seen for 21.595654 seconds: 10.0.0.44 (HDHomeRun DMS 104E0C00)
Feb 17, 2019 17:24:46.137 [0x7f2efbfff700] DEBUG - DVR:Device: Discovering and refreshing devices with identifier tv.plex.grabbers.hdhomerun
Feb 17, 2019 17:24:46.138 [0x7f2efbfff700] DEBUG - DVR:Grabber: HDHomerun discovered 0 compatible devices.
Feb 17, 2019 17:24:46.138 [0x7f2efbfff700] DEBUG - DVR:Device: Testing grabber HDHomerun device device://tv.plex.grabbers.hdhomerun/104E0C00 at http://10.0.0.44:80
Feb 17, 2019 17:24:46.138 [0x7f2efbfff700] DEBUG - DVR:Device: Device device://tv.plex.grabbers.hdhomerun/104E0C00 was already known, refreshing database info
Feb 17, 2019 17:24:46.139 [0x7f2efbfff700] DEBUG - HTTP requesting GET http://10.0.0.44:80/discover.json
Feb 17, 2019 17:24:51.140 [0x7f2efbfff700] ERROR - Error issuing curl_easy_perform(handle): 28
Feb 17, 2019 17:24:51.140 [0x7f2efbfff700] DEBUG - HTTP simulating 408 after curl timeout
Feb 17, 2019 17:24:51.148 [0x7f2efbfff700] ERROR - DVR:Device: Error refreshing existing device device://tv.plex.grabbers.hdhomerun/104E0C00, marking as dead.
Feb 17, 2019 17:24:52.670 [0x7f2edffff700] DEBUG - [CompanionProxy] player mxbey2agzzefb8ejhwhjeh41 was last refreshed 20 seconds ago
\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00
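A quick way to check your own log for that NUL-byte tail is sketched below (POSIX sh plus `tail -c` and `od` from GNU coreutils). The function name `has_trailing_nuls` and the 16-byte default window are illustrative choices, not part of Plex or Ubuntu.

```sh
# Hypothetical helper: succeeds (exit 0) when the last $2 bytes of file $1
# are all NUL (0x00), the pattern left behind when the kernel stops mid-write.
has_trailing_nuls() {
  local file="$1" count="${2:-16}" bytes b
  # Dump the last $count bytes as unsigned decimals, one value per line.
  bytes=$(tail -c "$count" -- "$file" | od -An -v -tu1 | tr -s ' ' '\n' | sed '/^$/d')
  [ -n "$bytes" ] || return 1      # empty or unreadable file: nothing to report
  for b in $bytes; do
    [ "$b" -eq 0 ] || return 1     # any non-NUL byte means the file ended cleanly
  done
  return 0
}
```

For example, assuming the default Linux package location (yours may differ): `has_trailing_nuls "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Logs/Plex Media Server.log" && echo "log ends in NUL bytes"`.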

I looked in my syslog on the server and found the following during the same time the lock up happened.

Feb 17 17:24:24 hades kernel: [173666.768335] general protection fault: 0000 [#2] SMP PTI
Feb 17 17:24:24 hades kernel: [173666.768548] Modules linked in: nvidia_uvm(POE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache snd_hda_codec_hdmi nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate intel_rapl_perf drm_kms_helper drm ipmi_devintf ipmi_msghandler fb_sys_fops syscopyarea sysfillrect sysimgblt ppdev intel_wmi_thunderbolt wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore mei_me parport_pc parport mei shpchp mac_hid intel_pch_thermal acpi_pad sch_fq_codel sunrpc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov
Feb 17 17:24:24 hades kernel: [173666.769216]  async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel e1000e aes_x86_64 crypto_simd ptp glue_helper cryptd pps_core ahci i2c_i801 libahci wmi video
Feb 17 17:24:24 hades kernel: [173666.769568] CPU: 2 PID: 25868 Comm: Plex Media Scan Tainted: P      D    OE    4.15.0-45-generic #48-Ubuntu
Feb 17 17:24:24 hades kernel: [173666.769811] Hardware name: Gigabyte Technology Co., Ltd. H370HD3/H370 HD3-CF, BIOS F12 01/16/2019
Feb 17 17:24:24 hades kernel: [173666.770050] RIP: 0010:find_inode+0x37/0xb0
Feb 17 17:24:24 hades kernel: [173666.770256] RSP: 0018:ffffb0d7826d3a40 EFLAGS: 00010202
Feb 17 17:24:24 hades kernel: [173666.770472] RAX: 0000000000000000 RBX: ffff8bcdb114c800 RCX: ffffb0d7826d3ae0
Feb 17 17:24:24 hades kernel: [173666.770701] RDX: 0fffffffffffff20 RSI: ffffb0d78097e0c0 RDI: ffff8bcdb114c800
Feb 17 17:24:24 hades kernel: [173666.770931] RBP: ffffb0d7826d3a70 R08: ffffb0d7826d3ae0 R09: ffffffffc0b9ea2b
Feb 17 17:24:24 hades kernel: [173666.771160] R10: 00000000019ebdd9 R11: 0000000000000001 R12: ffffb0d7826d3ae0
Feb 17 17:24:24 hades kernel: [173666.771389] R13: ffffffffc0ac2380 R14: ffffb0d7826d3ae0 R15: 0fffffffffffff20
Feb 17 17:24:24 hades kernel: [173666.771620] FS:  00007f9e18884b80(0000) GS:ffff8bcdbdd00000(0000) knlGS:0000000000000000
Feb 17 17:24:24 hades kernel: [173666.771862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 17 17:24:24 hades kernel: [173666.772069] CR2: 00007f9e000a20c8 CR3: 000000020084a002 CR4: 00000000003606e0
Feb 17 17:24:24 hades kernel: [173666.772297] Call Trace:
Feb 17 17:24:24 hades kernel: [173666.772503]  iget5_locked+0x82/0x1f0
Feb 17 17:24:24 hades kernel: [173666.772726]  ? nfs_file_has_writers+0x50/0x50 [nfs]

This was the end of syslog before I hard restarted.

Any idea whether this is Plex-related?
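To pull the key lines out of syslog after a hard reset, an awk sketch like the one below can help. It assumes the Ubuntu 18.04 rsyslog line format shown above and only extracts the fault header, the `Comm:` line (process name) and the `RIP: 0010:` line (faulting kernel function); `summarize_kernel_faults` is a hypothetical helper name, not a system tool.

```sh
# Hypothetical helper: summarize kernel "general protection fault" oopses
# from syslog-format files given as arguments (or stdin if none are given).
summarize_kernel_faults() {
  awk '
    /general protection fault/ { faults++; print "fault: " $0 }
    / Comm: / {
      line = $0
      sub(/.* Comm: /, "", line)                     # keep the text after "Comm: "
      sub(/ +(Not tainted|Tainted:).*/, "", line)    # drop taint info and kernel version
      print "  process: " line
    }
    /RIP: 0010:/ {
      line = $0
      sub(/.*RIP: 0010:/, "", line)                  # keep function+offset/size
      print "  faulting function: " line
    }
    END { print (faults + 0) " fault(s) found" }
  ' "$@"
}
```

For example, `summarize_kernel_faults /var/log/kern.log /var/log/kern.log.1`. For the trace above it reports `Plex Media Scan` as the process and `find_inode+0x37/0xb0` as the faulting function.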

That is extremely hard to tell without a kernel stack dump from the moment.

What it does tell me (look at the Call Trace): something was writing to an NFS mount point, which is using local_lock=something, when it went down.

Do you have your PMS metadata Library remote on NFS ?

If so, it correlates with what I’ve seen with ESXi and bad Ethernet drivers on Ubuntu. The string of trailing NULs is the “signature”.
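To see which mounts are NFS and which `local_lock` option each one carries, a minimal sketch reading `/proc/mounts` is below. `list_nfs_mounts` is a hypothetical helper; it only reports the `nfs` and `nfs4` filesystem types and prints `local_lock not set` when the option is absent.

```sh
# Hypothetical helper: list NFS mounts and their local_lock option.
# /proc/mounts fields: device mountpoint fstype options dump pass
list_nfs_mounts() {
  awk '$3 ~ /^nfs4?$/ {
    lock = "local_lock not set"
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++)
      if (opts[i] ~ /^local_lock=/) lock = opts[i]
    print $2 " (" $3 ", " lock ")"
  }' "${1:-/proc/mounts}"
}
```

Separately, `df -T /var/lib/plexmediaserver` shows the filesystem type under the PMS data directory (that path is the usual default for the Ubuntu package, an assumption for your setup), which answers whether the metadata itself lives on NFS.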

No crash log was generated, so I can’t provide a dump file.

I have media attached via NFS (from Unraid), but the Plex install and metadata reside on the same SSD that Ubuntu is on. I should add that the Plex scanner sometimes gets stuck scanning a library and has to be canceled before it moves on. I saw this today before the crash.

How would I verify that my Ethernet driver is bad, and where would I get a good driver?

Another crash. Here are the logs from Plex and the error from syslog.
Plex Media Server Logs_2019-02-19_06-38-25.zip (3.1 MB)

Feb 19 06:08:35 hades kernel: [131837.217787] general protection fault: 0000 [#1] SMP PTI
Feb 19 06:08:35 hades kernel: [131837.217983] Modules linked in: nvidia_uvm(POE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache snd_hda_codec_hdmi nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper drm ipmi_devintf ipmi_msghandler fb_sys_fops syscopyarea sysfillrect sysimgblt intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ppdev kvm irqbypass intel_cstate intel_rapl_perf snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec wmi_bmof intel_wmi_thunderbolt snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore parport_pc mei_me mei mac_hid parport intel_pch_thermal shpchp acpi_pad sunrpc sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov
Feb 19 06:08:35 hades kernel: [131837.218450]  async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd i2c_i801 e1000e ptp pps_core ahci libahci wmi video
Feb 19 06:08:35 hades kernel: [131837.218732] CPU: 3 PID: 238 Comm: kworker/3:1H Tainted: P           OE    4.15.0-45-generic #48-Ubuntu
Feb 19 06:08:35 hades kernel: [131837.218937] Hardware name: Gigabyte Technology Co., Ltd. H370HD3/H370 HD3-CF, BIOS F12 01/16/2019
Feb 19 06:08:35 hades kernel: [131837.219161] Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc]
Feb 19 06:08:35 hades kernel: [131837.219364] RIP: 0010:skb_release_data+0xa3/0x150
Feb 19 06:08:35 hades kernel: [131837.219574] RSP: 0018:ffffa258c11dfd60 EFLAGS: 00010206
Feb 19 06:08:35 hades kernel: [131837.219793] RAX: 0000000000000020 RBX: 0000000000000000 RCX: ffffffffa33f7ae0
Feb 19 06:08:35 hades kernel: [131837.220036] RDX: 0000000000001000 RSI: 00000000000000e7 RDI: 1000000000000000
Feb 19 06:08:35 hades kernel: [131837.220279] RBP: ffffa258c11dfd80 R08: 0000000000000000 R09: 0000000000000000
Feb 19 06:08:35 hades kernel: [131837.220512] R10: 000000000000021d R11: 000000000000020f R12: ffff8d6bc35e24f0
Feb 19 06:08:35 hades kernel: [131837.220752] R13: ffff8d6d71af2600 R14: ffff8d6bc35e24c0 R15: 00000000000000e8
Feb 19 06:08:35 hades kernel: [131837.220993] FS:  0000000000000000(0000) GS:ffff8d6d7dd80000(0000) knlGS:0000000000000000
Feb 19 06:08:35 hades kernel: [131837.221237] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 19 06:08:35 hades kernel: [131837.221457] CR2: 00007fdb0b7e211c CR3: 0000000144c0a003 CR4: 00000000003606e0
Feb 19 06:08:35 hades kernel: [131837.221699] Call Trace:
Feb 19 06:08:35 hades kernel: [131837.221909]  skb_release_all+0x24/0x30
Feb 19 06:08:35 hades kernel: [131837.222119]  __kfree_skb+0x12/0x20
Feb 19 06:08:35 hades kernel: [131837.222333]  tcp_read_sock+0x10a/0x1d0
Feb 19 06:08:35 hades kernel: [131837.222563]  ? xs_tcp_setup_socket+0x3c0/0x3c0 [sunrpc]
Feb 19 06:08:35 hades kernel: [131837.222789]  ? xs_tcp_setup_socket+0x3c0/0x3c0 [sunrpc]
Feb 19 06:08:35 hades kernel: [131837.223015]  xs_tcp_data_receive_workfn+0xba/0x180 [sunrpc]
Feb 19 06:08:35 hades kernel: [131837.223220]  process_one_work+0x1de/0x410
Feb 19 06:08:35 hades kernel: [131837.223406]  worker_thread+0x32/0x410
Feb 19 06:08:35 hades kernel: [131837.223592]  kthread+0x121/0x140
Feb 19 06:08:35 hades kernel: [131837.223777]  ? process_one_work+0x410/0x410
Feb 19 06:08:35 hades kernel: [131837.223963]  ? kthread_create_worker_on_cpu+0x70/0x70
Feb 19 06:08:35 hades kernel: [131837.224150]  ret_from_fork+0x35/0x40
Feb 19 06:08:35 hades kernel: [131837.224336] Code: 45 fa 0f 1f 44 00 00 f0 ff 4f 1c 75 05 e8 c6 52 9a ff 41 0f b6 46 02 83 c3 01 49 83 c4 10 39 d8 7f ce 49 8b 7e 08 48 85 ff 74 10 <48> 8b 1f e8 15 f5 ff ff 48 85 db 48 89 df 75 f0 4d 85 ed 74 44 
Feb 19 06:08:35 hades kernel: [131837.224580] RIP: skb_release_data+0xa3/0x150 RSP: ffffa258c11dfd60
Feb 19 06:08:35 hades kernel: [131837.224777] ---[ end trace 9768bf23c835b78d ]---

That is a distribution kernel or equipment problem on your end.
If I had to call it: either failing hardware or an unhandled condition in the driver.

This really sounds like LRO (Large Receive Offload).

If you have this enabled in your network card, try running with it disabled.

man ethtool

Here are the results of running ethtool on that Ethernet interface. It looks like LRO is already off?

If it truly is my Ethernet card, would putting in a different card help?

plex@hades:~$ ethtool -k eno1
Features for eno1:
Cannot get device udp-fragmentation-offload settings: Operation not permitted
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
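Rather than scanning the full `ethtool -k` listing by eye, a single feature can be read out with a small filter. This is a sketch: `offload_state` is a hypothetical name, and it simply matches the `name: value` lines that `ethtool -k` prints, including the indented sub-features.

```sh
# Hypothetical helper: print the state of one feature from `ethtool -k`
# output on stdin; exits non-zero if the feature name is not listed.
offload_state() {
  awk -v f="$1" -F': ' '
    { name = $1; sub(/^[ \t]+/, "", name) }   # sub-features are indented
    name == f { print $2; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

With the output above, `ethtool -k eno1 | offload_state large-receive-offload` prints `off [fixed]`. `[fixed]` means the driver does not let you change that feature; on a NIC where LRO is not fixed, `sudo ethtool -K eno1 lro off` turns it off, though the setting does not persist across reboots.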

It does. That implies something is flaking out. Thermal? Dust bunnies?

??? unclear

Sorry, the email reply didn’t work. This is what I said:

This was a new server build as of 2 months ago, with plenty of airflow. I’m seeing that my current Ethernet driver (e1000e) is version 3.2.6-k, which looks like it is a few years old. I’ll try to update it later tonight and see if that helps.
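To confirm which driver and version the interface is actually using, `ethtool -i` reports it; the filter below just condenses that output to one line. `nic_driver_summary` is a hypothetical helper name. As an alternative, `modinfo -F version e1000e` reads the version field from the installed module.

```sh
# Hypothetical helper: condense `ethtool -i IFACE` output (read from stdin)
# to the driver name, driver version and NIC firmware version.
nic_driver_summary() {
  awk -F': ' '
    $1 == "driver"           { driver = $2 }
    $1 == "version"          { version = $2 }
    $1 == "firmware-version" { firmware = $2 }
    END { printf "driver=%s version=%s firmware=%s\n", driver, version, firmware }
  '
}
```

Usage: `ethtool -i eno1 | nic_driver_summary`.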

ESXi with e1000e emulation?

This is a bare-metal server (if I’m using that term correctly), so there’s no virtualization going on. Ubuntu 18.04 is installed directly onto an SSD. The Ethernet port is onboard my motherboard, a Gigabyte H370-HD3, and the controller is an Intel I219-V. Not sure if this answers your question.

This is an OS crash.

When I saw it last, the Ubuntu ethernet driver had a bug.

From what you’ve shown me here, knowing it’s a bare-metal installation and not the result of ESXi emulation misalignment, the fault can only be in Ubuntu itself.

Feb 19, 2019 06:08:55.251 [0x7fdafce60700] DEBUG - Auth: authenticated user 10434655 as whitecastle200
Feb 19, 2019 06:08:55.252 [0x7fda85ffb700] DEBUG - Request: [168.245.154.244:49220 (WAN)] GET /:/eventsource/notifications (13 live) TLS Signed-in Token (whitecastle200)
Feb 19, 2019 06:08:56.860 [0x7fda68ff9700] DEBUG - Statistics: Flushing 8 expired bandwidth entries, 0 expired media entries.
\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00

It appears 18.04.2 has regressed and that bug has returned.

It looks like I am beginning to have this issue with Plex 1.15. Is there a way to fix it?

My issue was faulty RAM. I ran a memtest on my RAM and it found errors. I replaced the RAM and all my crashing / locking issues went away.
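For a first pass without rebooting, `memtester` (available in Ubuntu as the `memtester` package) can test a chunk of RAM from userspace. The helper below is a hypothetical sketch that sizes the test as a percentage of `MemAvailable` from `/proc/meminfo`; the 80% default is an arbitrary assumption. A userspace tester can only check memory it is able to allocate, so a boot-time tool such as Memtest86+ remains the more thorough check.

```sh
# Hypothetical helper: print a memtester size argument (e.g. "12800M")
# equal to $2 percent (default 80) of MemAvailable in meminfo file $1.
memtester_size() {
  local meminfo="${1:-/proc/meminfo}" percent="${2:-80}"
  # MemAvailable is reported in kB; convert the chosen share to whole MB.
  awk -v p="$percent" '$1 == "MemAvailable:" { printf "%dM\n", $2 * p / 100 / 1024 }' "$meminfo"
}
```

Usage: `sudo memtester "$(memtester_size)" 1` runs one pass over the computed size.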

Time to test 110 GB of RAM, then.
:frowning: