"There was an unexpected error loading this library" every so often-SIGKILL required to restart PleX

Server Version#: plex-media-server-plexpass 1.32.5.7349-1
Player Version#: plex-media-player 2.58.1-3
<If providing server logs please do NOT turn on verbose logging, only debug logging should be enabled>

Trying to click on media in the client will eventually throw “There was an unexpected error loading this library” after trying to open a media library item a after a couple seconds of trying.

The plexmediaserver service is still running but trying to restart it with systemctl restart plexmediaserver hangs - unable to stop it. As if the process is stuck on something.

Eventually, systemd kills the service and starts a new one. This results in a usable PleX experience once more and all the library items load open near-immediately.

It is also possible to kill the PleX process but it doesn’t respond - A manual sigkill (kill -9) were required to kill the stunned Plex Media Server process which wouldn’t close when asked nicely.

Starts back up immediately after being killed.

None of the involved storage on the server seems to be having any trouble and the dual socket cpu’s of the server are idle. The Plex Media Server process is just getting inexplicably stuck.

The last log lines in journalctl for the plexmediaserver unit are Generated new chapter thumbnails for SomeShowNameHere an hour or so before plexmediaserver.service: Main process exited, code=killed, status=9/KILL when systemctl had to kill the process because it wasn’t responding.

The host also has a Quadro P2000 [GP106GL] in it which plexmediaserver uses. This gpu was also idle and responsive as seen via the nvidia-smi command during the plex server process’s partial hang.

This keeps happening frequently multiple times a month and it’s frustrating to bother with. The plex server should not have to be sigkilled to be restarted and it shouldn’t be locking up like that either.

Plex did this again this afternoon and I noticed a process I don’t usually see named Plex EAE Service which had started nearly 12 hours prior (around 2:03am) and had consumed 48m36s of cputime since its startup. When I issued a regular kill to this process (The default is SIGTERM) it refused to die.

Most importantly combing journalctl for relevant Plex logs showed the exact same last message Generated new chapter thumbnails for SameEpisodeOfSomeShowHere that I saw the last time Plex hung like this.

There were two copies of the above mentioned episode. One in regular quality and another in DolbyVision which slipped under the radar. I removed that file and restarted plex (Once again using SIGKILL because it refuses to restart once it gets into this state) and I saw the same episode have chapter thumbnails generated again.

Thus far my only lead is PleX’s thumbnail generator not liking certain media formats or edge cases. Does anyone on the development team actually read these posts or is this just another dead end for reporting a problem which has been plaguing me for months?

Please pull that file out, set it aside (Yes, we want it), and replace it with something more mainstream for now so you can continue.

I have a feeling you got your hands on a HEVC DV Profile 5 video – which cannot be transcoded in any way (no license) and PMS might not know what to do with it. It shouldn’t have gotten all twisted up under any circumstances.

If we can make arrangements for me to get the file (or at least the first 250 MB of it), that’ll help us.

Hi and thanks for your response. I’ve pulled the file back out of this morning’s snapshot and below is the video section from mediainfo regarding the file if it helps give a glance. I suspect that 05.06 might confirm your suspicions if I’m reading correctly.

Happy to scp it to my webserver and pm you the link if you like.

Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main 10@L5.1@Main
HDR format                               : Dolby Vision, Version 1.0, dvhe.05.06, BL+RPU
Codec ID                                 : V_MPEGH/ISO/HEVC
Duration                                 : 44 min 56 s
Bit rate                                 : 18.8 Mb/s
Width                                    : 3 840 pixels
Height                                   : 1 920 pixels
Display aspect ratio                     : 2.000
Frame rate mode                          : Constant
Frame rate                               : 23.976 (24000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 10 bits
Bits/(Pixel*Frame)                       : 0.107
Stream size                              : 5.92 GiB (97%)
Writing library                          : x265 3.5+1-f0c1022b6:[Linux][GCC 10.2.1][64 bit] 10bit
Encoding settings                        : cpuid=1111039 / frame-threads=5 / wpp / no-pmode / no-pme / psnr / ssim / log-level=2 / input-csp=1 / input-res=3840x1920 / interlace=0 / total-frames=0 / level-idc=0 / high-tier=0 / uhd-bd=0 / ref=3 / no-allow-non-conformance / repeat-headers / annexb / aud / no-eob / no-eos / hrd / info / hash=0 / no-temporal-layers / no-open-gop / min-keyint=96 / keyint=96 / gop-lookahead=0 / bframes=4 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=96 / lookahead-slices=8 / scenecut=40 / no-hist-scenecut / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=3 / limit-refs=1 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / no-frame-dup / no-hme / weightp / weightb / no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=3 / selective-sao=0 / early-skip / rskip / no-fast-intra / no-tskip-fast / no-cu-lossless / b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=17.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / vbv-maxrate=21900 / vbv-bufsize=40000 / vbv-init=0.9 / min-vbv-fullness=50.0 / max-vbv-fullness=80.0 / crf-max=0.0 / crf-min=0.0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=1 / overscan=0 / videoformat=5 / range=1 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / hist-threshold=0.03 / no-opt-cu-delta-qp / no-aq-motion / no-hdr10 / no-hdr10-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=0 / analysis-save-reuse-level=0 / analysis-load-reuse-level=0 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=1 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / no-field / qp-adaptation-range=1.00 / scenecut-aware-qp=0conformance-window-offsets / right=0 / bottom=0 / decoder-max-rate=0 / no-vbv-live-multi-pass
Default                                  : Yes
Forced                                   : No
Color range                              : Full

Thanks for confirming it’s profile 5.

There’s no need to get us a copy of the file. We have some dvhe.05.06 sample files here to use & retest with.

Looks like it’s happened again today against a new replacement file fetched after a purge of all DV/HDR10 content in an attempt to prevent Plex entering this hang state every day.

The last log entry from was from 8:19am this morning (It is now 8:07PM, I killed it at 8:05PM), it has been hung for nearly 12 hours and I received some complaints earlier today before I could get to things.

The last log line is: Sep 07 08:19:58 hostnameHere plexmediaserver[2170498]: Generated new chapter thumbnails for Thor.

One of the two files which match the 2011 title has

Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main 10@L5.1@High
HDR format                               : SMPTE ST 2086, HDR10 compatible

I’m at a point where I’m gonna have to write some check script to keep an eye on things. The platform continuously kills itself and refuses to die as a process instead of letting my family watch things.

This is made worse by automated content procurement methods having no idea the inbound content is HDR10 before syncing or otherwise.


Before killing plexmediaserver tonight I tried to figure out if it could be saved without a SIGKILL.

/usr/lib/plexmediaserver/Plex Transcoder had an entirely irrelevant movie file open and was in a SLEEP state. The movie file which it had open as a file descriptor was only AVC / High@L5 and presumably unrelated to these crashes. Attaching strace to it showed this shortly before being detached.

[pid 2215987] futex(0x7fb8acc70a34, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2215986] futex(0x7fb8acc93a34, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2215985] futex(0x7fb8ad044a34, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2215984] futex(0x7fb8ad074a34, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2215983] futex(0x7fb8ad097a34, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2214744] restart_syscall(<... resuming interrupted read ...>) = 0
[pid 2214744] poll([{fd=3, events=POLLIN}], 1, 100) = 0 (Timeout)
[pid 2214744] poll([{fd=3, events=POLLIN}], 1, 100) = 0 (Timeout)
[pid 2214744] poll([{fd=3, events=POLLIN}], 1, 100) = 0 (Timeout)

/usr/lib/plexmediaserver/Plex Tuner Service kept respawning as I killed it but the process was happy to die and respawn shortly after without a SIGKILL, so I doubt it was relevant to today’s hang.


Plex EAE Service was also in a sleep state but had 226minutes 54s of cpu time clocked and had a start date of Sep06 (Some timestamp the previous day, at least 8 hours of existing). Attaching strace to it only showed epoll_wait(5, and this process would not stop without a SIGKILL.


Next up was Plex Plug-in which had a date of the previous day with only 2m:31s of cpu time and was also sleeping. Attaching strace showed lines like the below and this one also required a SIGKILL.

[pid 2167763] futex(0x7f0048e7f480, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2167762] futex(0x7f0048e7ed00, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165989] futex(0x7f004968e2f0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165988] futex(0x7f004968e260, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165987] futex(0x7f004968e080, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165986] futex(0x7f004968d2d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165985] futex(0x7f004968d330, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165984] futex(0x7f004968d1b0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165983] futex(0x7f004968d060, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2165982] futex(0x7f004968be60, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2164991] read(0,  <unfinished ...>
[pid 2167772] madvise(0x7f00487f3000, 8192, MADV_FREE) = 0
[pid 2167772] getpid()                  = 2164991
[pid 2167772] epoll_pwait(7, [], 1023, 200, NULL, 8) = 0
[pid 2167772] madvise(0x7f00487e6000, 12288, MADV_FREE) = 0
[pid 2167772] epoll_pwait(7, [], 1023, 200, NULL, 8) = 0
[pid 2167772] madvise(0x7f00487f0000, 8192, MADV_FREE) = 0
[pid 2167772] epoll_pwait(7, [], 1023, 200, NULL, 8) = 0
[pid 2167772] madvise(0x7f00487f3000, 8192, MADV_FREE) = 0

The remaining processes at this point were Plex Tuner Service (Which kept reviving after being killed gracefully.) and the main process /usr/lib/plexmediaserver/Plex Media Server which required a SIGKILL as usual to restart and resume serving content.

I’ve uploaded the last few minutes of '/var/lib/plex/Plex Media Server/Logs/Plex Media Server.log' to this comment for more info. The log timestamps Sep 07, 2023 08:20:00.943 and Sep 07, 2023 08:20:01.219 were for Thor which lines up within a minute or so of the service hang.
plexmediaserver.log (36.8 KB)

Happened again today though I’ve been combing for potential causes and found other problems on the adventure which I’ll detail to the development team elsewhere.

As PleX keeps hanging up the phone but not actually dying for a process to take over, leaving my devices plexless until I find out it’s hung - I’ve written the below nagios-compatible check to run every 5 minutes:

#JJ
#Sat 09 Sep 2023 22:55:52 AEST
if ! systemctl is-active plexmediaserver >/dev/null 2>&1
then
echo "Service plexmediaserver is not running."
fi

staleSeconds=300
now=$(date +%s)
fileMtime=$(stat -c%Z '/var/lib/plex/Plex Media Server/Logs/Plex Media Server.log')
secondsSinceLastModification=$(($now - $fileMtime))
lastModString="($secondsSinceLastModification seconds since last log write)"
if [ $secondsSinceLastModification -gt $staleSeconds ]
then
  echo "Logs indicate PleX is dead. Copying logs and restarting... $lastModString"
  cp -v "/var/lib/plex/Plex Media Server/Logs/Plex Media Server.log" "~/Plex.Media.Server.sensu.logcopy.$(date +%d-%m-%Y_%H:%M:%S%P).log"
  echo "Last logs:"
  echo "----------"
  tail -n10 "/var/lib/plex/Plex Media Server/Logs/Plex Media Server.log"
  echo "----------"
  if ! timeout 10 sudo systemctl restart plexmediaserver
  then
    echo "Failed to restart plex gracefully... forcing kill..."
    pkill -f -9 Plex
    if sudo systemctl restart plexmediaserver
    then
      echo "Successfully restarted plex after force-kill."
      exit 1
    else
      echo "Failed to restart plex after force-kill"
      exit 2
    fi
  else
    echo "Successfully restarted PleX first-try without force-kill. Incicates was not stuck despite log age."
  fi
else
  echo "Logs indicate PleX is OK. $lastModString"
  exit 0
fi

I ran this messy command pipe to gauge an estimate of how large the deltas are between my PleX server writing logs to tell what I should set the staleSeconds variable to. I saw no deltas over 15 minutes for the currently running plex pid so set the timeout as that for an assumption of healthy operation.

$ cat /var/lib/plex/Plex\ Media\ Server/Logs/Plex\ Media\ Server.*.log | cut -d' ' -f1,2,3,4,5 | cut -d':' -f1,2 | sort -u | while read line ; do date -d"$line" +%s; done | while read line ; do [ -z "$last" ] && last=$line ; echo $(($line - $last)) ; last=$line; done| sort -u -V

There may be smarter ways to detect this crash but I’ll think about those at a later date (e.g. cURLing the main page for a timely response maybe a viable health-check option. Though may require a valid login to actually see the main page.)