Almost daily unreachable PMS (Chuck save me)

Server Version#: 1.18.4.2171-ac2afe5f8
Player Version#: Personally I’m using ATV Plex app

Plex Media Server Logs_2020-02-17_23-06-53.zip (5.7 MB)

I’m just about at my wits’ end with PMS these past few weeks and I cannot for the life of me figure out where I might look next. Almost every day, my server goes unreachable and the only solution is to reboot the container. I have tried upgrading the version, downgrading (was considering trying 1.16*), updating all my system packages and even updating the VAAPI. HW transcoding is enabled on the server but I cannot see anything in the logs that leads me to understand where the issue is. Chuck please help.

Please turn VERBOSE back off. I only ask for verbose logging under extreme cases.

Is the Container running with NAT networking or HOST?

It looks like NAT.

You are correct, NAT networking for the container. I would like to add that this has never been an issue for me before but something must have changed to cause this kind of instability.

I’ve gone ahead and turned verbose logging off, so as soon as it happens again I’ll grab logs and post them here.

Do you have any ideas where I might want to look for more information?

Precisely why I asked.

Is there anything binding you to using a container?
Most folks never move them and now, with systemd, services are as easily controllable as containers or even snaps.

Being on the native host avoids the NAT layer – which is a win.
Being on the native host relieves you from mapping the transcoding hardware – win.

As a test, would you mind trying to copy out your /config and relocating it to /var/lib/plexmediaserver just as it would be when native?

To make the test easier, you can also make --bind mounts to emulate the mount points the container’s database would see.
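
A minimal sketch of such a bind mount, assuming the container's /config is mapped to /srv/plex/config on the host (a hypothetical path, substitute your own) and that PMS is stopped first. The `bind_config` helper and its `DRY_RUN` switch are made up for illustration; with `DRY_RUN` set, it only prints the command instead of running it.

```shell
# Hypothetical helper: bind-mount the container's config directory onto the
# location a native install would use.
# Usage: bind_config <host-path-of-/config> [target]
bind_config() {
    src="$1"
    dst="${2:-/var/lib/plexmediaserver}"
    if [ -n "$DRY_RUN" ]; then
        # Dry run: show what would be executed, change nothing.
        echo "mount --bind $src $dst"
    else
        sudo mount --bind "$src" "$dst"
    fi
}
```

For example, `DRY_RUN=1 bind_config /srv/plex/config` prints the mount command without touching the system. A bind mount made this way can be undone with `sudo umount /var/lib/plexmediaserver` once the test is over.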

I suggest this because there’s instability in either the web app or plex.tv
Engineering is working on it but I am looking for a way to get and keep you running.

How’s any of this sound? Plausible?

I can certainly try a non containerized option but I question the fact that this has never been an issue before. Furthermore the container is going unhealthy when it becomes unreachable. I’d be surprised if this was NAT related.

You mentioned the fact that engineering is working on a fix. What exactly is the issue they’re trying to fix?

Also to confirm, you only want the debug option checked, right? I could have sworn verbose was not on when those logs were captured.

If you look around the forum, you’ll see there is discontinuity between Plex/Web & apps with the server.

The symptom you describe does fit that same problem.

One benefit to leaving the container environment is the ability to change installed versions at a whim through dpkg.

If you are savvy enough and feel up to it,

  1. make a copy of the container
  2. unpack an older DEB file (to get that version of code)
  3. drop it over the top of the code which exists in the container’s /usr/lib/plexmediaserver

Another alternative is to restart the container at known times and wait it out until they resolve it. I don’t know where they are at in that process so can’t predict any dates.

Just happened again. Here are the logs Plex Media Server Logs_2020-02-23_02-41-34.zip (1.5 MB)

And in reference to your changing out versions, I have access to tagged versions for pretty much any version of the code, so I can switch to anything you suggest.

There are no error indications whatsoever, but there is a loading issue that nothing within Plex references. Perhaps an external processing event?

I see a few streaming (through your Plex-Relay == indirect)

This is telling me the CPU (yes, I see it’s a 10K passmark CPU) is not getting a chance to run.

Feb 23, 2020 02:39:16.098 [0x7fea5a7fc700] DEBUG - HTTP requesting GET https://plex.tv/api/v2/user/privacy?X-Plex-Token=xxxxxxxxxxxxxxxxxxxx
Feb 23, 2020 02:39:16.098 [0x7fea1bfff700] WARN - Took too long (0.680000 seconds) to start a transaction on ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.098 [0x7fea1bfff700] WARN - Transaction that was running was started on thread 0x7fea59ffb700 at ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea58ff9700] WARN - Took too long (0.680000 seconds) to start a transaction on ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea58ff9700] WARN - Transaction that was running was started on thread 0x7fea1bfff700 at ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea617fa700] WARN - Took too long (0.680000 seconds) to start a transaction on ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea617fa700] WARN - Transaction that was running was started on thread 0x7fea58ff9700 at ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea2b7fe700] WARN - Took too long (0.680000 seconds) to start a transaction on ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea2b7fe700] WARN - Transaction that was running was started on thread 0x7fea617fa700 at ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea28ff9700] WARN - Took too long (0.670000 seconds) to start a transaction on ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea28ff9700] WARN - Transaction that was running was started on thread 0x7fea2b7fe700 at ../Library/MediaStreamSetting.cpp:24
Feb 23, 2020 02:39:16.099 [0x7fea58ff9700] DEBUG - We're going to try to auto-select an audio stream for account 21972069.
Feb 23, 2020 02:39:16.099 [0x7fea1affd700] WARN - Took too long (0.670000 seconds) to start a transaction on ../Library/Media
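
To get a feel for how widespread these warnings are, a small sketch like the following tallies them per source location. The function name `slow_tx_summary` is hypothetical, and it assumes the log lines have exactly the format shown above.

```shell
# Hypothetical helper: count "Took too long" warnings per source location
# and report the longest wait seen for each.
# Usage: slow_tx_summary "Plex Media Server.log"
slow_tx_summary() {
    grep -F 'Took too long' "$1" \
        | sed -E 's/.*Took too long \(([0-9.]+) seconds\) to start a transaction on (.*)$/\2 \1/' \
        | awk '{ n[$1]++; if ($2 > max[$1]) max[$1] = $2 }
               END { for (k in n) printf "%s count=%d max=%.2fs\n", k, n[k], max[k] }'
}
```

Each output line names one source location with its warning count and worst delay. A steadily growing count around the time the server becomes unreachable would support the idea that something is starving or blocking the database.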

Where would I look for more info? At the moment the behavior is a little different. After I rebooted this last time I can’t get the service up consistently. My container goes unhealthy over and over. What would you advise?

Normally it’ll go at least a day before it has an issue but something may have happened because I swapped down to version 1.16.* of Plex and then back to 1.18.*

External load on your server, like any other Linux process, would show in top or xload (a CPU load factor graph). Direct CPU usage is best seen in top (not htop).

The rest is as I find it here in the Plex logs.

Is any of this on a busy HDD?

This runs on an NVME drive but I mean there are other applications running on the same system, yes.

I’ve effectively disabled any extraneous applications running on the server and it’s stable for now. As soon as it happens again I will update here with the logs in hopes there will be more for you to sift through.

You should probably bridge your container so that it isn’t double natted.

Or maybe just do what chuck tells you, and stop using containers in the first place.

Hey Chuck,

Here are the latest logs from an unhealthy state: Plex Media Server Logs_2020-02-23_21-15-25.zip (4.9 MB). I’m hoping these logs show something else, since this was more indicative of the behavior I was seeing before. It went for ~16 hours before an issue occurred.

I will say I see this:

Feb 23, 2020 21:15:02.459 [0x7f800ffff700] ERROR - Failed to begin transaction (../Library/MetadataCollection.cpp:174) (tries=1): Cannot begin transaction. database is locked

Shortly thereafter, the health checks fail:

Feb 23, 2020 21:15:17.801 [0x7f7fe9ffb700] DEBUG - HTTP requesting GET https://172-17-0-1.e739227604034ae1aa335852826bec5d.plex.direct:32400
Feb 23, 2020 21:15:17.801 [0x7f7fbb7fe700] DEBUG - HTTP requesting GET https://plex.strm.media
Feb 23, 2020 21:15:17.801 [0x7f7fe9ffb700] ERROR - Error issuing curl_easy_perform(handle): 7
Feb 23, 2020 21:15:17.801 [0x7f7fe9ffb700] WARN - HTTP error requesting GET https://172-17-0-1.e739227604034ae1aa335852826bec5d.plex.direct:32400 (0, No error) (Failed to connect to 172-17-0-1.e739227604034ae1aa335852826bec5d.plex.direct port 32400: Connection refused)
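
For what it's worth, curl error 7 means "couldn't connect", which matches the "Connection refused" text. A rough way to repeat that check by hand is sketched below. It assumes the server listens on port 32400 on the local host and answers on the `/identity` path; both the default URL and the `pms_probe` name are assumptions for illustration, not something taken from these logs.

```shell
# Hypothetical helper: report whether a PMS endpoint answers within 5 seconds.
# Usage: pms_probe [url]
pms_probe() {
    url="${1:-http://127.0.0.1:32400/identity}"
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
        echo "reachable"
    else
        # In the else branch, $? still holds curl's exit status.
        echo "unreachable (curl exit $?)"
    fi
}
```

Running it from the host and again from inside the container would show whether the server process itself stopped listening or only the mapped port became unreachable.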

I see where it stops responding.

In any PMS transaction,

Request:
Complete:

The same is true with Authentication.

Your PMS stopped responding.

I can’t tell if it’s the PMS or instability in the container.
This is why any abstraction layer (added complexity) makes it difficult.

Knowing PMS’s native structure is /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/.....

How would you feel about manually removing PMS from the container
-or-

(hopefully)
Creating a customized configuration on Linux to use the native app with it pointing to the local host location for where /config is mapped.

Given the failures, this brings up another question:

Is any part of the container / PMS metadata, except for the media itself, on a network share ?

To answer your latter question, no. Nothing is on a network share other than the media

I am more than happy to just move all the config items into /var/lib/* as you’re requesting but I will preface this experiment by saying I have plenty of friends running containerized environments with this version of plex that claim no issues whatsoever. To confirm, you just want me to copy all the contents of plex config to the host default directory for plex and start it from there, correct?

When you say my PMS stopped responding, was it a SPECIFIC request it stopped responding to, or just the server in general stopped responding to any requests?

In response, I can tell you I have countless threads of never-ending issues with containers.

IMHO, You don’t put a permanent server in a Docker container, itself an abstraction from the host, which was intended only to be used until such time as a native application was available. This adds overhead (even on linux).

Anyone who has success with PMS hasn’t yet been bitten by the limits of the abstraction.
Their day is coming.

To isolate the PMS from the container, I would like you to create the following:
(assuming you have enough local host space for this)

/home/plexdata/Library/Application Support/Plex Media Server

With the docker container stopped, using whatever means is appropriate (tar | tar works well), clone the container’s PMS installation into this area so it lands in perfect structure.

Now:

  1. sudo chown -R plex:plex /home/plexdata
  2. Download and install PMS from plex.tv/downloads (64 bit)
  3. sudo systemctl stop plexmediaserver
  4. sudo systemctl edit plexmediaserver
  5. Create the following override file in the systemctl editor session
#
# Exported Container   experiment config
#
[Service]
#
Environment="PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR=/home/plexdata/Library/Application Support"

This moves the metadata

If you want to move the transcoder temp directory (I don’t know where yours is now),
you can create the temp one in /home/plexdata/tmp_transcoding

If you should do this, also add this to the override file.

Environment="PLEX_MEDIA_SERVER_TEMP_DIR=path-to-temp-dir-here"
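
If you would rather not use the interactive editor, the same override can be written as a drop-in file, which is where `systemctl edit` stores it anyway. This is only a sketch: the `write_pms_override` name is hypothetical, it needs to run as root when writing to the default location, and it covers only the application support line (add the temp directory line the same way if you use it).

```shell
# Hypothetical helper: write the override as a systemd drop-in file.
# Usage: write_pms_override [drop-in-directory]
write_pms_override() {
    dropin_dir="${1:-/etc/systemd/system/plexmediaserver.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # Quoted heredoc delimiter keeps the content literal (no expansion).
    cat > "$dropin_dir/override.conf" <<'EOF'
#
# Exported Container experiment config
#
[Service]
Environment="PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR=/home/plexdata/Library/Application Support"
EOF
}
```

Passing a scratch directory as the first argument lets you inspect the generated file before putting it in place.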

Now, you can do the following:

sudo sh
systemctl daemon-reload
systemctl start plexmediaserver

The native host will start up just as if it were the Docker host.

At this point, we are fully native.
If there is a continued problem, troubleshooting will be many times easier because we aren’t abstracting anything from the real host.

  1. File locks will be native
  2. Network traffic will be native.

Alright I’ll go ahead and set it up per your instructions. I’ll check back in afterwards. I’m going to go ahead and delete preferences so I can register it in parallel to my existing server to avoid service interruptions.

That’s a good idea. Instantiate a new UUID & new Friendly Name

Hey Chuck,

I’ve been running the native instance of plex w/o problems but I was not having any users actually play off of it. I’ve gone ahead and taken down the docker instance so that users will actually hit the server natively to see how it performs. That being said, I couldn’t help but notice this message in the logs I submitted earlier as well as with the most recent crash (which took 3 days to occur)

Feb 27, 2020 03:44:47.578 [0x7f7f00ff1700] ERROR - [PlexRelay] kex protocol error: type 7 seq 11

Does this have anything to do with the behavior we’re seeing?

That’s nothing to be worried about.

What it’s telling you is the initial Plex Relay handshake isn’t at the highest level possible.

I’m jumping ahead a bit but there is an update for all that in the works now.
All the https/tls/ssl/etc/etc is being updated to current versions.

There aren’t any issues now except for that Relay text. (plex.tv has already updated to new and is attempting to connect at higher security level).

Everything is still fully secured. The updates are ‘updating runtime library versions to current public standards’.