Almost daily unreachable PMS (Chuck save me)

Alright, here are the logs from the most recent (containerized) instance. I thought the issue might have resolved itself, since it went three days without a problem, but it finally happened again last night.

Plex Media Server Logs_2020-02-27_03-59-07.zip (6.2 MB)

I'll update with native logs after it happens again.

I'm a bit confused here.

  1. This is from the container?
  2. It contains another container crash / failure?
  3. It’s the Plex Relay message?
  1. As we originally discussed, I was running a native Plex alongside the Dockerized instance to avoid service interruption for my users, so yes, these logs are from the Docker instance.

  2. Yes, after three days of running fine, the Docker instance finally became unreachable.

  3. Yes, this log as well as the originally posted log from a few days back both displayed that Plex Relay message in the logs.

Also just so you know I’m running the same version natively as I was inside the container: 1.18.4.2171-ac2afe5f8

Thanks. I just want to be absolutely certain of what I’m looking at.

What’s interesting is: the same binary executables are used for the container as well as the desktop. The only possible difference is glibc. (All of PMS has a single minimum version requirement: glibc must be 2.14.1 or higher.)

If there is a difference between the container’s version and the typical desktop which is causing this, it will be a good and interesting find but likely a pain to resolve.
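For what it's worth, the glibc version can be checked from inside either environment. A minimal sketch (assuming a Linux/glibc system; `platform.libc_ver()` reports an empty string elsewhere, and the `meets_minimum` helper is my own, not anything PMS ships):

```python
import platform

# Report the C library this process is linked against; on a glibc
# system this returns e.g. ('glibc', '2.23').
lib, version = platform.libc_ver()
print(lib, version)

def meets_minimum(version, minimum="2.14.1"):
    """Compare dotted version strings numerically (PMS needs >= 2.14.1)."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(version) >= as_tuple(minimum)
```

Running this in both the container and on the host would confirm whether the two environments really are on the same glibc.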

I have a couple things to do first but will start diving deep into this later this afternoon.

Sure, I appreciate it.

The container/native environment both report: (Ubuntu GLIBC 2.23-0ubuntu11)

This is concerning.

Feb 27, 2020 03:58:41.168 [0x7fccb6ffd700] ERROR - Failed to begin transaction (../Library/MetadataCollection.cpp:174) (tries=6): Cannot begin transaction. database is locked
Feb 27, 2020 03:58:41.586 [0x7fccb6ffd700] ERROR - Failed to begin transaction (../Library/MetadataCollection.cpp:174) (tries=7): Cannot begin transaction. database is locked

Where is the metadata stored? Is it on the local machine or across the network?

Local machine. Those messages appeared, I believe, in both sets of logs.

But doesn’t it correct itself and keep going, or does it die at that point?

It keeps going (as you can see by the retry count).

The concern is “why”. There is something monopolizing the DB resources.
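For anyone following along, the “database is locked” error itself is easy to reproduce outside PMS with two SQLite connections; this is only an illustrative sketch and assumes nothing about how PMS actually structures its transactions:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "meta.db")

# isolation_level=None means autocommit; transactions are issued
# explicitly. timeout=0 makes a lock conflict fail immediately.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
other = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("CREATE TABLE metadata (id INTEGER)")
writer.execute("BEGIN IMMEDIATE")  # first writer holds the write lock

try:
    other.execute("BEGIN IMMEDIATE")  # second writer cannot begin
except sqlite3.OperationalError as exc:
    error = str(exc)
    print(error)  # "database is locked"
```

Whatever is monopolizing the DB here plays the role of `writer`; the transaction that logs the error plays the role of `other`.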

I did see a flurry of sync activity before one set.
I think this is worth investigating: whether sync activity itself is ‘hogging’ the DB time.

Yeah, I saw it keep going but after the tenth attempt I see:

Feb 27, 2020 03:58:43.439 [0x7fccb6ffd700] ERROR - Failed to begin transaction (../Library/MetadataCollection.cpp:174) (tries=10): Cannot begin transaction. database is locked
Feb 27, 2020 03:58:43.439 [0x7fccb6ffd700] ERROR - Fixup 29DatabaseFixupsSyncCollections threw an exception: Cannot begin transaction. database is locked
Feb 27, 2020 03:58:43.439 [0x7fccb6ffd700] DEBUG - Running database fix-ups completed in 1.1 seconds.

after which it never appears again.
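The tries=N counter climbing to 10 and then the fixup throwing suggests a bounded retry loop. How PMS actually implements it is internal; the general pattern looks something like this sketch (function name, backoff, and cap are my own assumptions, with the cap chosen to mirror the tries=10 in the logs):

```python
import sqlite3
import time

def begin_with_retry(con, max_tries=10, base_delay=0.05):
    """Try to open a write transaction, retrying on 'database is locked'
    up to max_tries times before letting the exception surface."""
    for attempt in range(1, max_tries + 1):
        try:
            con.execute("BEGIN IMMEDIATE")
            return attempt  # number of tries it took
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == max_tries:
                raise  # out of retries: surfaces like the fixup exception
            time.sleep(base_delay * attempt)  # simple linear backoff
```

On an uncontended database this succeeds on the first try; with a competing writer holding the lock for the whole window, it raises after the tenth attempt, which matches the “Fixup … threw an exception” line.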

Yes, that’s exactly where I’m looking too.

It’s starting to look like a messy sync list is causing problems.

I see. How could we investigate further whether that’s the trigger?

I need to first speak with Engineering about how that works.

This might be a red-herring error: it appears because the database fixups are running, and it may not be the root problem.

There does appear to be a race condition. Now to identify which elements are fighting.

Is there a way to clear out all the sync related data from the db in an effort to remove whatever leftover data is affecting the application?

Yes, manually go through your sync lists and clear out the dead ones.

Well, I went to the “Syncs” section and removed everything. I also saw this:

Feb 27, 2020 03:58:43.439 [0x7fccb6ffd700] ERROR - Fixup 29DatabaseFixupsSyncCollections threw an exception: Cannot begin transaction. database is locked
Feb 27, 2020 03:58:43.439 [0x7fccb6ffd700] DEBUG - Running database fix-ups completed in 1.1 seconds.

You cleared it out before or after that?

Also, after performing the cleanup, did you perform a shutdown/restart?

I ask because, during the shutdown/restart sequence, the contents of the WAL (write-ahead log) are integrated into the static DB.

It was after all the crashes that I removed them.

So it’s recovering from the crash; that’s going to happen.
The WAL is going to be recovered (replayed) to ‘fix up’ the current DB image.
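The WAL behavior described here can be seen directly with SQLite: while a WAL-mode database is open, recent writes live in a `-wal` side file, and a checkpoint (which a clean shutdown performs) replays them into the main DB. A minimal sketch, assuming nothing about PMS’s own schema:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "meta.db")
con = sqlite3.connect(path, isolation_level=None)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE syncs (id INTEGER)")
con.execute("INSERT INTO syncs VALUES (1)")

# Recent writes live in the write-ahead log next to the main file.
wal_exists = os.path.exists(path + "-wal")

# A TRUNCATE checkpoint replays the WAL into the main DB and empties
# the log; busy == 0 means no reader blocked the checkpoint.
busy, _, _ = con.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
con.close()
```

A crash skips the clean checkpoint, so the replay happens on the next startup instead, which is why those recovery messages show up after the crashes.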

Assuming I’ve cleared it out, there should be no mention of sync in the logs any more, correct? (Assuming no one starts a new sync session.)

Correct. When sync is free of errors (LPE warnings, etc.) and ‘pristine’, none of that should be happening.