Server hangs for 30+ seconds, streams unpause by themselves

My Plex setup has been working great for many years now. I’m using Linux/Unraid/Docker and I mostly use the Apple TV client. As of about a week ago, I’m experiencing the following problems:

  1. My server will sometimes become unavailable for periods of around 30-60 seconds. I’ve confirmed this both on local and remote connections.

  2. My local streams will sometimes hit a black screen and need to buffer. More rarely, they will quit the stream entirely with an error message about the playback session (I’ll grab the exact text next time it happens)

  3. The most bizarre thing: since this started happening, if I leave a local stream paused, it will magically unpause itself after a short period of time (the delay is random but usually under 2 minutes). I’ve observed this on the Apple TV client with local streams. I’ve tried to reproduce it on Plex Web, but it doesn’t seem to happen there.

  4. And to add to the frustration, none of my remote users are seeing any of these problems.

I’ve done the following to try to figure out what’s going on:

  1. Rolled back PMS to a version before the problem, no change

  2. The Apple TV client hasn’t been updated in months, so it’s not that

  3. I use Moonlight to stream between that same Apple TV and my server, and that still works perfectly fine, so I doubt it’s a problem with my local network configuration

  4. Tried repairing my DB with ChuckPA’s DBRepair script; no change

I’ll try to grab some server logs next time it happens, but nothing has really jumped out at me from them so far, other than a lot of “Transaction held too long” warnings. These are happening pretty regularly, and some of them report stalls longer than 30 seconds, so they’re my main suspect right now. However, the DB is on NVMe, so I have no idea how it could be this slow. I’m also not sure whether these slow-transaction warnings predate the problems or not.
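
If anyone wants to check their own logs for the same thing, something along these lines pulls those warnings out (a rough sketch: the log path assumes the same /config container layout as the Databases path shown further down, and it matches loosely on “transaction” since I’m quoting the warning text from memory):

import re
from pathlib import Path

# Assumed container path; adjust for your own install.
LOG = Path("/config/Library/Application Support/Plex Media Server/Logs/Plex Media Server.log")

# Print every log line that mentions a transaction so the held-transaction
# warnings (and the durations they report) are easy to eyeball.
for line in LOG.read_text(errors="replace").splitlines():
    if re.search(r"transaction", line, re.IGNORECASE):
        print(line)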

I’ll update with more info as I can get it, but hoping somebody has at least a theory as to what could possibly be doing all of this (especially the phantom unpausing).

Here’s an example of what my server warning/error logs look like right now, while nothing is even playing. I have a cron job hitting the status endpoint every minute, which should be the only traffic here.

How large is your Plex database file?

When ChuckPA’s repair script was running, I think it said the two databases were around 300MB and 700MB, though I’m not sure whether that’s reasonable given I have about 60TB of media in there. I’ll read through this thread and see if any of it applies; thanks for the suggestion!

I’m seeing the problem with direct streams where no transcoding is happening, so this is unlikely to be related. It’s also happening far more often than any of my library scans. But thanks for the suggestion.

Took a peek at the DB size thread and it seems like a very promising lead; it specifically calls out the statistics tables, which also show up a lot in my slow-transaction logs. I’ll try to check that out tonight.

Digging into this now. ls -al output from my Databases folder:

root@Bastion:/config/Library/Application Support/Plex Media Server/Plug-in Support/Databases# ls -al
total 4633952
drwxr-xr-x 1 abc  abc       1120 Jun 11 08:10 .
drwxr-xr-x 1 abc  abc        100 Oct 15  2023 ..
-rw-r--r-- 1 abc  abc    7122944 Jun 11 08:10 com.plexapp.dlna.db
-rw-r--r-- 1 abc  abc      32768 Jun 11 08:10 com.plexapp.dlna.db-shm
-rw-r--r-- 1 abc  abc    7164712 Jun 11 08:10 com.plexapp.dlna.db-wal
-rw-r--r-- 1 abc  abc  776867840 Jun 11 08:10 com.plexapp.plugins.library.blobs.db
-rw-r--r-- 1 abc  abc  764719104 Jun  5 04:10 com.plexapp.plugins.library.blobs.db-2025-06-05
-rw-r--r-- 1 abc  abc  767710208 Jun  8 02:05 com.plexapp.plugins.library.blobs.db-2025-06-08
-rw-r--r-- 1 abc  abc  768340992 Jun 11 02:26 com.plexapp.plugins.library.blobs.db-2025-06-11
-rw-r--r-- 1 abc  abc      32768 Jun 11 17:32 com.plexapp.plugins.library.blobs.db-shm
-rw-r--r-- 1 abc  abc     902312 Jun 11 17:22 com.plexapp.plugins.library.blobs.db-wal
-rw-r--r-- 1 abc  abc  399085568 Jun 11 17:22 com.plexapp.plugins.library.db
-rw-r--r-- 1 abc  abc  413492224 Jun  5 04:09 com.plexapp.plugins.library.db-2025-06-05
-rw-r--r-- 1 abc  abc  417902592 Jun  8 02:04 com.plexapp.plugins.library.db-2025-06-08
-rw-r--r-- 1 abc  abc  417146880 Jun 11 02:25 com.plexapp.plugins.library.db-2025-06-11
-rw-r--r-- 1 abc  abc      32768 Jun 11 17:43 com.plexapp.plugins.library.db-shm
-rw-r--r-- 1 abc  abc    4585592 Jun 11 17:43 com.plexapp.plugins.library.db-wal
-rw-r--r-- 1 root root      1711 Jun 11 08:10 DBRepair.log
drwxr-xr-x 1 root root       632 Jun 11 08:04 dbtmp

I don’t think this is it; my statistics_bandwidth table isn’t very big, and my overall DB sizes seem reasonable given the size of my library.
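
For what it’s worth, a quick way to sanity-check those row counts is a read-only query against a copy of the library DB (a minimal sketch using Python’s built-in sqlite3 module; Plex ships its own “Plex SQLite” binary, which is the safer tool for anything beyond simple SELECTs, and it’s best to work on a backup copy with the server stopped):

import sqlite3

# Path is the file from the listing above; table names are the ones being
# discussed in this thread. Work on a copy, not the live database.
DB = "com.plexapp.plugins.library.db"

con = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
for table in ("statistics_bandwidth", "media_streams"):
    count = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {count} rows")
con.close()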

It just took about 10 seconds to start a stream, and I got these logs:

No clue how I’m supposed to debug any of this with a closed system.

Tried turning off automatic library scans. I’m still seeing the slow-transaction logs, but the server just made it 20 minutes without buffering; I’ll report back if that actually fixes it (which would be bonkers).

Whatever’s going on has gotten worse: my server now goes hard down until I restart it, so remote users are affected too. Disabling automatic library scans didn’t fix it.

Putting a Datadog synthetic on my server (against the status endpoint) has revealed that whatever’s happening slowly increases the server’s overall response time until it eventually falls over and needs a restart. This really looks like some sort of resource leak on the PMS side.
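
For anyone without Datadog, a rough stand-in is to poll the endpoint yourself and log the response time. A sketch (PLEX_URL and PLEX_TOKEN are assumed environment variables, and I’m treating /status/sessions as the status endpoint):

import os
import time
import requests

PLEX_URL = os.environ.get("PLEX_URL", "http://127.0.0.1:32400")
TOKEN = os.environ["PLEX_TOKEN"]

# Poll once a minute and log how long the status endpoint takes to answer.
while True:
    start = time.monotonic()
    try:
        r = requests.get(f"{PLEX_URL}/status/sessions",
                         params={"X-Plex-Token": TOKEN}, timeout=120)
        print(f"{time.strftime('%H:%M:%S')} HTTP {r.status_code} "
              f"in {time.monotonic() - start:.2f}s")
    except requests.RequestException as exc:
        print(f"{time.strftime('%H:%M:%S')} failed after "
              f"{time.monotonic() - start:.2f}s: {exc}")
    time.sleep(60)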

Actually, I think I might know what this is! I have a cron job that occasionally hits the server’s status and library endpoints. The drop in the graph here corresponds to when I turned that cron job off to see if it would help. So it seems very likely that something in either the status or library endpoints is leaking or accumulating lag as it runs.

Here’s a full gist of that cron job, which uses python-plexapi (GitHub - pushingkarmaorg/python-plexapi: Python bindings for the Plex API).
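
The gist has the real thing; stripped down, the job does something along these lines (a sketch rather than the actual script, with PLEX_URL and PLEX_TOKEN assumed as environment variables):

import os
from plexapi.server import PlexServer

# Each cron run is a fresh Python process, so the library re-initializes its
# connection (and its default request headers) every single time.
plex = PlexServer(os.environ["PLEX_URL"], os.environ["PLEX_TOKEN"])

print(len(plex.sessions()), "active sessions")   # the "status" part
for section in plex.library.sections():          # the "library" part
    print(section.title, section.totalSize)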

I’m hoping that repeatedly hitting the same endpoints as this library will reproduce the problem.

I’m continuing to narrow this down. The server hangs seem to happen when this Python lib, PlexAPI, hits the / (server capabilities) endpoint as part of initializing its connection. I haven’t been able to replicate this with just cURL yet, but here are some logs from a request that took down my server for almost 2 minutes:

You can see that the / call, which is usually near-instantaneous, took 104 seconds to return. During this time, the server is completely unavailable.

I’m also noticing that the server starts returning 406s on some calls while this is going on; I don’t see these under normal operation:

Alright! I solved this.

The library assigns a new random value to X-Plex-Client-Identifier on every fresh connection, which has happened every minute for the last five years because I run a cron job against the status endpoint. The server hits the period of degradation only sometimes, and only when the request carries a client identifier it hasn’t seen before. That’s why I couldn’t reproduce it with cURL: I was replaying previous requests rather than generating new identifiers. Once I told PlexAPI to always use the same identifier, the problem seems to have gone away.
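
Concretely, the fix amounts to pinning the identifier before the connection is created. A minimal sketch, assuming plexapi still builds its default headers from the module-level X_PLEX_IDENTIFIER / BASE_HEADERS values (the documented alternative is setting header.identifier in the plexapi config file):

import os
import plexapi
from plexapi.server import PlexServer

STABLE_ID = "status-cron"  # any value, as long as it never changes between runs

# Pin the identifier before creating the connection so every run of the cron
# job presents the same X-Plex-Client-Identifier to the server.
plexapi.X_PLEX_IDENTIFIER = STABLE_ID
plexapi.BASE_HEADERS["X-Plex-Client-Identifier"] = STABLE_ID

plex = PlexServer(os.environ["PLEX_URL"], os.environ["PLEX_TOKEN"])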

So this is clearly a problem on the server side, but that’s closed source, so I’m not sure how to proceed. Is there a way to flag this to the server team? I suspect that blasting a bunch of requests at / with varying X-Plex-Client-Identifier values will reproduce the issue.
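
A rough sketch of that repro idea (PLEX_URL and PLEX_TOKEN are assumed environment variables; the only essential detail is that every request carries a never-before-seen identifier):

import os
import time
import uuid
import requests

PLEX_URL = os.environ.get("PLEX_URL", "http://127.0.0.1:32400")
TOKEN = os.environ["PLEX_TOKEN"]

for i in range(200):
    headers = {
        "X-Plex-Token": TOKEN,
        "X-Plex-Client-Identifier": str(uuid.uuid4()),  # new "client" on every request
        "X-Plex-Product": "pms-repro-sketch",
    }
    start = time.monotonic()
    r = requests.get(f"{PLEX_URL}/", headers=headers, timeout=300)
    print(f"request {i}: HTTP {r.status_code} in {time.monotonic() - start:.2f}s")
    time.sleep(1)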

edit: I suspect this may also be causing the DB slowness; I probably have a few million unique client IDs stored up somewhere

edit2: also reported to the client lib: Random X-Plex-Client-Identifier Causes Performance Issues on PMS · Issue #1532 · pushingkarmaorg/python-plexapi · GitHub

Great investigating. I’ve been silently watching your post but had no ideas to contribute. Unless you get a response from a dev (like @BigWheel; poke) acknowledging that they are investigating the issue, assume that they are not aware of it.


I also truncated my statistics_bandwidth table (which still had data from 2020 in it) and my media_streams table (which had data from 2023). I’m not sure how much that helped, but with those tables cleared and the identifiers no longer randomized, I’m no longer seeing any of the long-transaction warnings from SQLite.

Any other ideas on getting someone from the server team to look at this?

Only by poking a dev: @bigwheel.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.