Held transaction for too long - StatisticsManager.cpp keeps increasing

Server Version: 1.40.2.8395

Player Version: 4.125.1

I’m reaching out regarding a recurring issue with my Plex Media Server and would appreciate some assistance in troubleshooting and resolving it.

Issue Description:

In the Plex Media Server logs, I’m consistently seeing warnings indicating that a transaction is being held for too long. An example of the warning message is as follows:

May 21, 2024 19:05:39.317 [22405911681848] WARN - Held transaction for too long (/home/runner/actions-runner/_work/plex-media-server/plex-media-server/Statistics/StatisticsManager.cpp:294): 0.290000 seconds

The reported hold time continually increases, and after a few days it escalates to several seconds.

Server Configuration:

  • Plex Media Server Version: 1.40.2.8395
  • Operating System: Unraid 6.12.10
  • CPU: 13th Gen Intel(R) Core™ i9-13900K
  • Architecture: x86_64
  • CPU(s): 32
  • CPU max MHz: 5800.0000
  • Memory: 64 GB RAM

Storage Configuration:

  • Filesystem: ZFS RAIDZ2

Plex Database Maintenance:

  • Database Maintenance: Ran @ChuckPa's DBRepair.sh
  • Database Cache: 9500 MB
  • Database Page Size: 34 KB
  • Dataset hosting Plex: 34 KB record size

Media Library:

  • Movies: 5000
  • Shows: 57,000

Despite optimizing the Plex database and ensuring adequate server resources, including ample RAM and processing power, I’m still encountering this warning. I suspect it may be related to the size of my media library and the ZFS configuration.

I would greatly appreciate any insights or recommendations on how to resolve this issue.
Plex Media Server Logs_2024-05-21_20-27-47.zip (6.3 MB)

Please confirm:

  1. You set the DB memory cache size to 9500 MB?

  2. Your dataset record size is 34 KB instead of the default 128 KB?

  3. 5000 movies, with or without local media extras stored with the media?

  4. 57,000 full series or 57,000 episodes (when counted individually)?

  1. You set the DB memory cache size to 9500 MB? Yes, in the Plex settings.
  2. Your dataset record size is 34 KB instead of the default 128 KB? Yes.
  3. 5000 movies, with or without local media extras stored with the media? Without.
  4. 57,000 full series or 57,000 episodes (when counted individually)? 57,000 episodes.

Thanks for your help

Thanks for confirming those values.

My thoughts on your configuration -

The configuration you have is not at all efficient.

  1. Record size = 34 KB is 8x SLOWER than the ZFS default. They selected 128 KB for a reason. The ‘cost’ (overhead) of a ZFS file system read is fixed, so by forcing it to make 8x as many reads for the same amount of data, you’re incurring 8x the overhead.
    – This is like getting a gallon of water a tablespoon at a time versus a cup at a time.

  2. The Database Cache size is where you’re trying to compensate, by loading 9 GB of the DB into memory. At your quantities, the entire DB is already in memory, but you can still only access it 34 KB at a time. That’s the problem. (A quick way to check what the dataset is actually set to is sketched below.)
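
Before changing anything, it is worth confirming what the dataset currently reports. This is only a sanity-check sketch; dob/plexdata is the dataset name taken from later in this thread, so substitute whatever dataset actually holds your Plex app data.

# Show the properties that matter for the Plex database workload.
zfs get recordsize,compression,primarycache dob/plexdata

# Optionally, confirm usage for the same dataset.
zfs list -o name,used,avail,recordsize dob/plexdata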

I don’t know how easy it will be for you to reconfigure, but you need to (a command-line sketch follows this list):

  1. Create a new dataset at the default 128KB record size

  2. Transfer all of the Plex data to that new dataset.

  3. After the transfer is complete, PRIOR to starting Plex, run DBRepair.
    – export DBREPAIR_PAGESIZE=65536 before launching DBRepair.
    – Run an ‘auto’ optimization to rebuild the DB with the new page size.
    – See the documentation ( GitHub - ChuckPa/PlexDBRepair: Database repair utility for Plex Media Server databases )

  4. Start Plex,
    – RESET the Database Cache Size back to default.
    – SAVE
    – Restart Plex one more time.
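
A minimal shell sketch of steps 1–3, assuming the pool is named dob and the existing Plex app data lives in /mnt/dob/plexdata (both taken from later in this thread); adjust the dataset name and paths to your layout, and keep Plex stopped throughout:

# 1. New dataset at the default 128 KB record size (128K is already the ZFS default,
#    so omitting -o recordsize would have the same effect).
zfs create -o recordsize=128K dob/plex_main

# 2. Copy the Plex application data across.
rsync -a --progress /mnt/dob/plexdata/ /mnt/dob/plex_main/

# 3. Rebuild the databases at the 64 KB SQLite page size BEFORE starting Plex.
export DBREPAIR_PAGESIZE=65536
./DBRepair.sh   # then run the 'auto' optimization, per the DBRepair documentation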

What this does for you:

  1. Disk I/O efficiency will be increased 8-fold (you’re no longer reading partial records 34 KB at a time). You’ll make far better use of the RAM cache and of the physical disk cache (physical disk reads are expensive in time).

  2. ZFS is Copy-On-Write based. It’s already optimized for 128 KB.

  3. Setting the Plex database to a 65536-byte page size makes all physical I/O to the DB a clean fraction of the record size (exactly half of it, which is also the SQLite maximum page size, and a huge performance boost on ZFS). You can verify the resulting page size with the sketch below.
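
One way to confirm the page size after the rebuild is to query the database directly. A minimal sketch, with the caveat that both paths are assumptions for a typical Linux install and will differ on Unraid/Docker:

# Query the main library database with the Plex-supplied SQLite binary.
"/usr/lib/plexmediaserver/Plex SQLite" \
  "/path/to/Plex Media Server/Plug-in Support/Databases/com.plexapp.plugins.library.db" \
  "PRAGMA page_size;"   # should print 65536 after the rebuild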

Hi-

I created a new dataset at 128 KB

sudo zfs create -o recordsize=128K dob/plex_main

transferred all data to the new dataset

sudo rsync -av --progress /mnt/dob/plexdata/ /mnt/dob/plex_main/

added DBREPAIR_PAGESIZE=65536 to DBRepair.sh and ran auto

Started Plex

Reset the DB cache size back to 40 (the default)

Restarted Plex

I will update after it runs for a while. Thanks for the help!

To confirm,

Looking in your DBRepair log, you should see where it tells you the page size it’s setting.

Setting Plex SQLite page size ($DbPageSize)

yes I see it

[2024-05-22 00.29.54] Automatic Check,Repair,Index started.
[2024-05-22 00.29.54]
[2024-05-22 00.29.54] Checking the PMS databases
[2024-05-22 00.30.29] Check complete. PMS main database is OK.
[2024-05-22 00.30.34] Check complete. PMS blobs database is OK.
[2024-05-22 00.30.34]
[2024-05-22 00.30.34] Exporting current databases using timestamp: 2024-05-22_00.29.53
[2024-05-22 00.30.34] Exporting Main DB
[2024-05-22 00.30.44] Exporting Blobs DB
[2024-05-22 00.34.10] Successfully exported the main and blobs databases. Proceeding to import into new databases.
[2024-05-22 00.34.10] Importing Main DB.
[2024-05-22 00.34.10] Setting Plex SQLite page size (65536)
[2024-05-22 00.34.35] Importing Blobs DB.
[2024-05-22 00.34.35] Setting Plex SQLite page size (65536)
[2024-05-22 00.34.56] Successfully imported databases.
[2024-05-22 00.34.56] Verifying databases integrity after importing.
[2024-05-22 00.35.30] Verification complete. PMS main database is OK.
[2024-05-22 00.35.31] Verification complete. PMS blobs database is OK.
[2024-05-22 00.35.31] Saving current databases with ‘-BACKUP-2024-05-22_00.29.53’
[2024-05-22 00.35.31] Making repaired databases active
[2024-05-22 00.35.31] Repair complete. Please check your library settings and contents for completeness.
[2024-05-22 00.35.31] Recommend: Scan Files and Refresh all metadata for each library section.
[2024-05-22 00.35.31]
[2024-05-22 00.35.31] Backing up of databases
[2024-05-22 00.35.31] Backup current databases with ‘-BACKUP-2024-05-22_00.35.31’ timestamp.
[2024-05-22 00.35.32] Reindexing main database
[2024-05-22 00.35.40] Reindexing main database successful.
[2024-05-22 00.35.40] Reindexing blobs database
[2024-05-22 00.35.41] Reindexing blobs database successful.
[2024-05-22 00.35.41] Reindex complete.
[2024-05-22 00.35.41] Automatic Check, Repair/optimize, & Index successful.


I still see the “Held transaction for too long” warning in the logs. I haven’t had more than one user on. I’ll attach the logs again. Everything seems to be loading quickly; not sure why this keeps increasing.

Plex Media Server Logs_2024-05-22_01-52-03.zip (2.0 MB)

There is something fundamentally wrong here.

Please tell me:

  1. Is the ‘dob’ pool on SSD or HDD ?
  2. If HDD, which HDDs (MFG & model number) ?
  3. zpool geometry (how many in the pool and how laid out)?
  4. Is compression enabled or disabled ?

EDIT:

Deep-diving into your logs is somewhat disturbing…
PMS is pruning transcoder output blocks because it’s running out of space??

Here, it sends the requested segments (blocks) to the player, then immediately prunes (deletes) old ones from the buffer. Are you that low on space?

May 21, 2024 20:10:58.648 [22405898455864] DEBUG - Request: [127.0.0.1:42142 (Loopback)] POST /video/:/transcode/session/5E22A90B-26E7-4033-BC47-47CD4ABFCA63/619ba641-0616-4ab4-84d6-ee5fac8bb366/manifest (53 live) #13e7d8 Signed-in
May 21, 2024 20:10:58.648 [22405853346616] DEBUG - [Req#13e7b7/Transcode/5E22A90B-26E7-4033-BC47-47CD4ABFCA63/619ba641-0616-4ab4-84d6-ee5fac8bb366] Transcoder segment range: 5019 - 5322 (5322)
May 21, 2024 20:10:58.648 [22405898455864] DEBUG - [Req#13e7d8/Transcode/5E22A90B-26E7-4033-BC47-47CD4ABFCA63/619ba641-0616-4ab4-84d6-ee5fac8bb366] Transcoder segment range: 5019 - 5323 (5322)
May 21, 2024 20:10:58.648 [22405853346616] DEBUG - [Req#13e7b7/Transcode/5E22A90B-26E7-4033-BC47-47CD4ABFCA63/619ba641-0616-4ab4-84d6-ee5fac8bb366] Pruning segments older than 5021, view offset is 5322.005333, min was 5019, max is 5322, last returned is 5322.005333
May 21, 2024 20:10:58.648 [22405853346616] DEBUG - [Req#13e7b7/Transcode/5E22A90B-26E7-4033-BC47-47CD4ABFCA63/619ba641-0616-4ab4-84d6-ee5fac8bb366] Pruning segment 5020
May 21, 2024 20:10:58.648 [22405853346616] DEBUG - [Req#13e7b7/Transcode/5E22A90B-26E7-4033-BC47-47CD4ABFCA63/619ba641-0616-4ab4-84d6-ee5fac8bb366] Pruning segment 5019
May 21, 2024 20:10:58.648 [22406074678072] DEBUG - Completed: [127.0.0.1:42142] 200 POST /video/:/transcode/session/5E22A90B-26E7-4033-BC47-47CD4ABFCA63/619ba641-0616-4ab4-84d6-ee5fac8bb366/manifest (53 live) #13e7d8 0ms 166 bytes

THIS is disturbing.

May 21, 2024 20:10:50.913 [22406074678072] DEBUG - Completed: [127.0.0.1:56208] 206 GET /livetv/sessions/5d83ce01-8124-4c42-8640-0b0f172d2589/257888C4-1250-4E46-8E57-98C96F93101E/index.m3u8?offset=0.033367 (52 live) #13dead 4ms 335497 bytes (pipelined: 5311) (range: bytes=0-)

There are 5311 pending requests queued up against the server.

What other programs / things are hammering on the server ?

  1. The dob pool is on 2 SSDs and 2 NVMe drives.
  3. 4 drives in the pool (see screenshot).
  4. Yes, compression is enabled.

Attached the most recent logs. I’m not running out of space; the media is on a separate NAS.

Plex Media Server Logs_2024-05-22_12-10-47.zip (5.0 MB)

root@OfficePlex:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 63.4M 1 loop /lib
loop1 7:1 0 338.8M 1 loop /usr
loop2 7:2 0 1000G 0 loop /var/lib/docker/btrfs
/var/lib/docker
sda 8:0 1 57.7G 0 disk
└─sda1 8:1 1 57.7G 0 part /boot
sdb 8:16 1 119.5G 0 disk
└─sdb1 8:17 1 119.5G 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
md1p1 9:1 0 119.5G 0 md /mnt/disk1
nvme0n1 259:0 0 1.8T 0 disk
└─nvme0n1p1 259:1 0 1.8T 0 part
nvme2n1 259:2 0 1.8T 0 disk
nvme1n1 259:3 0 1.8T 0 disk
└─nvme1n1p1 259:4 0 1.8T 0 part
root@OfficePlex:~# lshw -short -C disk
H/W path Device Class Description

/0/100/6/0/0 hwmon1 disk NVMe disk
/0/100/6/0/2 /dev/ng0n1 disk NVMe disk
/0/100/6/0/1 /dev/nvme0n1 disk 2TB NVMe disk
/0/100/17/0 /dev/sdc disk 2TB Samsung SSD 870
/0/100/17/1 /dev/sdd disk 2TB Samsung SSD 870
/0/100/1b/0/0 hwmon3 disk NVMe disk
/0/100/1b/0/2 /dev/ng1n1 disk NVMe disk
/0/100/1b/0/1 /dev/nvme1n1 disk 2TB NVMe disk
/0/100/1d.4/0/0 hwmon2 disk NVMe disk
/0/100/1d.4/0/2 /dev/ng2n1 disk NVMe disk
/0/100/1d.4/0/1 /dev/nvme2n1 disk 2TB NVMe disk
/0/8/0.0.0 /dev/sda disk 61GB Cruzer Fit
/0/8/0.0.0/0 /dev/sda disk 61GB
/0/9/0.0.0 /dev/sdb disk 128GB Flash Drive FIT
/0/9/0.0.0/0 /dev/sdb disk 128GB

Nothing else should be hammering the server. If you go to the most recent logs do you still see this?

I don’t use ZFS for anything with live data (PMS & databases count as live data to me)

At this point, because ZFS compresses On-The-Fly (one record at a time), the time required to (de)compress each record adds up every time it reads from or writes to the DBs.

Therefore, I recommend an experiment.

  1. Disable compression for the Plex metadata dataset.
  2. Go back into DBRepair and, setting the page size to 65536 again, optimize the databases again so they are no longer stored compressed.
  3. Now run it… (a command sketch follows this list)
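
A minimal sketch of that experiment, assuming the new dataset name used earlier in this thread (dob/plex_main). Note that turning compression off only affects data written from that point on, which is why the DBRepair rebuild is what actually rewrites the databases uncompressed:

# Stop Plex first, then disable compression on the dataset holding the app data.
zfs set compression=off dob/plex_main

# Rebuild the databases so they are rewritten without compression,
# keeping the 64 KB SQLite page size.
export DBREPAIR_PAGESIZE=65536
./DBRepair.sh   # run the 'auto' optimization again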

I would like to see a full set of logs next (after it runs a while / gets to the point of actual failure).

Failure will be when it shows “SLOW QUERY” in the server logs, followed by a timeout (HTTP 408).
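
If you want to spot those signatures quickly without reading the whole log, a simple grep over the extracted log files works. A sketch, assuming the standard “Plex Media Server.log” file name inside the downloaded zip:

# Search the main server log for the warning and failure signatures.
grep -nE "Held transaction for too long|SLOW QUERY| 408 " "Plex Media Server.log"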

I removed compression on the dataset and ran DBRepair again. It was still showing the same signs.

I have created another instance of Plex under a new account, with a 128 KB dataset and a 65536 page size right from the start. It’s not showing the same behavior; the StatisticsManager hold time is not progressively increasing.

What would happen if I swap Preferences.xml into the new Plex container? And how risky is transferring watch history?

May I confirm -

  1. You created a whole new instance, reloading the media & metadata, and it’s not misbehaving?

  2. If this is true, you can move watch history in either of two ways.

  3. If you move the Preferences.xml, you’d be moving a few of the preferences and, most importantly, the server ID information. The new server would completely assume the identity the old one had. (A minimal sketch of that move follows.)
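
A minimal sketch of that move, assuming both instances are Docker containers on Unraid; the container names and the OLD_APPDATA/NEW_APPDATA paths are placeholders for your actual app-data paths (ending in “…/Library/Application Support/Plex Media Server”), and both servers must be stopped while the file is copied:

# Stop both containers (names are placeholders).
docker stop plex-old plex-new

# Back up the new server's Preferences.xml, then copy the old one over it.
cp "$NEW_APPDATA/Preferences.xml" "$NEW_APPDATA/Preferences.xml.bak"
cp "$OLD_APPDATA/Preferences.xml" "$NEW_APPDATA/Preferences.xml"

# Start only the new container so two servers never run with the same identity.
docker start plex-new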

Is that what you are thinking?

Is there a performance issue with your system? Your thread is positioned as attempting to eliminate warning messages from your logs. But what is the actual problem you’re trying to solve? Slow searches? Slow library browsing? Something else?

I only ask because the warning messages you posted all pertain to statistics gathering, which should not impact real-time performance of the system.

Hi-
Yes, this is exactly what I was thinking. Could I use your script to move the watch history? If it’s too risky, I’m fine losing it. I would rather have this DB running perfectly.

The warning’s hold time keeps increasing, and when it gets into the double digits the server starts having performance issues; eventually Plex is unresponsive.

Awesome. That’s the kind of thing you probably want to mention in your initial post to ensure the proper issue is being troubleshot (troubleshooted?). Correlation is not necessarily causation and all that.

Plex spouts out all kinds of warnings (and errors) in the logs which mean absolutely nothing, other than the developers are working on some new functionality.

By the way, have you not seen any performance increase since allowing DBRepair.sh to re-index your database? That generally cures a large number of browsing and searching ills. I’d be very surprised if you were an exception, as there have been much larger databases than yours “fixed” by this simple action. Except in the cases where there was severe database damage.

But, mom always told me I was special. Maybe you are, too.

Plex Media Server Logs_2024-05-23_11-31-39.zip (5.0 MB)
Everything seemed to be running great until 10:57 AM, when Plex crashed. I restarted the DB and grabbed the logs.