ERROR - Waited over 10 seconds for a busy database; giving up

Summary: Started seeing this database busy error a couple of weeks ago. Had a hard drive on that system crash, but the db was stored on a secondary drive so it wasn’t affected; I just lost the thumbnail stores. Since getting the system back up it’s had occasional hangs, and I started taking note of when they would hit and I’ve narrowed it down to the library scans.

If I start the server up and let it run with library scans disabled, I tend to have no issues. I have seen as many as 16 people connected at once with no issues. Within five minutes of running a full library scan these busy database messages start to appear, and the web interface hangs, as do client connections.

OS: RHEL 9.2
RAM: 16G
HD: 1.5T free (software RAID1, ext4 | Samsung SSD 870 EVO)
Server Version#: 1.32.3.7089
DB Cache size: 2048

plex plex 3.6G May 31 11:12 com.plexapp.plugins.library.blobs.db
plex plex 1.2G May 31 11:24 com.plexapp.plugins.library.db

Plex Media Server Logs_2023-05-31_11-19-02.zip (4.3 MB)

I’ve run an integrity check and a vacuum on both. No errors turned up. Went ahead and did a recovery and rebuild just to make sure that everything was clean. I’ve also tried various values for the DB cache size, but it doesn’t seem to make a difference. Any ideas @ChuckPa?

  1. Enable the SSD management:
sudo systemctl enable fstrim.timer       # enables the monthly automatic TRIM
sudo systemctl start fstrim.service      # runs a trim now (system will appear to hang -- don't panic)

2. Run my DBRepair to optimize the PMS database. It does what PMS can't do for itself while it's running. You will want the 'Auto' mode.


https://github.com/ChuckPa/PlexDBRepair

Instructions and examples are in README.md; quick sketches of both steps are below.
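
To confirm the timer from step 1 actually took (output varies by system):

systemctl list-timers fstrim.timer      # next/last scheduled trim
systemctl status fstrim.service         # result of the manual run

And a rough invocation for step 2 (the script name is taken from the repo as of this writing; treat the README as the authoritative reference):

git clone https://github.com/ChuckPa/PlexDBRepair.git
cd PlexDBRepair
sudo ./DBRepair.sh          # then choose the 'Auto' option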

Enabling the fstrim.timer was a good call; I hadn’t considered that since I just added the flash over the holiday. As for the autorepair, handy, and I will keep that around for sure, but it’s doing the same steps I just did: the integrity check, vacuum, dump, import. I did neglect to reindex; I figured since it was doing an import like that, the indexes should have already been straight.
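
For reference, the cycle I mean is roughly the following (paths assume the stock RHEL package layout, and the bundled "Plex SQLite" binary location may differ on other installs; a stock sqlite3 can trip over Plex's custom collations, so treat this as a sketch):

# stop PMS first so nothing else is holding the database
sudo systemctl stop plexmediaserver

cd "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-in Support/Databases"
PLEX_SQLITE="/usr/lib/plexmediaserver/Plex SQLite"

"$PLEX_SQLITE" com.plexapp.plugins.library.db "PRAGMA integrity_check;"
"$PLEX_SQLITE" com.plexapp.plugins.library.db "VACUUM;"
"$PLEX_SQLITE" com.plexapp.plugins.library.db "REINDEX;"

sudo systemctl start plexmediaserver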

DBRepair.log (1.5 KB)

Had everything running fine for the last two hours. Had it scan the tv library right after I brought it back up and it had no issues.

A few minutes ago I told it to scan the movies library and the “waiting on the busy database” messages are back, and there are corresponding messages in syslog.

May 31 14:52:13 plex sh[312667]: Sqlite3: Sleeping for 200ms to retry busy DB.
May 31 14:52:13 plex sh[312667]: Sqlite3: Sleeping for 200ms to retry busy DB.
May 31 14:52:13 plex sh[312667]: Sqlite3: Sleeping for 200ms to retry busy DB.
May 31 14:52:13 plex sh[312667]: Sqlite3: Sleeping for 200ms to retry busy DB.
May 31 14:52:13 plex sh[312667]: Sqlite3: Sleeping for 200ms to retry busy DB.

Plex Media Server Logs_2023-05-31_15-14-44.zip (3.6 MB)

Here’s a fresh set of logs. I did the scan back around 2:45 or so, and it’s now 3:15 and it has hung so many times that I restarted the process despite whatever had the lock. This library isn’t that big, ~6000 items. The other library has likely 10 times the item count and isn’t displaying this. It’s almost like some type of loop, possibly? Is there a way to see which thread is being cited when entries like these pop up?

May 31, 2023 15:10:04.765 [139690401295160] DEBUG - 7 threads are waiting on db connections held by threads: 139690338519864,139689763765048,139689784769336,139689688693560,139690074139448,139689707916088,139690346904376,139690244959032,139689710025528,139689714244408,139690281548600,139690287852344,139689716353848,139690060757816,139690237041464,139689718463288,139690267507512,139689695021880,139689705806648,139690058062648
May 31, 2023 15:10:05.689 [139690070272824] DEBUG - [Req#62cf9] 7 threads are waiting on db connections held by threads: 139690338519864,139689763765048,139689784769336,139689688693560,139690074139448,139689707916088,139690346904376,139690244959032,139689710025528,139689714244408,139690281548600,139690287852344,139689716353848,139690060757816,139690237041464,139689718463288,139690267507512,139689695021880,139689705806648,139690058062648
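
(For my own notes: those long decimal numbers look like pthread IDs, so one option, assuming gdb is installed and a brief pause of the server is acceptable, is to convert an ID to hex and match it against gdb’s thread list:)

printf '0x%x\n' 139690401295160        # hex form of a thread id from the log

sudo gdb -p "$(pidof 'Plex Media Server')" \
    -ex 'info threads' -ex 'thread apply all bt' \
    -ex detach -ex quit > pms-threads.txt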

@wilpig

Auto does more than vacuum. Vacuum isn’t the complete answer.
The ‘repair’ process does three major steps:

  1. Export (in table-sorted order)
  2. Import the sorted table to create a new, physically compact and contiguous table
  3. Generate SQLite indexes so PMS doesn’t have to

How many media items do you have indexed in Plex ? (rough count)
How much of that is music ?

I am asking because:

  1. The size of your main DB and size of the blobs DB are big
  2. You have an old i5 CPU (it’s 10 years old)
  3. While an SSD extends the life of older CPUs, there are some things SSDs can’t fix (Like getting in and out of a big database quickly)
  4. The total item count is going to reach a point where you’ll need an i7 or better.

I have an i7-7700 & i7-8809 (twice the PassMark score of this i5). When dealing with big databases, per-core / per-thread speed is critical to PMS running smoothly, because many of its operations (especially scanning) grab exclusive locks on the DB (which makes other threads wait).


Total item count 95408, no music. I’ve been looking at upgrading the CPU/board/memory on that and just haven’t dropped the cash on it yet. Is there any general idea on passmark vs index size?

What part of the scanning process is the trigger for that condition? If it were just the item count, I’d expect it to hit when I scan any of the other content stores as well, but it only does it with this specific one. Once it gets in that state it doesn’t seem to be able to get out. Whatever task is hung clears, it jerks to life for a few minutes, then it’s back into the pit. The primary plexmediaserver pid also jumps to ~350% usage as it consumes all four cores.
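
(One way to see which threads inside that process are actually spinning, assuming the standard process name, would be something like:)

top -H -p "$(pidof 'Plex Media Server')"                               # live per-thread view
ps -L -p "$(pidof 'Plex Media Server')" -o tid,pcpu,comm --sort=-pcpu  # one-shot listing, sorted by CPU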

Interesting.

I can load up a 340,000 item count DB, run the tool, and there are no issues.
(there are rarely issues anyway unless I deliberately try to break the DB)

I wonder if your indexes are in need of rebuilding and/or you really do need the full export/import treatment. (Export/import also drops ALL the dead records, which SQLite would otherwise have to step through on its internal table row linked list.)

THIS :point_up: is simply unacceptable. Not even an i5, with this item count, should hit 3.5 full cores unless the DB is roasted and in serious need of attention.

I did the full export/import before I ran your autofix, and then it did it again as part of that function.

I have two thoughts on it. I have four libraries set up; only this library is showing this weirdness. The others don’t have collections enabled, where this one does. Is it possible that the collections portion is trying to reference something that got nuked when I lost the primary drive last week: the preview files, poster art, something? I’m not sure how it references the preview thumbnails on disk; I just assumed it stores some type of ID in the database, but I guess it could be calculating the ID on the fly and then checking the disk to see if there is anything there. Would disabling collections and nuking all the tags to remove them possibly help here, or just muddy the waters?

The second thought is that the CPU usage could be tied to HDR content with this latest release. It’s all showing HW, but right now with 3 active clients it should not be showing a load of 13!!!
Screenshot 2023-05-31 at 5.21.33 PM
Screenshot 2023-05-31 at 5.22.22 PM

How much memory is in that thing ???

kswapd should NOT be taking time. (You're swapping -- VERY VERY BAD.)

Why are you running memory compaction ?
That’ll chew the HELL out of your CPU.

You again are correct; that setting was different. Looks like Red Hat stealthily changed the default from 0 in RHEL 8 to 20 in RHEL 9.

grep . /proc/sys/vm/compact*

/proc/sys/vm/compaction_proactiveness:20
/proc/sys/vm/compact_unevictable_allowed:1

I’ve dropped that back to 0. It has 16G, which should be plenty for this; it doesn’t run anything else, it’s dedicated to its job. The OS is likely consuming all the rest as buffer while pulling the content from the network.
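
(For anyone following along, dropping it at runtime is just one of these; either form works:)

sudo sysctl -w vm.compaction_proactiveness=0
# or
echo 0 | sudo tee /proc/sys/vm/compaction_proactiveness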

All is stable at the moment, I’ll monitor more tonight and report back tomorrow.

You should probably consider a permanent sysctl.conf override.

We’re on the same page.

# cat /etc/sysctl.d/40-plex-lag.conf
vm.compaction_proactiveness=0
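
(sudo sysctl --system will pick that file up without waiting for a reboot.)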

It does appear to be running better, but there’s definitely still a gremlin in the system. I’m currently remote and was watching a movie with family, and there were 6 or 7 streams in progress at the time. The movie hung, and when I checked the system the usage had gone from ~10% to 100% again and it started murdering processes according to the server log.

Plex Media Server Logs_2023-06-02_20-40-32.zip (4.7 MB)

I’ve gone ahead and ordered parts that will be in next week after I get back from this trip, but this definitely feels like some internal process is going haywire. Based on sar it is happening every 6 hours, and the library scan is set for 6 hours currently.

  • What is there in the library scan process that could be a trigger, other than some major database issue that isn’t getting solved by a check → vacuum → export → import → reindex cycle?
  • Is there a table that doesn’t get its indexes checked? Something that is used as an external key but isn’t linked explicitly in the schema?
  • Is there any possibility that something has gotten nested via a collection that is causing some type of loop?

Which GPU are you using?

If you’re running 6 transcodes, at about 680-800 MB each on the GPU, that’s pushing the typical 5 GB memory limit on a P2200.

It’s also possible, given the audio conversions I’m seeing, that you’re running out of CPU for that (Dolby EAC3); the Dolby converters are not CPU-friendly.

P2000. That’s a good chunk of memory per transcode; I don’t typically see that kind of usage.

Here’s the last 35 days of usage out of the card. 16 has been the max I’ve personally seen at any one time before exhausting the memory, but typically when I look there are ~8 jobs going simultaneously. That card has been a real performer.
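
(For anyone wanting to watch the card live, something along these lines works; the exact query fields can vary by driver version:)

nvidia-smi                                                    # snapshot, incl. per-process memory
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.encoder,memory.used \
           --format=csv -l 60                                 # log one sample per minute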


Thanks for this info.
Clearly I must be somewhat excessive (?) with my source files? :thinking: :rofl:
I use HEVC HDR rips from MakeMKV almost exclusively.

We can put the card to rest for you. :white_check_mark:

That only leaves the question of: "Are you running out of CPU with Dolby / audio ? "

I’m sure at times I definitely am, but I don’t think that is the root of this. I decided to abuse it heavily this morning after I poked at the stats for the last 24 hours. I made some observations.

1.30.1.6497 - can’t downgrade to here because metadata_items schema has changed
1.30.2.6563 - scanning problem gone, hevc decoding problem present
1.31.0.6654 - same hevc problem
1.31.1.6716 - hevc problem gone <-- winner, winner, chicken dinner? Saw some locking, but it doesn’t seem to be as bad
1.31.1.6733 - monitoring
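
(For anyone repeating this on RHEL, each step down is just installing the older build’s RPM over the top; the filename below is illustrative, use the exact build’s package:)

sudo systemctl stop plexmediaserver
sudo rpm -Uvh --oldpackage plexmediaserver-1.31.1.6716-1.x86_64.rpm
sudo systemctl start plexmediaserver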

The poor database got smacked around, and props to the devs who added the upgrade/downgrade bits. Currently monitoring, and it’s just over 20 minutes now without any slow database messages. I’ve hit it with direct plays, HEVC transcodes, HEVC HDR transcodes, and H.264 transcodes as well. Just after the scans finished, the Scanner task jumped up to 300% usage, but that was to be expected. I have not seen the Plex Media Server process get higher than 200%, where before it was getting to 400% until it reset all the processes on its own. It still really feels to me like something changed going into the 1.32.x builds and I might be hitting an edge case that wasn’t considered. Slow DB messages started showing at 28 minutes while I was still editing this post.

I’m doing family things and not able to watch this as closely as I’d like, but I’ll try to check tonight when the traffic usually gets heavy and see if the audio encoder tasks are fighting to be top banana. What schema changes were there between 1.31 → 1.32 that the main Plex Media Server process uses?

Finally back home. Current state of the environment.

Plex: 1.32.4.7164
OS: RHEL9.2
RAM: 64GB
CPU: Ryzen 7 5800G
Hardware assist: Quadro P2000

New hardware is a screaming banshee.

Jun 09, 2023 15:34:07.910 [140186319350584] ERROR - [Req#7f55] Waited over 10 seconds for a busy database; giving up.
Jun 09, 2023 15:34:08.127 [140186308537144] ERROR - [Req#7f8e] Waited over 10 seconds for a busy database; giving up.

I have not rebuilt the DB again since replacing the hardware; everything is current. Either there is something in this new build causing database hangs with large libraries that I’m hitting, or there is some type of DB corruption that wasn’t fixed by the earlier dump → import → reindex cycle. What can I do to get the best info to you all to try and isolate what’s up? I can say that it is 100% triggered by a library scan. When the server comes up it’s solid and quick to respond; after I manually trigger a library scan, I get the database busy problem.

Solved?

Been watching this closely and noticed that it was only getting hangs when scanning one section that reports ~6500 items (section a), while another section that lists ~90000 items (section b) was having no issues whatsoever.

Section a had ~1000 collections, many of which were just single items, so I cleared any with fewer than 2 items, and the database hanging went from around 10 minutes on average down to 4-5. I let it sit overnight, didn’t get any messages from anyone that it hung up, and didn’t see anything in my logs showing any serious issues, but it was still hanging upon scanning this section. I kicked off a manual scan and noticed these two lines about directories having files but differing entries in the database. I haven’t changed any content for those to be off.

Each time I kicked off a manual scan, those numbers were moving around. I looked at the source for that library and it has 5 directories that it is looking at. Four of them are flat, no subdirectories, just files; the fifth is for the optimized versions that Plex creates. After looking at the content in that directory, I removed all the “optimized” versions from the “Optimized Versions” menu and then verified that the directory was empty on disk. All of the files had been removed, but there were a few directories left over and a symlink.

4.0K lrwxrwxrwx. 1 plex plex   52 Dec  8  2021  mobile.versions -> '/media/transcode/Plex Versions/Optimized for Mobile/'
   0 drwxrwxrwx. 1 plex plex    0 May 14 00:50  plex.metadata
   0 drwxr-xr-x. 1 plex plex    0 Jun 10 07:46 'Plex Versions'

Just to be clean, I cleared all of this and then ran a new scan. There have been no busy database messages since, and those two lines about the directories with differing item counts have not shown up again. I can kick off a scan on this library now and it finishes within 60 seconds; I have done so repeatedly and it appears to be good now. I did not create any of this directory structure, but it has been in place over several years and many MANY Plex versions.

I’ll give it a couple of days to be certain but it does appear to be good, finally.

tl;dr: It appears as though the content scanner was getting into a bad state, possibly from iterating over a symlink that was creating extra content entries.
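
For anyone hitting the same thing, a quick way to spot leftover symlinks under a library root before the scanner trips on them (the path is just an example):

find /media/movies -type l -ls     # list every symlink under the library root
find /media/movies -xtype l        # list only the broken ones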