Database backup deadlock with active library scan/credits detection

Description:

Plex Media Server becomes unresponsive when the Butler’s scheduled database backup task runs concurrently with an active library scan or credits detection job.

Observed behavior:

The backup thread captures all database sessions (observed up to session 18), creates a 0-byte temporary backup file, and then stops making progress. No error is logged. Concurrently running threads (scanner, credits detection, metadata refresh) block waiting on database connections held by the backup thread. The server becomes unresponsive and never recovers; no further log lines are written. The WAL file grows unbounded (observed at ~168MB) because no checkpoint can occur while the backup holds open snapshots.
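For anyone who wants to see the WAL part of this in isolation, here is a minimal sketch using Python's built-in sqlite3 module on a throwaway database. It is not Plex code: the table, the file name, and the helper names are made up for illustration, and a plain read transaction stands in for the stalled backup. It shows that a passive checkpoint cannot move frames written after a snapshot that is still held open.

```python
import os
import sqlite3
import tempfile


def checkpoint(conn):
    """Run a passive checkpoint; return (busy, wal_frames, frames_checkpointed)."""
    return conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()


def demo_blocked_checkpoint():
    # Throwaway database; this is not the Plex library database.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("PRAGMA wal_autocheckpoint=0")  # checkpoint only when asked
    writer.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    writer.execute("INSERT INTO items (title) VALUES ('before snapshot')")

    # A second connection opens a read transaction and keeps it open,
    # standing in for a backup that holds a snapshot without progressing.
    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")
    reader.execute("SELECT count(*) FROM items").fetchone()

    # Writes made after the snapshot are appended to the WAL.
    for i in range(200):
        writer.execute("INSERT INTO items (title) VALUES (?)", (f"item {i}",))

    held = checkpoint(writer)      # taken while the snapshot is still open

    reader.execute("COMMIT")       # release the snapshot
    released = checkpoint(writer)  # taken after the snapshot is gone

    reader.close()
    writer.close()
    return held, released
```

While the snapshot is held, the third value (frames checkpointed) should stay below the second (frames in the WAL); once the reader commits, the two should match. That is consistent with a WAL that only shrinks back after a restart.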

The only recovery is a service restart.

Sequence of events from logs:

  1. Library scanner is actively processing media (adding/deleting metadata items, writing to DB)
  2. Butler wakes up and starts the scheduled database backup task
  3. Backup thread captures all DB sessions (grabs connection pool)
  4. Backup thread creates the -tmp backup file (0 bytes) but never begins copying pages
  5. Scanner thread attempts further DB writes but blocks waiting for a connection from the pool
  6. Credits detection and other threads also block on DB connections
  7. Log reports: N threads are waiting on db connections held by threads: [backup thread ID repeated]
  8. No further activity occurs — server is deadlocked
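The sequence above can be reproduced with a toy model. This is only a sketch of the pattern the logs suggest, not how Plex actually manages its sessions: `ConnectionPool`, `simulate`, and the session strings are hypothetical, and the scanner is given a timeout here purely so the script terminates.

```python
import queue
import threading


class ConnectionPool:
    """Hypothetical fixed-size pool; strings stand in for DB sessions."""

    def __init__(self, size):
        self._free = queue.Queue()
        for n in range(size):
            self._free.put(f"session-{n}")

    def acquire(self, timeout=None):
        # timeout=None blocks forever, which matches the hang in the logs.
        return self._free.get(timeout=timeout)

    def release(self, session):
        self._free.put(session)


def simulate(pool_size=3, wait_s=0.2):
    pool = ConnectionPool(pool_size)
    outcome = {}

    # Step 3: the "backup" grabs every session and never returns them.
    held_by_backup = [pool.acquire() for _ in range(pool_size)]

    def scanner():
        # Step 5: the "scanner" needs one session for its next write.
        try:
            session = pool.acquire(timeout=wait_s)
            pool.release(session)
            outcome["scanner"] = "ok"
        except queue.Empty:
            outcome["scanner"] = "timed out waiting for a session"

    worker = threading.Thread(target=scanner)
    worker.start()
    worker.join()
    return len(held_by_backup), outcome["scanner"]
```

With `timeout=None` the scanner thread in this model would wait indefinitely, which is the shape of steps 5 through 8.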

Expected behavior:

The backup should either wait for active scans to complete before acquiring locks, or use a non-blocking mechanism (e.g., sqlite3_backup_step() with SQLITE_BUSY retry) that doesn’t hold the connection pool while waiting on an active writer. At minimum, there should be a timeout that aborts the backup and releases the locks rather than deadlocking permanently.
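To make the suggestion concrete, here is a rough sketch using Python's sqlite3 module, whose `Connection.backup()` wraps the same SQLite online backup API. It is an illustration under assumptions, not Plex's implementation: `backup_with_deadline`, `BackupTimeout`, the `-tmp` suffix handling, and all default values are made up for this example. It copies a bounded number of pages per step, sleeps and retries when the source is busy, and aborts once a deadline passes.

```python
import contextlib
import os
import sqlite3
import time

SQLITE_DONE = 101  # result code sqlite3_backup_step() returns when finished


class BackupTimeout(Exception):
    """Raised when the copy does not finish before the deadline."""


def backup_with_deadline(src_path, dst_path, pages_per_step=256,
                         retry_sleep_s=0.25, timeout_s=300.0):
    deadline = time.monotonic() + timeout_s
    tmp_path = dst_path + "-tmp"

    def progress(status, remaining, total):
        # Called after every step, including steps that found the source busy.
        if status != SQLITE_DONE and time.monotonic() >= deadline:
            raise BackupTimeout(f"{remaining} of {total} pages not copied")

    try:
        with contextlib.closing(sqlite3.connect(src_path)) as src, \
             contextlib.closing(sqlite3.connect(tmp_path)) as dst:
            # Copies pages_per_step pages per step; sleeps retry_sleep_s and
            # retries when a step reports SQLITE_BUSY or SQLITE_LOCKED.
            src.backup(dst, pages=pages_per_step, progress=progress,
                       sleep=retry_sleep_s)
        os.replace(tmp_path, dst_path)  # publish only a complete copy
    except BaseException:
        # Abort path: drop the partial file so nothing stays half-written.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_path)
        raise
```

The sketch uses one dedicated source connection and gives up with an error instead of waiting forever. One caveat: SQLite restarts an online backup when another connection modifies the source between steps, so on a very busy database the deadline is what bounds the total time.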

Environment:

  • Plex Media Server 1.43.0.10492-121068a07
  • Linux (Debian), ext4 filesystem
  • SQLite database in WAL mode
  • ~170MB library database

This is unlikely to be a platform-specific issue.

Have you checked your system for corruption using DBRepair?

Confirming this bug — PMS 1.43.0.10492 and still present in 1.43.1.10611

I’m hitting the same issue on my setup and wanted to add another data point since it’s been a month with no fix.

Environment:

  • QNAP NAS (Linux 5.10.60), Docker
  • PMS 1.43.0.10492 when first observed, now on 1.43.1.10611
  • 514 MB library database (~2,600 movies, ~490 TV shows across 6 libraries)
  • Butler window set to 3am–6am
  • Intro/credits marker generation set to “scheduled” (runs during Butler window)

What I observed:

Starting around late March, my Plex server would become completely unresponsive overnight during the Butler window. On investigation, the WAL file had grown well past its normal size and the server was deadlocked — no log activity, no recovery without a full container restart. This happened repeatedly on nights when the Butler ran both the database backup and intro/credits marker detection concurrently.

The pattern was consistent: the backup task would start, grab database connections, and then block the scanner and marker detection threads that were also trying to write. The server would just hang — no timeout, no abort, no recovery. The only fix was to kill and restart the container, at which point the WAL would checkpoint back down to a normal size.

Workaround:

I disabled the backup task entirely via the API:

curl -X PUT "http://localhost:32400/:/prefs?X-Plex-Token=TOKEN" \
  -G --data-urlencode "ButlerTaskBackupDatabase=0"

Since disabling it on 3/27, I’ve had zero deadlocks. WAL file has stayed under 5 MB consistently. The server has been rock solid overnight.
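For anyone who wants to watch for the same symptom, a small check like the one below could run from a monitoring job. It relies only on the fact that SQLite keeps the write-ahead log in a file named after the database with a `-wal` suffix. The function names and the 50 MB threshold are arbitrary choices for this sketch, picked to sit between the normal size and the stuck size reported in this thread.

```python
import os


def wal_size_bytes(db_path):
    """Size of the write-ahead log next to db_path, or 0 if none exists."""
    try:
        return os.path.getsize(db_path + "-wal")
    except FileNotFoundError:
        return 0


def wal_is_oversized(db_path, limit_bytes=50 * 1024 * 1024):
    # limit_bytes is an example threshold, not a value Plex documents.
    return wal_size_bytes(db_path) > limit_bytes
```

Pass the full path of the library database file. A result of `True` during the Butler window would be a hint that checkpoints have stopped.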

The problem with this workaround is that now I have no automated database backups at all. For a 500+ MB database that represents years of watch history, play counts, and metadata customization, that’s a real risk. I’m doing manual backups via cron as a stopgap, but that shouldn’t be necessary.
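A cron-driven stopgap could look roughly like the sketch below. This is not the exact job described above; `rotate_backup`, the `library-` file prefix, and the default of seven retained copies are assumptions made for the example, and it presumes a stock SQLite build can open the library file. It copies the database with the online backup API under a timestamped name and prunes older copies.

```python
import contextlib
import pathlib
import sqlite3
import time


def rotate_backup(db_path, backup_dir, keep=7, stamp=None):
    """Copy db_path into backup_dir under a timestamped name; keep the newest `keep`."""
    backup_dir = pathlib.Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"library-{stamp}.db"

    with contextlib.closing(sqlite3.connect(db_path)) as src, \
         contextlib.closing(sqlite3.connect(target)) as dst:
        src.backup(dst)  # whole-database copy through the online backup API

    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob("library-*.db"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Scheduling this outside the Butler window keeps it from overlapping with scans and marker detection.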

Still present in 1.43.1.10611:

I checked the 1.43.1.10611 changelog — 28 fixes listed, none related to database backup concurrency or deadlock. I haven’t re-enabled the backup task to confirm, but nothing in the release notes suggests this was addressed.

Suggestion:

As the OP mentioned, sqlite3_backup_step() with a bounded page count per step and retry/abort logic would solve this cleanly. The backup thread shouldn’t be able to starve all other database consumers indefinitely. Even a simple timeout that aborts the backup if it can’t make progress within N seconds would prevent the deadlock.

This seems like it should be a high-priority fix — the current behavior means users have to choose between database backups and a stable server, which isn’t really a choice.