Plex Media Scanner gets stuck forever at 100% CPU when scanning a particular file

Server Version#: 1.18.7.2438 (Ubuntu Server 18.04)

I’ve been noticing that my plex server has pegged one CPU core at 100% pretty much all the time. Digging in, I find the offending process is called Plex Media Scanner, and digging further still, the process is stuck reading from a specific mkv. The command line included --generate and --chapter-thumbs-only, so presumably it was trying to generate chapter thumbnails and got stuck.

I manually killed that process, and then watched plex spin up new Plex Media Scanners that went through a number of other movies without issue. Taking a stab in the dark, I tried remuxing the offending mkv, and swapping the remuxed mkv into my media library, then told the plex UI to “Scan Library Files”. The Scanner got stuck on that file again. This time the command line was --scan --refresh --section 1 --activity 7f87d28a-7abb-4ad0-9f0f-9d8eda1e468f.

I did an strace on the running process, and it spit out a stream of read(...)s, all of which used the fd that corresponds to the mkv file, so it seemed like it was actually reading data from the file, but it never stopped reading. Network traffic to the NAS where the media lives confirms this, it just read the file for literally days.

And the offending mkv file seems to play back fine, both before and after I remuxed it. Anyone know what I should try next?

If you believe you have hardware problems

No, wait, I was right the first time. The file on my NAS plays fine, I tried copying it locally, and that file is corrupt. I think it is a problem with the local drive.

Please check further and either post & mark as solved or provide additional information. The information provided above does not give nearly enough to assist in troubleshooting.

Sorry, I had an unfortunate coincidence last night, where my desktop’s secondary SSD started returning bad data from file reads. No errors, just successful reads but where the returned data was garbage. After rebooting, the entire drive read as garbage (Windows didn’t mount it, and reported the physical drive as uninitialized). I powered down and went to bed, and now this morning the drive is back with all its contents like last night never happened. So diving into that and deciding if I need to replace the drive will be a fun project for today…

So I deleted the posts I made about corrupt files, as that is not the case. I’m copying the file down from the NAS now to verify its state, but just playing it firstly from the NAS in VLC seemed fine.

I confirmed the file is fine (plays in VLC). I just now re-muxed it into an mkv that is just the video stream, and nothing else, and again confirmed I could play that in VLC, and then put that in my NAS so plex would see, and am now running a Library Scan.

The video-only remux I added to my plex library takes a long time to play back in the plex web client. I am consistently getting a very long delay between clicking play and when the movie actually starts. Testing other movies in my library, so far this seems to be the only one with this delay (others start within a second of clicking play).

During the delay, the web client is just showing me a full-screen dimmed poster for the movie with a spinner over it.

During the delay, the server starts up a Transcoder process which runs for ~20 seconds, then goes away. About a minute to a minute and a half after that, the movie finally starts playing back as a second Transcoder process starts up.

Meanwhile the scanner process is still running in the background, still with the same pid.

Looking in the Plex Media Scanner.log file, I see some FFMPEG errors:

Feb 23, 2020 14:20:46.103 [0x7f8b7339f700] INFO - Plex Media Scanner v1.18.7.2438-f342a5a43 - Ubuntu PC x86_64 - build: linux-x86_64 - GMT 00:00
Feb 23, 2020 14:20:46.103 [0x7f8b7339f700] INFO - Linux version: 18.04.4 LTS (Bionic Beaver), language: en-US
Feb 23, 2020 14:20:46.103 [0x7f8b7339f700] INFO - Processor AMD EPYC Processor (with IBPB)�☺�☺�☺�☺@☺☻@@☺☻@(0
Feb 23, 2020 14:20:46.104 [0x7f8b7339f700] INFO - /usr/lib/plexmediaserver/Plex Media Scanner --scan --refresh --section 1 --activity b11ccd2c-ddb4-49d2-9f4a-c853837d170d
Feb 23, 2020 14:22:07.545 [0x7f8b68c24700] ERROR - [FFMPEG] - cabac decode of qscale diff failed at 25 12
Feb 23, 2020 14:22:07.545 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 25 12, bytestream 32531
Feb 23, 2020 14:22:07.549 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 904 DC, 904 AC, 904 MV errors in P frame
Feb 23, 2020 14:24:43.735 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 57 34, bytestream -32
Feb 23, 2020 14:24:43.736 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 880 DC, 880 AC, 880 MV errors in P frame
Feb 23, 2020 14:25:15.673 [0x7f8b68c24700] ERROR - [FFMPEG] - left block unavailable for requested intra4x4 mode -1
Feb 23, 2020 14:25:15.673 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 0 22, bytestream 3140
Feb 23, 2020 14:25:15.676 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 129 DC, 129 AC, 129 MV errors in P frame
Feb 23, 2020 14:29:49.530 [0x7f8b68c24700] ERROR - [FFMPEG] - Reference 3 >= 3
Feb 23, 2020 14:29:49.530 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 40 27, bytestream 28372
Feb 23, 2020 14:29:49.532 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 569 DC, 569 AC, 569 MV errors in P frame
Feb 23, 2020 14:30:09.928 [0x7f8b68c24700] ERROR - [MI] Read to end of file without finding a keyframe
Feb 23, 2020 14:30:09.962 [0x7f8b68c24700] ERROR - [FFMPEG] - mmco: unref short failure
Feb 23, 2020 14:30:09.962 [0x7f8b68c24700] ERROR - [FFMPEG] - mmco: unref short failure
Feb 23, 2020 14:31:18.880 [0x7f8b68c24700] ERROR - [FFMPEG] - cabac decode of qscale diff failed at 25 12
Feb 23, 2020 14:31:18.880 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 25 12, bytestream 32531
Feb 23, 2020 14:31:18.884 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 904 DC, 904 AC, 904 MV errors in P frame
Feb 23, 2020 14:33:59.632 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 57 34, bytestream -32
Feb 23, 2020 14:33:59.634 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 880 DC, 880 AC, 880 MV errors in P frame
Feb 23, 2020 14:34:32.297 [0x7f8b68c24700] ERROR - [FFMPEG] - left block unavailable for requested intra4x4 mode -1
Feb 23, 2020 14:34:32.297 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 0 22, bytestream 3140
Feb 23, 2020 14:34:32.301 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 129 DC, 129 AC, 129 MV errors in P frame
Feb 23, 2020 14:39:09.762 [0x7f8b68c24700] ERROR - [FFMPEG] - Reference 3 >= 3
Feb 23, 2020 14:39:09.762 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 40 27, bytestream 28372
Feb 23, 2020 14:39:09.764 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 569 DC, 569 AC, 569 MV errors in P frame
Feb 23, 2020 14:39:29.832 [0x7f8b68c24700] ERROR - [MI] Read to end of file without finding a keyframe
Feb 23, 2020 14:39:29.908 [0x7f8b68c24700] ERROR - [FFMPEG] - mmco: unref short failure
Feb 23, 2020 14:39:29.908 [0x7f8b68c24700] ERROR - [FFMPEG] - number of reference frames (0+6) exceeds max (5; probably corrupt input), discarding one
Feb 23, 2020 14:39:29.920 [0x7f8b68c24700] ERROR - [FFMPEG] - mmco: unref short failure
Feb 23, 2020 14:40:29.721 [0x7f8b68c24700] ERROR - [FFMPEG] - cabac decode of qscale diff failed at 25 12
Feb 23, 2020 14:40:29.721 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 25 12, bytestream 32531
Feb 23, 2020 14:40:29.725 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 904 DC, 904 AC, 904 MV errors in P frame
Feb 23, 2020 14:43:11.015 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 57 34, bytestream -32
Feb 23, 2020 14:43:11.016 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 880 DC, 880 AC, 880 MV errors in P frame
Feb 23, 2020 14:43:42.964 [0x7f8b68c24700] ERROR - [FFMPEG] - left block unavailable for requested intra4x4 mode -1
Feb 23, 2020 14:43:42.964 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 0 22, bytestream 3140
Feb 23, 2020 14:43:42.968 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 129 DC, 129 AC, 129 MV errors in P frame
Feb 23, 2020 14:48:14.395 [0x7f8b68c24700] ERROR - [FFMPEG] - Reference 3 >= 3
Feb 23, 2020 14:48:14.395 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 40 27, bytestream 28372
Feb 23, 2020 14:48:14.397 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 569 DC, 569 AC, 569 MV errors in P frame
Feb 23, 2020 14:48:34.522 [0x7f8b68c24700] ERROR - [MI] Read to end of file without finding a keyframe
Feb 23, 2020 14:49:56.672 [0x7f8b68c24700] ERROR - [FFMPEG] - cabac decode of qscale diff failed at 25 12
Feb 23, 2020 14:49:56.672 [0x7f8b68c24700] ERROR - [FFMPEG] - error while decoding MB 25 12, bytestream 32531
Feb 23, 2020 14:49:56.676 [0x7f8b68c24700] INFO - [FFMPEG] - concealing 904 DC, 904 AC, 904 MV errors in P frame

VLC is a completely different engine not based on FFMPEG

What is this? You’re manually adding a remux step youself?

Unfortunately,

  1. You turned off DEBUG logging (which doesn’t save you any disk space. It’s 50 MB regardless) and now I can’t see what it was doing
  2. I can’t diagnose from a snippet. I need the ZIP but I can tell you
Feb 23, 2020 14:39:29.908 [0x7f8b68c24700] ERROR - [FFMPEG] - mmco: unref short failure
Feb 23, 2020 14:39:29.908 [0x7f8b68c24700] ERROR - [FFMPEG] - number of reference frames (0+6) exceeds max (5; probably corrupt input), discarding one
Feb 23, 2020 14:39:29.920 [0x7f8b68c24700] ERROR - [FFMPEG] - mmco: unref short failure

is FFMPEG telling you the file is bad . Notice it is telling you what it doesn’t like about the data. VLC is more accommodating because it’s a different engine/playback mechanism.

Yeah, I was exploring the assumption that the problem was something wrong about that specific file, but the mkv container seemed fine, since vlc and mkvtoolnix had no problem parsing it, so maybe one of the streams was bad. This file has 35 audio streams, so removing them seemed like it could simplify things, and if it worked fine with no audio, then I could try making files with each audio track until I found the offender(s). But the video-only file had the original problem.

I’ll go turn debug logging back on.

Remember, mkvtoolnix doesn’t look at the encoding whereas PMS is looking directly at that encoding.

MKVtoolnix (mkvmerge) only cares about the containerization (MKV)

Folks often confuse video container and video encoding/format.

MKV / AVI / MP4 are containers (files).

HEVC, H.264, MPEG2, LPCM, AC-3, DTS-HD are the encodings we put in them

Cool.

So here’s what I did:

  1. enable debug logging
  2. copy the original mkv that was causing the problems back into an appropriate folder that plex would scan
  3. went into the plex web client and told the appropriate library to Scan Library Files
  4. waited 5 minutes (the scan was still running, but my logs had already grown by ~30MB)
  5. Settings -> Manager -> Troubleshotting -> Download Logs

I then did a bunch of find/replaces to anonymize the logs a bit. I’ll DM them to you.

Is that installation in a VM or something limiting the CPU?

It has terrible performance.

Feb 22, 2020 22:42:43.271 [0x7f9899ffb700] INFO - Sync: downloaded 1 sync list(s) with 3 sync items(s): 0 new, 0 updated, 0 deleted
Feb 22, 2020 22:42:45.773 [0x7f989b7fe700] ERROR - getaddrinfo(x-x-x-x.abcdefghijklmnopqrstuvwxyz012345.plex.direct) failed: -5
Feb 22, 2020 22:42:52.342 [0x7f98997fa700] WARN - NAT: PMP, timed out waiting for response.
Feb 22, 2020 22:42:55.344 [0x7f98997fa700] ERROR - Error issuing curl_easy_perform(handle): 28
Feb 22, 2020 22:42:57.468 [0x7f9855ffb700] ERROR - Unknown metadata type: 
Feb 22, 2020 22:43:02.200 [0x7f9857fff700] ERROR - Error issuing curl_easy_perform(handle): 28
Feb 22, 2020 22:43:02.359 [0x7f988bfff700] ERROR - Unknown metadata type: 
Feb 22, 2020 22:43:32.036 [0x7f98577fe700] ERROR - Unknown metadata type: folder
Feb 22, 2020 22:43:35.116 [0x7f98577fe700] ERROR - Unknown metadata type: folder
Feb 22, 2020 22:44:30.844 [0x7f9855ffb700] WARN - SLOW QUERY: It took 210.000000 ms to retrieve 6 items.
Feb 22, 2020 22:44:31.481 [0x7f98577fe700] WARN - SLOW QUERY: It took 520.000000 ms to retrieve 6 items.
Feb 22, 2020 22:44:40.581 [0x7f9899ffb700] WARN - Held transaction for too long (../Sync/SyncItemGenerator.cpp:153): 0.160000 seconds
Feb 22, 2020 22:44:50.495 [0x7f98567fc700] WARN - SLOW QUERY: It took 390.000000 ms to retrieve 6 items.
Feb 22, 2020 22:46:02.348 [0x7f98577fe700] ERROR - Unknown metadata type: folder
Feb 22, 2020 22:47:29.764 [0x7f988b7fe700] WARN - Held transaction for too long (../Sync/SyncItemGenerator.cpp:153): 0.230000 seconds
Feb 22, 2020 22:47:30.427 [0x7f988b7fe700] WARN - Held transaction for too long (../Sync/SyncItemGenerator.cpp:153): 0.130000 seconds
Feb 22, 2020 23:08:21.660 [0x7f9854ff9700] WARN - Held transaction for too long (../Sync/SyncItemGenerator.cpp:153): 0.150000 seconds
Feb 22, 2020 23:08:22.248 [0x7f9854ff9700] WARN - Held transaction for too long (../Sync/SyncItemGenerator.cpp:153): 0.120000 seconds
Feb 22, 2020 23:08:59.889 [0x7f989bfff700] ERROR - Unknown metadata type: folder
Feb 22, 2020 23:09:15.612 [0x7f986cff9700] WARN - Range could not be satisfied 708 - 0 (total size=-1)
Feb 22, 2020 23:10:13.565 [0x7f9855ffb700] WARN - Held transaction for too long (../Sync/SyncItemGenerator.cpp:153): 0.220000 seconds
Feb 22, 2020 23:10:14.486 [0x7f9855ffb700] WARN - Held transaction for too long (../Sync/SyncItemGenerator.cpp:153): 0.170000 seconds
Feb 22, 2020 23:12:50.353 [0x7f98a3ce8700] WARN - JobManager: Could not find job for handle 15694
Feb 22, 2020 23:19:10.540 [0x7f9899ffb700] WARN - SLOW QUERY: It took 751590.000000 ms to retrieve 1 items.
Feb 22, 2020 23:19:10.585 [0x7f9854ff9700] ERROR - Unknown metadata type: folder
Feb 22, 2020 23:19:12.469 [0x7f9899ffb700] WARN - Range could not be satisfied 708 - 0 (total size=-1)
Feb 22, 2020 23:20:13.284 [0x7f9835ffb700] ERROR - Unknown metadata type: folder
Feb 22, 2020 23:20:13.306 [0x7f989affd700] ERROR - [TranscodeOutputStream] Session appears to have died from under us
Feb 22, 2020 23:20:15.265 [0x7f9854ff9700] WARN - Range could not be satisfied 708 - 0 (total size=-1)
Feb 22, 2020 23:20:24.192 [0x7f989affd700] ERROR - [TranscodeOutputStream] Session appears to have died from under us
Feb 22, 2020 23:20:24.196 [0x7f986cff9700] ERROR - Unknown metadata type: folder
Feb 22, 2020 23:20:25.287 [0x7f98997fa700] WARN - SLOW QUERY: It took 2300.000000 ms to retrieve 5 items.
Feb 22, 2020 23:20:26.209 [0x7f989a7fc700] WARN - Range could not be satisfied 708 - 0 (total size=-1)
Feb 22, 2020 23:42:41.294 [0x7f9898ff9700] ERROR - getaddrinfo(x-x-x-x.abcdefghijklmnopqrstuvwxyz012345.plex.direct) failed: -5
Feb 22, 2020 23:42:51.283 [0x7f988a7fc700] ERROR - Butler chapter thumbnails job completed abnormally. (-9)
Feb 22, 2020 23:42:53.588 [0x7f9898ff9700] WARN - NAT: PMP, timed out waiting for response.
Feb 22, 2020 23:42:56.590 [0x7f9898ff9700] ERROR - Error issuing curl_easy_perform(handle): 28
  1. Database is fragmented
  2. CPU is timing out performing routine operations.

DEBUG logging was disabled. I NEED it enabled. VERBOSE must remain off.

Can you be more specific about why the performance is terrible? I don’t entirely know what I’m looking at with that log snippet you quoted.

To address your questions:

a) It is in a VM, but nothing should be limiting the CPU. The host is a local machine that I control, and looking at the metrics from that time last night, only this VM was registering any activity at the time period covered by the log snippet you quoted above.

b) Fragmented? Interesting. What in the logs indicated that? And does the “Optimize Database” button in Settings -> Manage -> Troubleshooting fix that? I clicked it just now, and it returned almost immediately with “Optimize database complete”.

c) Debug logging was disabled for the time period in the snippet you quoted above. I only enabled it today when you asked me to. From my limited understand of these log files, it looks like it kicked in today (2020-02-23) around 19:28. I then reproduced the bug, which you can see in the suddenly very verbose Plex Media Scanner*.log files.

This line in Plex Media Scanner.3.log in particular looks like where it hit the offending file while scanning:

Feb 23, 2020 19:29:52.052 [0x7efd87321700] DEBUG - [MI] Opening input file: "/mnt/media/Movies/Star Wars (1977) - Despecialized Edition 2.7/Star Wars (1977) - Despecialized Edition 2.7.mkv"

After that, I see a bunch of FFMPEG output that I don’t understand. This goes on for the .2.log, .1.log and .log files, for a total of >30MB of logs in about 6 minutes.

I am looking at that log entry now.

  1. The subtitle tracks are encoded/attached wrong. PGS subtitles need a display size.
  2. The nal messages and the “Before avformat_find_stream_info()” are telling me the video is corrupt.
  3. no picture speaks for itself. Further corruption as it’s missing frame types in the required sequence.
  4. this is bad encoded / damaged
Feb 23, 2020 19:29:52.837 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.837 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.837 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.837 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.845 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 9(AUD), nal_ref_idc: 0
Feb 23, 2020 19:29:52.845 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 6(SEI), nal_ref_idc: 0
Feb 23, 2020 19:29:52.846 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Feb 23, 2020 19:29:52.846 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Feb 23, 2020 19:29:52.846 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Feb 23, 2020 19:29:52.846 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 2
Feb 23, 2020 19:29:52.855 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 9(AUD), nal_ref_idc: 0
Feb 23, 2020 19:29:52.855 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 6(SEI), nal_ref_idc: 0
Feb 23, 2020 19:29:52.855 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 1
Feb 23, 2020 19:29:52.855 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 1
Feb 23, 2020 19:29:52.855 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 1
Feb 23, 2020 19:29:52.855 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 1
Feb 23, 2020 19:29:52.864 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 9(AUD), nal_ref_idc: 0
Feb 23, 2020 19:29:52.864 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 6(SEI), nal_ref_idc: 0
Feb 23, 2020 19:29:52.864 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.864 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.864 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.864 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0
Feb 23, 2020 19:29:52.871 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 9(AUD), nal_ref_idc: 0
Feb 23, 2020 19:29:52.871 [0x7efd87321700] DEBUG - [FFMPEG] - nal_unit_type: 6(SEI), nal_ref_idc: 0

:+1: I’ll see if I can find a copy that’s less borked.

I’m fine with this as the resolution, although I feel like it would be great if plex could have just told me there was something wrong with the file, instead of quietly spinning a whole CPU for days on end. Let me know if there’s anything else I can do to file this in the appropriate place as a bug.

Thanks for your help!

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.