Server is deep-analysing old stuff, instead of new

Server Version#: 1.15.3.876-ad6e39743
Since 1.15, Plex has been deep-analysing old stuff all over again instead of newly added media. This has been going on for days now.

Edit: Still going on, no end in sight.

The deep analysis version number has been updated because it has new information to gather. It runs in random order and will stop once it has re-analyzed all files.

Thanks for the reply.

That may sound like a good idea on paper, but in reality it makes deep analysis unusable for me. Re-analyzing my whole library while running 24/7 takes almost 3 weeks; with a normal 2-3 hour maintenance window it takes almost half a year. So newly added media, the stuff most people actually watch, won't get analyzed until months later. Now imagine people with larger libraries, cloud storage or distributed storage; for them it gets even worse.
And if you have a larger library and use SSDs for caching, are those SSDs simply worn out afterwards because Plex decided to fully re-read a 100TB library?

This is obviously not a workable solution. Can you please fix that?


Hey,

First off, thanks for the quick reply to this issue, and thanks for your team’s amazing work. I need to reiterate to @Orko that it’s no small feat to develop and support what is basically a “personal Netflix” across that many devices, configurations and types of media, and still … well, work, and not too badly either. I especially like the UX of the UI; it’s smooth but versatile.

But there are some decisions that I disagree with (from an admin perspective), such as the inability to use a real database.

And while I understand from an architecture perspective that leaving everything configurable is not an option (as it increases complexity and creates exponentially more test cases), a feature that requires me to reprocess the whole library in RANDOM order before it works again for daily new additions is unnecessarily destructive and frustrating. You’re basically taking the feature away from me with this update.

Please, please, please, as a favour to me, who has never asked anything from you, please consider changing this to “do new stuff first” or maybe even an option of “do only new stuff”.

I don’t care if I have to do it in a config file or hack a little bit, I just want to be able to use that feature at least for new stuff.

Also, for the future, please consider that there are setups where redoing a daily process for the whole library may cost A LOT of resources, be it time, bandwidth or SSD life. These are situations where you basically provoke people like me into reverse-engineering your code :wink:

In most other aspects, you’re doing really, really great work. Keep it up :slight_smile:

Perhaps this is of use to you: https://support.plex.tv/articles/201242707-plex-media-scanner-via-command-line/ In particular, note the command-line option --analyze-deeply


So what you’re saying is, we should disable the routine (random) analysis and manually use the pms to deep-analyze new content? That would actually be an option.

Another question I have now: will the transcoder still be able to use the outdated results (without this new stat)?

Thank you, that is indeed of use, but the --file option does not work, only the --item option does, and there is no way, at least that I know of, to get the item ID “easily”. If there is, could you please point me in the right direction?

The support article does not state which ID it wants or how to get it, so I dug around in the db file a bit, but could not find documentation on that either. It seems the correct ID to supply to --item is not id or media_item_id but metadata_item_id, which is kind of confusing, but okay. Anyhow, I have disabled deep analysis in the settings for now and wrote a small script that queries the .db file and deep-analyses newly added items “manually” every hour.
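
In case it helps anyone else, here is roughly what that script looks like. The database path, the scanner path and the metadata_items/added_at names are taken from my own install and digging, so treat them as assumptions that may differ on yours (the scanner may also need to run as the plex user with LD_LIBRARY_PATH pointing at the Plex install directory):

```python
#!/usr/bin/env python3
# Rough sketch, run hourly from cron: deep-analyse only recently added items.
# Paths and table/column names are assumptions from my setup; adjust as needed.
import sqlite3
import subprocess
import time

DB = ("/var/lib/plexmediaserver/Library/Application Support/"
      "Plex Media Server/Plug-in Support/Databases/com.plexapp.plugins.library.db")
SCANNER = "/usr/lib/plexmediaserver/Plex Media Scanner"  # usual Linux location

cutoff = int(time.time()) - 3600  # anything added within the last hour

# Open the live database read-only so we never write to it.
con = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
rows = con.execute(
    "SELECT id FROM metadata_items WHERE added_at >= ?", (cutoff,)
).fetchall()
con.close()

for (metadata_item_id,) in rows:
    # --item expects the metadata item id (the ratingKey in the web app).
    subprocess.run(
        [SCANNER, "--analyze-deeply", "--item", str(metadata_item_id)],
        check=False,
    )
```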

Could you maybe consider changing the default behaviour of Plex so that it does not re-read terabytes of data in random order, with hardly any way to control, interact with or influence it besides disabling it completely? As it stands, it basically disables the feature for larger installations. If I had not caught it by chance (thank you Grafana), Plex 1.15 would literally have physically worn out my caching SSD. This is not good, like really not good at all.

Also, is the old analysis data useless now? Do I need to re-read all 50TB of media (impossible), or is it just “not as good as the new one”?

Two fast ways:

  1. Use Web App → Get Info on item → View XML → look at the ratingKey
  2. Use Web App → Browse to item (preplay screen) → Look at the URL and find the piece like …key=%2Flibrary%2Fmetadata%2F9545&co… (the metadata item id is the number after %2Fmetadata%2F, here 9545)

Unraid?

The new version gathers additional information which can reduce the number of times a playback results in a transcode. The previous information it gathered is unchanged.

I did not know that, thank you!
But I am not sure how scriptable that is; good to know in any case, though.
If pms --tree could also list the proper ID, I think that could help. Maybe, maybe not, idk.
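
Thinking about it, the “View XML” page is just the server answering over HTTP, so something like the sketch below might work. The /library/recentlyAdded endpoint and the X-Plex-Token handling are assumptions from my own poking around, not something I have verified:

```python
# Hypothetical sketch: list the ratingKeys of recently added items via the HTTP API
# instead of opening the database file. Endpoint and token handling are assumptions.
import urllib.request
import xml.etree.ElementTree as ET

PLEX_URL = "http://127.0.0.1:32400"  # assumed local server address
TOKEN = "YOUR_X_PLEX_TOKEN"          # placeholder token

url = f"{PLEX_URL}/library/recentlyAdded?X-Plex-Token={TOKEN}"
with urllib.request.urlopen(url) as resp:
    root = ET.fromstring(resp.read())

# Each child of the MediaContainer should carry a ratingKey attribute,
# which appears to be the same metadata item id that --item expects.
rating_keys = [el.get("ratingKey") for el in root if el.get("ratingKey")]
print(rating_keys)
```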

No, a small Ceph cluster with a cache tier.

Depending on library size, in my case at least, it would take almost half a year until new items reliably get deep-analysed again (after replacing the by-then faulty SSD). Do you see my point here, or am I not making sense at all?

Anyhow, thanks for your time and help, I really appreciate it. This has fixed-ish the problem for now.

My server has a bit over 20TB and completed the re-analysis in a few days with only a 3-hour window each day. I also run a full scrub (parity check) of the whole array (more than just media) twice a month, and this completes in well under half a day. Using these figures and your reported size, the deep analysis could have completed on your media by now. Maybe my storage is just vastly faster than yours, but I doubt that.

If this were to be an issue, then I think you must have configured your cache poorly. I see that the cache tier is quite configurable, so maybe some investigation there is warranted. I would not have an SSD cache data that has been accessed only once in a given time period. The deep analysis constitutes a single linear read of the file (plus maybe one seek to read the index if it’s at the end of the file), and that’s it for the I/O.

No.

I am not using your array, nor is anyone else. I did not make up those numbers, as you seem to imply; they are what I calculated after observing it for 3 days. It has already run 3 days straight and analysed only a fraction of the data. So no, it would not have and has not completed for me by now, no matter how fast your array is.
Also keep in mind that your ZFS array is very different from distributed object storage. People may use cloud storage, a (slow) NAS, or any other kind of storage, all of which are vastly different from your local array and present vastly different problems. Your numbers simply do not apply to me, to anyone else, or to any other kind of storage solution.
Your array also seems to be fast. On a server I use (not the one in question) with an 18TB ZFS array, a scrub takes 43h33min on a raidz1.

Your 20TB scrubbing in under half a day works out to a throughput of roughly 485 MiB/s at minimum, assuming a 12h runtime. That seems very, very fast; you must have some top-notch HW there. Congratz.
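For the record, the quick math behind that figure, assuming binary units and a full 12-hour run:

```python
# 20 TiB read in at most 12 hours, expressed in MiB/s
mib = 20 * 1024 ** 2     # 20 TiB in MiB
seconds = 12 * 60 * 60   # half a day
print(mib / seconds)     # ≈ 485 MiB/s
```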
Numbers vary widely, even with similar technology (SW-wise at least).
Object storage is a whole new ballgame.

That’s a valid point; the configuration could have been (and maybe should have been) done differently. Fair enough.
But for 2 years this did not present a problem and worked very well for my userbase. Assuming I can plan for every future poor Plex decision is a big ask. We are dangerously close to “Steve Jobs telling his customers they are holding their iPhone wrong” territory here.

I know this may come off as a bit antagonistic, but that’s not my intention. I do appreciate all the work and time you guys put into Plex, and from a user’s standpoint Plex has been a mostly fantastic experience, especially the UI, which I vastly prefer to Netflix’s, for example.

From an admin’s perspective, I wish I could control it more. Re-reading TBs of data without permission is just not OK. It assumes too many things about the storage, resources, limitations or HW, assumptions which may hold for the majority but may also be horribly wrong for others and can result in actual damage.

Please reconsider, especially for future decisions.

It sounds like my array truly is vastly faster than your storage. It’s just that your quoted size (50TB in one post, 100TB in another), divided by 3 hours per day over half a year, seemed too slow to me. I have had (non-SSD) laptop drives from a decade ago that were faster, but if your storage isn’t local then it would make sense. Typically we’ve seen those using cloud storage completely disable deep analysis and not even let it run once. You had let it run entirely before, which is why I didn’t entertain the idea that it might be remote storage.

My hardware isn’t top-notch by any stretch. ZFS on Linux has made advances recently, and the pool was recently created to get those advantages. My previous array, which didn’t have those advantages (and used drives that were a bit slower), took ~20h to scrub. Note: the computer, HBA and other hardware didn’t change in this comparison, only the drives, and the new drives are nowhere close to 2x the speed of the old ones. I mentioned scrub because it was a number I had on hand, and scrubs are slower than reading the entire file contents of the array (because plain reads don’t need to check all the parity and redundant information but scrubs do). It provides a lower bound on read speed.

If your non-cached media speed is truly this slow, perhaps you’d be well served by a spinning-drive cache, using the SSD only for “hot” data. A single consumer-level spinning drive appears to greatly exceed your primary storage speed.

I believe I’ve told you how to control it more. As for what’s OK: while you may not think this is OK from your perspective, prompting is a nuisance and not OK to several orders of magnitude more users. You have a very unusual setup, and when you do, you must keep in mind that your expectations and desires are sometimes going to be in direct conflict with those of the vast majority.

It’s distributed object storage inside the LAN. To oversimplify greatly, data is divided into chunks and distributed across servers in the local cluster. It has its advantages and some disadvantages, but well, it is what it is. In times of huge 4K movies, cloud storage and fast networks, not having everything local inside the machine Plex runs on is not that unreasonable an assumption, I think? Even a simple NAS attached over gigabit, which I think is fairly common, combined with a 2-3 hour window, would push the upgrade time well into weeks/months territory, depending on size, obviously.

This could be handled via a checkbox in the “Scheduled Tasks” section; there is already something similar, “Upgrade media analysis during maintenance”, which, confusingly, does not affect the deep analysis upgrade.

Anyhow, I think we need to agree to disagree here.
I thank you for your time, patience and advice.
