How is Plex hash calculated?

Hey,

One of my drives fell, and while some data is gone, I was able to recover most of it. However, the directory structure and filenames were lost. I still have everything indexed in Plex though, so I thought I could just iterate over the http://localhost:32400/library/metadata/<number>/tree?X-Plex-Token=redacted API. It returns a file and a hash value, which would be perfect for me.

However, I couldn’t figure out how exactly this hash is generated. It looks like SHA1 judging by its length, but it’s probably not computed over the whole file. I only found this thread that mentions what it is, but if I just read the first 4096 bytes, I get a completely different hash, just like the poster in that thread.

If I could figure this out, it would help me a lot; I wouldn’t have to manually redo my entire library…

Thanks!

Some more research: I added some media to Plex, grabbed the hash and wrote a quick program that reads the file one byte at a time and calculates the hash as it goes, and there was seemingly no match… So it can’t simply be a plain SHA1… :cry:

Then, to test the theory from the other post, which claimed Plex only takes the first 4K block, I used dd on the original file to produce a file with the same beginning, just cut off after 10 MB, and the hash was different… So it can’t be just that, because then the hash would be the same.

Try reading 4 KB from the beginning and 4 KB from the end. Concatenate these and compute the overall hash.

I had tried that when I wrote the post, but forgot to mention it. Thanks for pointing that out. Unfortunately the hash changes every time.
I even went as far as to read 10 MB from the beginning and the end of the file and put it together, but unfortunately that also produced a different hash. So they must rely on something else too. I assume they hash the duration or the filesize (or both) or whatever as well.

There are a few steps:

  • Create a string with the filesize in decimal
  • Take the SHA1 hash of the first 64kB (65536 bytes) and append this to the string in hex.
  • If the file is greater than 64kB, do the same with the last 64kB
  • SHA1 hash this string
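
The steps above can be sketched in Python roughly as follows. This is only a minimal sketch based on the description in this post, not an official Plex routine: the function name `plex_hash` is made up for illustration, and it assumes the hex digests are lowercase and that the combined string is hashed as plain ASCII.

```python
import hashlib
import os

CHUNK = 65536  # 64 kB, the block size given in the steps above


def plex_hash(path: str) -> str:
    """Hypothetical helper: hash a file following the steps listed above."""
    size = os.path.getsize(path)
    combined = str(size)  # start with the file size in decimal
    with open(path, "rb") as f:
        # SHA1 of the first 64 kB (or of the whole file if it is smaller)
        combined += hashlib.sha1(f.read(CHUNK)).hexdigest()
        if size > CHUNK:
            # SHA1 of the last 64 kB, only for files larger than 64 kB
            f.seek(size - CHUNK)
            combined += hashlib.sha1(f.read(CHUNK)).hexdigest()
    # Assumption: the combined string is hashed as ASCII text
    return hashlib.sha1(combined.encode("ascii")).hexdigest()
```

Only the first and last 64 kB are ever read, so this stays fast even for very large media files.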

As an example, let’s look at a copy of Big Buck Bunny:

  • The file size is 928670754 bytes
  • The first 64kB has a SHA1 hash of 87a82ca143a5d84ba4ba33f421f25fbac9811f89
  • The last 64kB has a SHA1 hash of ce2f3dd83c1cc4ffa4deda5588a9118be004ce09
  • Take the SHA1 hash of 92867075487a82ca143a5d84ba4ba33f421f25fbac9811f89ce2f3dd83c1cc4ffa4deda5588a9118be004ce09
  • This is 782e3038c7290470c29320a840e5f92123912e56 which matches the hash column in the media_parts table if you add this exact file to it.
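
To check the last two bullets without having the video file at hand, the final step can be replayed from the values quoted above. This is a small sketch; the variable names are illustrative, and it assumes the string is hashed as ASCII. It prints whether the result agrees with the value reported in this post rather than assuming it does.

```python
import hashlib

# Values quoted in the example above
size = 928670754
head_sha1 = "87a82ca143a5d84ba4ba33f421f25fbac9811f89"  # first 64 kB
tail_sha1 = "ce2f3dd83c1cc4ffa4deda5588a9118be004ce09"  # last 64 kB
reported = "782e3038c7290470c29320a840e5f92123912e56"   # hash reported above

# Decimal size followed by the two hex digests, with no separators
combined = f"{size}{head_sha1}{tail_sha1}"

# Assumption: the combined string is hashed as ASCII text
final = hashlib.sha1(combined.encode("ascii")).hexdigest()

print(combined)
print(final, "matches" if final == reported else "differs from", "the reported hash")
```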

Fantastic. Words can’t express how thankful I am. Here is a simple Python script that hopefully reproduces the algorithm you described. It is on GitHub with the description: “Computes the file hash just how Plex would do it. All credits for the logic go to: https://forums.plex.tv/t/how-is-plex-hash-calculated/904178/5”

I could indeed get the correct hash for multiple files back.
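
For the recovery itself, the matching step could look roughly like the sketch below. It is only an illustration: `match_recovered_files`, `known` and `hash_func` are made-up names, `known` is assumed to be a dictionary from Plex hash to original file path (collected beforehand, for example from the `tree` API output or the `media_parts` table), and `hash_func` is assumed to be any function that implements the scheme described earlier in the thread.

```python
import os
from typing import Callable, Dict, Iterator, Tuple


def match_recovered_files(
    recovered_dir: str,
    known: Dict[str, str],
    hash_func: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (recovered_path, original_path) for every file whose hash is known.

    `known` maps a Plex hash to the original file path (assumed to be
    gathered beforehand); `hash_func` computes the Plex-style hash of a file.
    """
    for root, _dirs, files in os.walk(recovered_dir):
        for name in files:
            path = os.path.join(root, name)
            original = known.get(hash_func(path))
            if original is not None:
                yield path, original
```

Yielding pairs instead of renaming immediately makes it possible to review the plan first and only then move the files into place.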

PS: Seems like I was close to the solution. But I tried reading 4K and hashed the file size too… :sweat_smile:
