Multi-series episode matching BUG

Server Version#: 1.31.0.6654 on RHEL8 (but realistically, all of them…)
Player Version#: All

When considering TV series, Plex scanners/agents are unable or unwilling to match a shared episode to two (or more) series. Examples of this scenario are easily found in cartoons. For instance, all “Tom & Jerry” cartoons are also “MGM Cartoons” cartoons.

I have spent extensive time trying to understand how/why Plex will not allow these multi-series matches to exist. I’ve considered the following strategies to trick Plex into cooperating:

  1. Duplicate the file and change the name (and waste disk space)
  2. Create a symbolic link, where the original file is named appropriately for one series and the symbolic link name is appropriate for the other series.
  3. Create a hard link, where the original file is named appropriately for one series and the hard link name is appropriate for the other series.
  4. An extension of case (3). In case the Plex Agent was somehow weighing the relative directory structure name higher than the metadata being provided in .plexmatch files, I placed the original files in a neutral folder with a .plexignore at its root, then hardlinked the files to separate locations for each series, with their respective folders having appropriate .plexmatch files.

The fact that Plex won’t allow two different series to present the same episode as their own simultaneously, especially when there are two completely different and independent files involved, is infuriating. But at least it means that Plex isn’t tracking files at the inode level, which was a concern I had while failing to get results with hard links.
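For anyone less familiar with the difference between strategies (1) and (3), here is a minimal Python sketch of what each one does at the filesystem level. The file names and placeholder contents are made up for illustration, and it says nothing about how Plex itself looks at files.

```python
import os
import shutil
import tempfile

def link_report(workdir):
    """Create one original, one hard link, and one copy; report inode sharing."""
    original = os.path.join(workdir, "MGM Cartoons S1940E02.mkv")
    hardlink = os.path.join(workdir, "Tom and Jerry S1940E01.mkv")
    copy = os.path.join(workdir, "Tom and Jerry S1940E01 (copy).mkv")

    with open(original, "wb") as fh:
        fh.write(b"placeholder video data")

    os.link(original, hardlink)      # strategy (3): second name, same inode
    shutil.copyfile(original, copy)  # strategy (1): independent file, new inode

    ino = {
        "original": os.stat(original).st_ino,
        "hardlink": os.stat(hardlink).st_ino,
        "copy": os.stat(copy).st_ino,
    }
    return {
        "hardlink_shares_inode": ino["original"] == ino["hardlink"],
        "copy_shares_inode": ino["original"] == ino["copy"],
        "link_count": os.stat(original).st_nlink,
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(link_report(tmp))
```

The hard link costs no extra data blocks but shares its inode with the original; the copy doubles the disk usage but gets an inode of its own.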

In the process of these experiments, I noticed that files would occasionally bounce back and forth between two series after a Refresh Metadata was run. Sometimes they’d land in one series, sometimes the other, with no obvious pattern to the behavior. I grabbed the XML for a single cartoon, once while identified from each series. In the case of the very first Tom & Jerry cartoon, I noticed at the end of the XML that the guid id was the same between the two (!): <Guid id="tvdb://101415"/>.

That seems to be the real problem. TVDB uses one unique GUID for each unique episode and then associates it to one or more series, as needed. Plex is somehow taking this GUID and treating it as universally unique, even though it isn’t. What’s more, when closely watching Plex scan files, you can see that it does briefly see the second instance and both series are momentarily “complete” but then it apparently goes through a guid duplication removal routine and things are placed back out of whack.

The bug being reported here is that the scanner should allow multiple instances of a TVDB ID, at least as long as they belong to different series. If a scanner depends on TVDB for IDs, treating the episode ID as unique is an error in logic: a TVDB episode ID alone is not sufficiently unique - the TVDB episode ID must be combined with a TVDB Series ID to establish a complete identity. IDs not based on TVDB, obviously, would also suffice.
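To illustrate the identity argument, here is a small Python sketch. It is not Plex's code; the `Match` record and `dedupe` helper are hypothetical, and the paths are shortened for illustration. The series IDs are the TVDB series IDs for MGM Cartoons and Tom and Jerry, and the episode ID is the one from the XML above.

```python
from collections import namedtuple

# Hypothetical record of "this file was matched to this series/episode".
Match = namedtuple("Match", ["series_id", "episode_id", "path"])

def dedupe(matches, key):
    """Keep only the first match seen for each key value."""
    kept = {}
    for m in matches:
        kept.setdefault(key(m), m)
    return list(kept.values())

matches = [
    Match(241901, 101415, "MGM Cartoons/Season 1940/S1940E02.mkv"),
    Match(72860, 101415, "Tom and Jerry/Season 1940/S1940E01.mkv"),
]

# Keyed on the episode ID alone, the second series loses its episode.
by_episode = dedupe(matches, key=lambda m: m.episode_id)

# Keyed on (series ID, episode ID), both series keep it.
by_series_and_episode = dedupe(matches, key=lambda m: (m.series_id, m.episode_id))

print(len(by_episode), len(by_series_and_episode))  # 1 2
```

Which of the two matches survives in the first case depends only on the order they were seen in, which would be consistent with the episode bouncing between series from one refresh to the next.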

With this naming:

test_tv/
├── MGM Cartoons (1934)
│   └── Season 1940
│       └── MGM Cartoons (1934) - S1940E02.mkv
└── Tom and Jerry (1940)
    └── Season 1940
        └── Tom and Jerry (1940) - S1940E01.mkv

And each series configured to use TheTVDB (Aired) episode ordering, I get the correct metadata for each:

I’m curious to see a specific example of a failure scenario for you (naming and folder organization) and how you have your library and series episode ordering configured.

Also, stick with either of your (1) or (3) scenarios above; symbolic links are problematic in my experience and a .plexmatch would seem to be completely unnecessary here, assuming correct naming.

Hi pshanew, thanks for investigating!

So right off the bat I can see that I haven’t been putting the series year in any top-level series folders. I guess I never had a problem with series mismatches to prompt me to include those. I can’t imagine that that’s where my problem is, though, since the .plexmatch files I’m using should serve the same purpose.

Adding the year to the files themselves, though, may be a different story. I figured that giving the files season and episode numbers (see below) while residing in a .plexmatch-identified folder would have been sufficient.

The problem becomes very noticeable when trying to handle the content from something like the “Warner Brothers Home Entertainment Academy Awards Animation Collection - 15 Winners, 26 Nominees” 3-DVD set. To assimilate a set like this into Plex, one has to make a choice on how to distribute the files. It’s easier when all of the content on a disc belongs to a single series but in this case, there are about a half dozen series represented. For the sake of source integrity (and in trying not to manage fractured sets of files), I chose to leave all the files under a structure like this:

TV Series/
├── Academy Awards Animation Collection
│   └── Disc 1
│       └── (...)
│   └── Disc 2
│       └── MGM Cartoons
│           └── .plexmatch
│           └── MGM Cartoons S01 E186 - Puss Gets The Boot.mkv     (original)
│       └── Tom and Jerry
│           └── .plexmatch
│           └── Tom and Jerry S1940 E01 - Puss Gets The Boot.mkv    (hardlink)
│       └── Looney Tunes
│       └── Superman
│       └── Popeye The Sailor
│   └── Disc 3
│       └── (...)
├── MGM Cartoons
│   └── .plexmatch
│   └── (other stuff that is not mixed content, but still saved in original package/disc# directories)

You can see that there’s another sinister TVDB problem to contend with in there, as well: sometimes an episode appears in the “Absolute Order” list but not in the “Aired Order” list or vice versa. In the case of MGM Cartoons, I’ve gone back and forth on which way to set the series because I think I’ve found cartoons that fall in opposite camps. (I’ll have to double-check which those were.) Currently, they’re all on “Aired Order”. This first Tom and Jerry cartoon is perhaps a bad example in that respect but the other Tom and Jerry cartoons on these DVDs don’t have that problem - they just have the “can’t belong to multiple series” problem. Incidentally, how did you settle on that season and episode number for Puss Gets the Boot on the MGM-side?

Back to the directory structure: Extrapolate the organization problem to dozens and dozens of discs, some dedicated to one series and some mixed. Even the Looney Tunes Golden Collection and Platinum Collections have bonus cartoons from MGM on them, even though they’re primarily Warner Brothers. Separating files from their source discs to series folders makes it nearly impossible to figure out what file came from which package/disc. One also runs the risk of overwriting a preferred version of a file that way, too.

Most of the time, Plex doesn’t seem to have a problem with this approach and can correctly identify most, if not all, episodes/cartoons from many discs/packages. All of my non-cartoon series content is stored in package/disc# form, though I don’t have any cross-over episodes that belong to multiple series there; it’s only been a problem in cartoon-land.

For specific failures, the above 3-DVD set contains the following cartoons:

  • 21 Looney Tunes
  • 13 Tom and Jerry
  • 6 MGM Cartoons
  • 1 Droopy
  • 1 Popeye
  • 1 Superman

Plex only picks up 6 cartoons for the MGM Cartoons series. The MGM Cartoon files were the originals and the Tom and Jerry ones were the hardlinked copies (though, strictly speaking each and every instance is a hardlink). It finds the Droopy file and associates it with the Droopy series, so that’s also failing like the Tom and Jerry files. Popeye and Superman are non-issues (no series-crossover). I’m ignoring Looney Tunes for now…they’re probably all duplicated elsewhere, anyway.

Because of this:
https://thetvdb.com/series/mgm-cartoons/seasons/official/1940

On TheTVDB, MGM Cartoons’ first episode aired in 1934; hence the first aired-year in parentheses. Likewise, that specific episode is listed in the 1940 season as episode two; hence S1940E02. These should always be named as a single field with no spaces.

Regarding your folder structure, I would not expect that to work, at least not consistently. Plex does not support a more tiered hierarchy than: Top-level TV Show folder → Show Name folder → Season folder → Episode file. Other hierarchies may seem to work perfectly well for a time, right up until they don’t. I suspect this is why you’re experiencing your current problem. Plex likely sees the “second” file as a duplicate.

(As an aside here, you can create whatever hierarchy you want. The important thing here is that Plex only wants the folder-level immediately above the show name folders to be added to a library; you can add multiple of these. As an example:

TV Shows/
    Sitcoms/
        Show Name/
            Season/
                Episode.ext
    Anime/
        Show Name/
            Season/
                Episode.ext

Here, you’d add both “Sitcoms” and “Anime” to your library folders, but not “TV Shows.”)

I’d recommend sticking as closely to Plex’s recommended naming guidelines for series as possible for the most frictionless experience. They should be considered best practices in my opinion.

Another resource you can use to see how you might need to organize and name your series is at watch.plex.tv. It provides a peek into their metadata backend for their modern agents/scanners. For example, here’s MGM Cartoons:

https://watch.plex.tv/show/mgm-cartoons

Oh man, I was really hoping to avoid forcing a series/season/file structure, especially since Plex has largely been cooperative with the package/disc/file or even random/random/random/file structures. The problem is that 99% of all classic cartoons are issued in nonlinear, non-series-based collections and TVDB offers best-fit years as seasons (or absolute order). In addition, the collections are rife with mastering errors and mistakes, making each one’s contribution a mix of good and bad.

Plus, like I mentioned before, forcing the series/season/file structure effectively forces me to choose winners and losers - only one version of the file can go there. It’s not like I have infinite time to compare/contrast and pick the best version. Even if I did, I’d have to maintain a spreadsheet or a database to keep track of which source package/disc each file came from.

In an ideal world, Plex would let us users just save an xml or json file to any given folder or txt file named the same as every media file to provide definitive/absolute meta data. (i.e., “Don’t care what you think you found as the match, Mr. Agent, this is what it is. Period.” haha)

Regarding the S###E### convention and whether or not a space exists between them, for me that’s just an aesthetics choice coupled with a reasonable assessment that Plex is using a decent regex search string behind the scenes. I’d rather my files be more legible with the space in place, largely because Plex clients aren’t the sole consumer/player and also, naturally, a touch of OCD. I’ve not seen any evidence that Plex ever fails on the inclusion of a space. A quick count shows over 5600 episodic files.
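As a rough illustration of what a "decent regex" could look like, here is a Python sketch that accepts the season/episode token with or without the space. The pattern is hypothetical; Plex's real scanner expression isn't shown anywhere in this thread, so this says nothing about what Plex actually accepts.

```python
import re

# Hypothetical pattern, NOT Plex's actual scanner expression.
# \s? makes the single space between the S and E fields optional.
EPISODE_RE = re.compile(
    r"\bS(?P<season>\d{1,4})\s?E(?P<episode>\d{1,3})\b",
    re.IGNORECASE,
)

def parse_episode(filename):
    """Return (season, episode) as ints, or None if no token is found."""
    m = EPISODE_RE.search(filename)
    if m is None:
        return None
    return int(m.group("season")), int(m.group("episode"))

if __name__ == "__main__":
    for name in (
        "MGM Cartoons (1934) - S1940E02.mkv",
        "Tom and Jerry S1940 E01 - Puss Gets The Boot.mkv",
        "MGM Cartoons S01 E186 - Puss Gets The Boot.mkv",
    ):
        print(name, "->", parse_episode(name))
```

Running a check like this over a library is also a cheap way to find files whose names carry no season/episode token at all.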

I’m going to see if I can’t hybridize my package/disc organizational needs with Plex’s series/season needs to see if I can’t get them to play nice together. I can’t give up on sensible file management just yet!

Thanks for the tip on https://watch.plex.tv/show/mgm-cartoons as a resource - I hadn’t realized that was available. Likewise, thanks for the indirect tip that TVDB’s database is not fully reflected by their webpages - none of the TVDB webpages I viewed listed Puss Gets the Boot as S1940 E02, and I thought I made a fairly comprehensive search of it, too!

It’s linked above: MGM Cartoons - Unknown - Season 1940 - TheTVDB.com

I know - that was the first TVDB page I’ve seen that shows it. Not all browsing paths lead to that page (or to that page being constructed correctly). I definitely reviewed the 1940 MGM Cartoons page looking for that title and it didn’t exist. I suppose it’s possible that a TVDB moderator has updated the page since I first tried finding the cartoon and that it only started working recently.

Alright, I’ve taken another stab at getting this to work.

First, I isolated the Academy Awards Animation Collection to its own TV Series Library. As pshanew was able to show in their example, the multi-series membership worked. I left the Puss Gets the Boot example in place here but for the sake of brevity, I’ve excluded the Looney Tunes, Popeye, and Superman content in the tree. Here’s how it was set up:

Test1/
├──.mixed
│   └── .plexignore
│   └── Academy Awards Animation Collection
│       └── Disc 1
│           └── MGM Cartoons (1934)
│               └── .plexmatch
│           └── Tom and Jerry (1940)
│               └── .plexmatch
│       └── Disc 2
│           └── MGM Cartoons (1934)
│               └── .plexmatch
│               └── Season 1940
│                   └── MGM Cartoons (1934) S1940 E02 - Puss Gets The Boot.mkv    (hardlink)
│           └── Tom and Jerry (1940)
│               └── .plexmatch
│               └── Season 1940
│                   └── Tom and Jerry (1940) S1940 E01 - Puss Gets The Boot.mkv    (hardlink)
│       └── Disc 3
│           └── MGM Cartoons (1934)
│               └── .plexmatch
│           └── Tom and Jerry (1940)
│               └── .plexmatch

For reference, the MGM Cartoon .plexmatch contents are:

tvdbid: 241901
title: MGM Cartoons
year: 1934

and the Tom and Jerry .plexmatch contents are:

tvdbid: 72860
show: Tom And Jerry
year: 1940
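With dozens of discs, dropping these files in by hand gets tedious. Below is a minimal Python sketch that writes the two hint files shown above into every matching show folder under a disc directory. The `SERIES_HINTS` table and the `write_plexmatch_files` helper are my own names, and the script only reproduces the keys and values used in this post.

```python
from pathlib import Path

# Hint contents copied from the two .plexmatch files above.
SERIES_HINTS = {
    "MGM Cartoons (1934)": {"tvdbid": "241901", "title": "MGM Cartoons", "year": "1934"},
    "Tom and Jerry (1940)": {"tvdbid": "72860", "show": "Tom And Jerry", "year": "1940"},
}

def write_plexmatch_files(disc_dir):
    """Write a .plexmatch into each known show folder directly under disc_dir."""
    written = []
    for show_dir in sorted(Path(disc_dir).iterdir()):
        hints = SERIES_HINTS.get(show_dir.name)
        if hints is None or not show_dir.is_dir():
            continue  # unknown folder name, or not a folder: leave it alone
        target = show_dir / ".plexmatch"
        text = "".join(f"{key}: {value}\n" for key, value in hints.items())
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Calling `write_plexmatch_files(...)` once per "Disc" folder would keep the hint files consistent across packages.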

Plex ignores the actual file structure containing the content, which is still saved in a package/disc sense, due to the .plexignore file. The new TV Series was called “Test 1” and the only directories added to the library were:

  1. …/library/Test1/.mixed/Academy Awards Animation Collection/Disc 1
  2. …/library/Test1/.mixed/Academy Awards Animation Collection/Disc 2
  3. …/library/Test1/.mixed/Academy Awards Animation Collection/Disc 3

The folder structure below each “Disc” directory follows the Plex-preferred hierarchy. The resulting cartoon counts that Plex picked up are as follows:

  • MGM Cartoons: 20
  • Tom and Jerry: 13
  • Droopy: 1
  • Looney Tunes: 21
  • Popeye: 1
  • Superman: 1

This aligns with what’s actually provided on the DVDs:

  • MGM Cartoons: 6
  • Tom and Jerry: 13
  • Droopy: 1
  • Looney Tunes: 21
  • Popeye: 1
  • Superman: 1

The 13 Tom and Jerry cartoons and 1 Droopy cartoon had secondary hardlinks created with MGM Cartoon file names. Clearly, Plex is correctly picking up everything so far (6 + 13 + 1 = 20).

Expanding on the experiment, I added a second package of DVDs to the mix: Tom and Jerry Spotlight Collection, Volume 1. Its 2 discs contain 40 cartoons and 4 special features. The directory structure now looks like this:

Test1/
├──.mixed
│   └── .plexignore
│   └── Academy Awards Animation Collection
│       └── Disc 1
│           └── MGM Cartoons (1934)
│               └── .plexmatch
│           └── Tom and Jerry (1940)
│               └── .plexmatch
│       └── Disc 2
│           └── MGM Cartoons (1934)
│               └── .plexmatch
│               └── Season 1940
│                   └── MGM Cartoons (1934) S1940 E02 - Puss Gets The Boot.mkv    (hardlink)
│           └── Tom and Jerry (1940)
│               └── .plexmatch
│               └── Season 1940
│                   └── Tom and Jerry (1940) S1940 E01 - Puss Gets The Boot.mkv    (hardlink)
│       └── Disc 3
│           └── MGM Cartoons (1934)
│               └── .plexmatch
│           └── Tom and Jerry (1940)
│               └── .plexmatch
│   └── Tom and Jerry
│       └── Spotlight Collection
│           └── Volume 1
│               └── Disc 1
│                   └── MGM Cartoons (1934)
│                       └── .plexmatch
│                       └── Season XXXX
│                   └── Tom and Jerry (1940)
│                       └── .plexmatch
│                       └── Season 0
│                       └── Season 1940
│               └── Disc 2
│                   └── MGM Cartoons (1934)
│                       └── .plexmatch
│                       └── Season XXXX
│                   └── Tom and Jerry (1940)
│                       └── .plexmatch
│                       └── Season 0
│                       └── Season 1950

Following the approach as before, the .plexignore excludes all of the package/disc organizational structure. The following directories were added to library “Test 1”:

  1. …/library/Test1/.mixed/Academy Awards Animation Collection/Disc 1
  2. …/library/Test1/.mixed/Academy Awards Animation Collection/Disc 2
  3. …/library/Test1/.mixed/Academy Awards Animation Collection/Disc 3
  4. …/library/Test1/.mixed/Tom and Jerry/Spotlight Collection/Volume 1/Disc 1
  5. …/library/Test1/.mixed/Tom and Jerry/Spotlight Collection/Volume 1/Disc 2

Ignoring the specials, the raw cartoon counts from the 2 DVD sets should be as follows:

  • MGM Cartoons: 6
  • Tom and Jerry: 13 + 40 = 53
  • Droopy: 1
  • Looney Tunes: 21
  • Popeye: 1
  • Superman: 1

Here’s what Plex discovers:

  • MGM Cartoons: (6 + 13 + 1) + 40 - 9 (duplicates) = 51
  • Tom and Jerry: 13 + 40 - 9 (duplicates) = 44
  • Droopy: 1
  • Looney Tunes: 21
  • Popeye: 1
  • Superman: 1
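For anyone who wants to check the arithmetic, here it is as a few lines of Python. The variable names are mine; the numbers are the ones listed above.

```python
# Counts from the Academy Awards Animation Collection (first DVD set).
aa_collection = {"MGM Cartoons": 6, "Tom and Jerry": 13, "Droopy": 1}

# Tom and Jerry Spotlight Collection, Volume 1 (second DVD set).
spotlight_tom_and_jerry = 40

# Cartoons that appear on both sets.
duplicates = 9

# Every Tom and Jerry and Droopy cartoon is also linked into MGM Cartoons.
mgm_first_set = sum(aa_collection.values())
mgm_total = mgm_first_set + spotlight_tom_and_jerry - duplicates
tom_and_jerry_total = aa_collection["Tom and Jerry"] + spotlight_tom_and_jerry - duplicates

print(mgm_first_set, mgm_total, tom_and_jerry_total)  # 20 51 44
```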

So the gimmick still works - hiding the now over-organized package/disc structure and cherry-picking the disc-level folders as individual library-level root folders, combined with multiple hardlinks, produces the desired result of multiple-series association.

I withdraw my submission that this is a bug. However, I maintain that this basic premise requires a ridiculous amount of work and perhaps more than the average user’s level of knowledge (i.e., hardlinks) to accomplish the desired result. (EDIT: see later post - it can’t be done with hardlinks; the files have to be unique files/inodes) It really shouldn’t be any more difficult than having the TV Series scanner mindlessly recurse through all subdirectories and blindly believe every .plexmatch file it encounters. Given the level of TVDB integration/awareness/dependence already present, it’s actually a little surprising that there isn’t simply a checkbox in the Advanced section of a TV Series library’s settings asking whether it should “match multiple series and crossover episodes” automatically.

Ultimately, though, this approach lets one store episodic media in the more logical package/disc scheme for things like classic cartoons that aren’t released in series/season form or simply don’t have real seasons.

Thanks for the hints, pshanew!

I think the main fallacy is to try and shoehorn a collection of unrelated items, divided into “Discs” into a library, which relies on the organizational structure of tv shows with Shows/Seasons/and Episodes.
Plex cannot handle such a thing. It relies completely on the data from TheTVDB, TheMovieDB, and IMDb and the above-mentioned organization.

Plex is not a “file player”, like many other solutions, which simply “mirror” the folder organization of the media files into the user interface.

Hmmm, but one could say the same of TVDB for shoehorning collections of non-series based content into pseudo-series, as well. I’ve heard of others putting cartoons into movie libraries to get around some of these problems but that approach is not without issues, either.

I’ve not been asking for Plex to be a “file player” of a folder structure, as you’ve described. I’m still trying to rely on metadata and TVDB associations. The scanners or agents used to assess the content in a filesystem are like horses with blinders on, though, and there’s really no reason not to make them a little better, to let them assess content a little more freely. Maybe the development team is low on resources but that shouldn’t prevent good ideas from at least being added to the queue.

Interested customers/users can (and do) already supply the minimum necessary hints to correctly identify everything but the scanner-agent behavior is unnecessarily over-constrained if it is demanding the show/series/season file structure and thereby ignoring available information. I’m just asking for these tools to be improved upon. One solution could be the .plexmatch concept, which is already in place - the file just needs to be respected anywhere it is found, not just at the root show level.

A counter-approach example: It should be just as reasonable (though, not fun to manage) to put all content into a single flat directory and embed the {tvdbid=#####} tag into every file name. That also goes against the show/series/season folder convention but the scanner-agents would capture everything just fine - well, everything except multi-series membership because TVDB IDs are not series-unique. Plex need not be limited by TVDB’s shortcomings; it can be smarter.

I spoke too soon on getting things to work. My “Test 1” library had been copied from the original library and all of the hardlinking done in the original library did not transfer over. I was trying to touch up the files and folder structure in the original library to mimic that of the “Test 1” library, since it seemed to be working, but found that Plex was still behaving the same as before: the corrected original library was still missing huge swaths of files. Upon closer inspection I found that the hardlinks remained in the original library and that the “Test 1” library got brand new copies of everything. My fault for not checking earlier.

Anyway, what this seems to imply is that Plex is indeed tracking files at an inode level, which is insane. There’s literally no way to associate a TV Series episode (or cartoon) to multiple series (which should be fair game, since TVDB does it) without wasting disk space on every file that needs to cross over.

pshanew - can you confirm whether, in your experiment, you used a single file with two hardlink endpoints or two independent files with separate/unique inodes? I’m thinking it had to be the latter.
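A quick way to answer that question for any library tree is to group files by device and inode number. This is a minimal Python sketch (the `hardlink_groups` name is mine); it only inspects the filesystem and assumes nothing about Plex.

```python
import os
import stat
from collections import defaultdict

def hardlink_groups(root):
    """Return lists of paths under root that are names for the same inode."""
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                by_inode[(st.st_dev, st.st_ino)].append(path)
    # Only inodes reachable through more than one path are hard-link groups.
    return [sorted(paths) for paths in by_inode.values() if len(paths) > 1]

if __name__ == "__main__":
    import sys
    for group in hardlink_groups(sys.argv[1]):
        print(" == ".join(group))
```

An empty result for a library means every file in it is an independent copy, which is exactly what happened to my “Test 1” library.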

That is indeed the case; I tested with two separate (but identical) files when I tried this out. I’ll try to find some time to test with hard links when I can.

@Philoplexer

I was able to test with hard links; unfortunately, I experienced issues with this as well. The episode linked between two series will match to one or the other of them (seemingly alternating between scans), but not both. My initial assumption was that given the different file names and paths that the files would be sufficiently different to warrant separate matches.

Sorry for the confusion. But as it stands, if you’d like for the episodes to appear in two (or more?) series, they must be duplicated, it would seem.

Well shoot. At least I’m not crazy then.

I’m trying to poke through Plex’s sqlite databases…so far no evidence of the actual inodes being saved but that doesn’t mean they aren’t being saved indirectly as hashes or being used in a duplicate-removal scheme by the file scanner.
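For anyone who wants to repeat the search, this is roughly the kind of check I mean, as a Python sketch. It uses only generic SQLite introspection, so it makes no assumptions about Plex's table names; the database path is whatever copy of the library database you point it at (please work on a copy, not the live file). The `columns_mentioning` name is mine.

```python
import sqlite3

def columns_mentioning(db_path, needle):
    """Return (table, column) pairs whose column name contains needle."""
    hits = []
    con = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            try:
                rows = con.execute(f"PRAGMA table_info({quoted})").fetchall()
            except sqlite3.OperationalError:
                # Some tables may not be introspectable with a stock
                # sqlite3 build (an assumption on my part); skip those.
                continue
            for _cid, name, *_rest in rows:
                if needle.lower() in name.lower():
                    hits.append((table, name))
    finally:
        con.close()
    return hits
```

Searching for "inode" and then for "hash" narrows down which columns are worth a closer look, though a column name alone doesn't prove how the scanner uses the value.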

I think at this point I need to let the Plex development team work on this, mainly because I’ve already sunk too much time into researching the problem. Please add the issue to the list - I’d be happy just to know that the problem is acknowledged and will eventually be looked at.

The basic premise is that, if/when TVDB cross-references an episode to multiple series, so should Plex, especially since we’re supposed to follow TVDB conventions so that Plex will work with series content correctly.

I thought we determined that it did, when duplicate copies of the files exist in the filesystem (their contents can be identical). I had the same file present in two different series, named appropriately for each series, and it correctly identified it in both, without fail. Granted, it was a limited test case.

The problem arose when hard linking (instead of copying and renaming), at least according to my testing. This caused indeterminate series placement for the episode being linked, with it seemingly alternating between each series upon metadata refresh.

The remainder of the issues appear to have been related to file naming and/or organization.

Did I misunderstand where we ended up with this?

Yes, I should clarify. Your original experiment did verify that Plex can maintain two different series with a shared episode but it could only do so by duplicating the file such that the duplicates not only have different names but also different inode addresses, thus doubling the required disk space (at an inode-level and not a byte-level, no less!).

Furthermore, while the .plexmatch file is a great tool for helping the scanner-agents out, it is clearly being ignored or ranked lower than inode addresses during file enumeration. Otherwise, a file with a single inode address and two hardlink endpoints in different folders with correspondingly different .plexmatch files would be properly recognized.

I maintain that Plex should not need any inode-level data to identify media content or metadata. If there’s a good technical argument for it, then I’d love to hear it, but in the meantime, Plex is asking us to follow TVDB conventions when it can’t fully do so itself (or asking that users double their disk usage for every crossover file).

I can think of at least two (general) approaches to solving the problem:

  1. Do not use inode addresses. It is unclear whether this is a scanner-agent issue or a database storage issue. It is unclear why inode addresses were ever needed. Maybe the scanner-agent is resolving duplicates via inode address comparisons; that should be a pretty quick bug fix. I somehow doubt that inode addresses are being used as unique keys in the database. I saw no evidence of them being used as direct keys and, since one does not usually calculate guids like hashes (new guids are usually random), I also doubt that inodes are applied indirectly, either.

  2. Turn it into a feature. Plex already knows about TVDB, so any time it comes across a file that belongs to multiple series, it should already have the ability to identify it as a multi-series episode. Essentially, as the TV scanner-agent discovers each file, it checks if it’s a cross-over and if it is, adds the correct meta data for each series. Heck, it could be made into a toggle-able option in the library settings.
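To make approach (2) a bit more concrete, here is a Python sketch of the fan-out step. Everything in it is hypothetical: the `CROSS_REFS` table stands in for whatever cross-reference data an agent could obtain from TVDB, and `library_entries` is just an illustrative name. The IDs and season/episode numbers are the ones already discussed in this thread.

```python
# Hypothetical cross-reference table: episode ID -> every series listing it.
# A real agent would have to obtain this from TVDB; it is hard-coded here.
CROSS_REFS = {
    101415: [
        {"series_id": 241901, "series": "MGM Cartoons", "season": 1940, "episode": 2},
        {"series_id": 72860, "series": "Tom and Jerry", "season": 1940, "episode": 1},
    ],
}

def library_entries(path, episode_id, multi_series=True):
    """Return one library entry per series that lists this episode.

    multi_series models the proposed library toggle: when False, only
    the first series gets the file, as in the behavior reported above.
    """
    refs = CROSS_REFS.get(episode_id, [])
    if not multi_series:
        refs = refs[:1]
    return [dict(ref, path=path) for ref in refs]

if __name__ == "__main__":
    for entry in library_entries("Puss Gets The Boot.mkv", 101415):
        print(entry["series"], entry["season"], entry["episode"], entry["path"])
```

Note that a single physical file produces both entries here, so no hard links or copies would be needed at all.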

The advantage of (1) is that it is a better design - inode addresses are not the filesystem the user agreed to present to Plex; the directories and files are. I’m betting that multi-series episodes aren’t the only circumstance where users (customers) are getting into fights with Plex because of the inode thing. Plex may very well fix other deficiencies by eliminating inode awareness.

The advantage of (2) is that users need not become expert admins to achieve desired results. If only (1) occurred and inodes were no longer utilized, users would still need to know how filesystems work at a lower level and develop a comfort in working with hardlinks.

There’s no reason both (1) and (2) can’t coexist.

The naming and file organization problem is a separate issue, and not likely an actual problem.

Most TV series are sold in terms of seasons and there aren’t many cases of an episode having many different versions based on which package it was released on. Usually, you’ll find something that got remastered but it’ll be the whole series, not just one episode or even one season. In that case, it’s easy enough to keep two folder trees, one for the original release and one for the remaster, embed the {edition-XYZ} tag in the two top-level folders and click the “split” button in Plex.

Classic cartoons, on the other hand, are pretty much the polar opposite. They are rarely (probably never) sold as season packages and there are always variations in remastering to contend with. After reviewing my admittedly very non-conforming directory structure for Looney Tunes, I think Plex should give itself more credit than it currently does for being able to cope with files not being stored in Library/Show/Season# form. I don’t have a single Looney Tunes folder in that format and I’m certain that it’s detecting most, if not all, files correctly. The trick here is that Looney Tunes doesn’t have any/many cross-over episodes. Sure, there are some MGM cartoons on the Golden Collection discs and whatnot but those aren’t multi-series examples…those can be handled with .plexmatch files.

Any traction? Progress?

Can files be enumerated without inode-level tracking yet? Can we save just one physical file and have it register in multiple series so as to not waste disk space?

I’ve now tried symlinking to hardlinked files (one piece of actual data, two file hardlinks for that data, both hidden from Plex, then separate symlinks pointing to those hardlinked files). No luck - Plex still latches on to the inodes, sees that they’re the same and removes the “duplicate”.
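The symlink layer not helping would make sense if the scanner follows symlinks before comparing files, which is an assumption on my part since the thread doesn't establish how Plex enumerates files. This Python sketch rebuilds the arrangement with placeholder file names and shows that the two symlinks are distinct objects but resolve to the same inode once followed.

```python
import os
import tempfile

def resolved_inodes(workdir):
    """Build data -> hard link twin -> two symlinks; compare inode numbers."""
    data = os.path.join(workdir, "data.mkv")
    twin = os.path.join(workdir, "twin.mkv")
    with open(data, "wb") as fh:
        fh.write(b"placeholder")
    os.link(data, twin)  # two hard link names, one inode

    link_a = os.path.join(workdir, "MGM Cartoons S1940E02.mkv")
    link_b = os.path.join(workdir, "Tom and Jerry S1940E01.mkv")
    os.symlink(data, link_a)
    os.symlink(twin, link_b)

    return {
        # os.stat follows the symlink to the underlying file.
        "followed": (os.stat(link_a).st_ino, os.stat(link_b).st_ino),
        # os.lstat reports on the symlink itself.
        "not_followed": (os.lstat(link_a).st_ino, os.lstat(link_b).st_ino),
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = resolved_inodes(tmp)
        print("same inode when followed:",
              result["followed"][0] == result["followed"][1])
        print("same inode when not followed:",
              result["not_followed"][0] == result["not_followed"][1])
```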

Can we just have an option to turn off the “de-duplicator feature”? It’s not helping.