Need Help Again With SQL Query : Plex SQLite.exe

Hello,

This is a follow-up to a previous post that I opened here : Need Help With SQL Query : Plex SQLite.exe

With much help from @DTR last year I have 2x SQL scripts that help me find duplicate entries in plex. I am running these once a week but I am starting to encounter larger and larger numbers of false-positives due to my use of files with CD1 and CD2 in the filename.

Can someone help me to add this one exception, I keep getting syntax errors. I need something like:

WHERE strfilename NOT LIKE ‘%CD1%’ and strfilename NOT LIKE ‘%CD2%’

But I am getting syntax errors trying to add this to:

.header on
.mode csv
.output //MyServer/MyShare/Misc/Scripts/DuplicateMovies/PLEX/SQL_DUPES_PLEX_RESULTS.csv

SELECT metadata.title AS Title, parts.file AS Path FROM media_parts parts
INNER JOIN media_items items ON items.id=parts.media_item_id
INNER JOIN metadata_items metadata ON metadata.id=items.metadata_item_id
INNER JOIN (
SELECT guid FROM metadata_items
WHERE metadata_type=1
GROUP BY guid
HAVING COUNT(*) > 1
) duplicates ON duplicates.guid=metadata.guid
ORDER BY title, metadata.guid;

.header off

SELECT metadata.title AS Title, parts.file AS Path FROM media_parts parts
INNER JOIN media_items items ON items.id=parts.media_item_id
INNER JOIN metadata_items metadata ON metadata.id=items.metadata_item_id
INNER JOIN (
SELECT metadata_item_id FROM media_items
GROUP BY metadata_item_id
HAVING COUNT(*) > 1
) duplicates ON duplicates.metadata_item_id=metadata.id
WHERE metadata.metadata_type=1
ORDER BY title, metadata.guid;

.exit

I am sure that I am missing something obvious and trying to exclude the specific filenames in the wrong part of the SQL. I would appreciate any help.

~TJ

Maybe something like:

SELECT 
	title as Title
	, parts.file AS Path
FROM (media_parts AS parts
INNER JOIN 
	media_items AS items
ON items.id = parts.media_item_id
INNER JOIN 
	metadata_items AS metadata
ON metadata.id=items.metadata_item_id
INNER JOIN (
SELECT guid FROM metadata_items AS mdItems
WHERE mdItems.metadata_type=1
GROUP BY mdItems.guid
HAVING COUNT(*) > 1
) duplicates ON duplicates.guid=metadata.guid
)
where parts.file NOT LIKE '%CD1%' AND parts.file NOT LIKE '%CD2%'
ORDER BY title, metadata.guid;

Thanks @dane22,

This is mostly working but I need the SQL to do the count after I do the NOT LIKE CD1, otherwise it still muddles my results. See below where I still have some positive results for 3D movies that I want to exclude.

My revised SQL is here:

SELECT metadata.title AS Title, parts.file AS Path FROM media_parts parts
INNER JOIN media_items items ON items.id=parts.media_item_id
INNER JOIN metadata_items metadata ON metadata.id=items.metadata_item_id
INNER JOIN (
SELECT guid FROM metadata_items
WHERE metadata_type=1
GROUP BY guid
HAVING COUNT(*) > 1
) duplicates ON duplicates.guid=metadata.guid
where parts.file NOT LIKE ‘%CD1%’ AND parts.file NOT LIKE ‘%CD2%’ AND parts.file NOT LIKE ‘M:\3D%’
ORDER BY title, metadata.guid;

New results are like this:

Title,Path
“Alice Through the Looking Glass”,“M:\Movies\Alice Through the Looking Glass (2016)\Alice Through the Looking Glass (2016)_new.mkv”
“Alita: Battle Angel”,“M:\Movies\Alita Battle Angel (2019)\Alita Battle Angel (2019)_4K.atmos.mkv”
Ant-Man,“M:\Movies\Ant-Man (2015)\Ant-Man (2015)_4K.atmos.mkv”

whereas the old sql was giving this:

Title,Path
“Alice Through the Looking Glass”,“M:\Movies\Alice Through the Looking Glass (2016)\Alice Through the Looking Glass (2016)_new.mkv”
“Alice Through the Looking Glass”,“M:\3D\Alice Through the Looking Glass (2016) 3DSBS\Alice Through the Looking Glass (2016) 3DSBS.mkv”
“Alita: Battle Angel”,“M:\Movies\Alita Battle Angel (2019)\Alita Battle Angel (2019)_4K.atmos.mkv”
“Alita: Battle Angel”,“M:\3D\Alita Battle Angel (2019) 3DSBS\Alita Battle Angel (2019) 3DSBS.atmos.mkv”
Ant-Man,“M:\Movies\Ant-Man (2015)\Ant-Man (2015)_4K.atmos.mkv”
Ant-Man,“M:\3D\Ant-Man (2015) 3D SBS\Ant-Man (2015) 3D SBS.mp4”

Any ideas about ignoring M:\3D\ and CD1 higher up in the order of operations? I have something similar running in Kodi and that SQL is doing the count at the very end which is 100% what I am after. See below:

.header on
.mode csv
.output //MyServer/MyShare/Misc/Scripts/Scheduled/KODI/SQL_DUPES_KODI_EPISODES_RESULTS.csv

select

        tvshow.c00 show,

        episode.c12 s,

        episode.c13 e,

        episode.c00 title,

        count(*) num,

        group_concat(path.strPath || files.strFilename) locations

from files

join episode on episode.idFile = files.idFile

join path on files.idPath = path.idPath

join tvshow on episode.idShow = tvshow.idShow

where files.strFilename NOT LIKE ‘%CD1%’ AND files.strFilename NOT LIKE ‘M:\3D%’

group by episode.idShow, s, e having num > 1;

Really confused about what you want here, but maybe this:

SELECT 
	title AS Title,
	parts.file AS Path,
	library.name AS Library,
	metadata.guid AS Guid	
FROM (
	SELECT 
		file,
		media_item_id
	FROM media_parts
	WHERE file NOT LIKE '%CD1%' AND file NOT LIKE '%CD2%'
	)
AS parts
INNER JOIN 
	media_items AS items
ON items.id = parts.media_item_id
INNER JOIN metadata_items AS metadata
ON metadata.id=items.metadata_item_id
INNER JOIN library_sections as library
ON metadata.library_section_id = library.id
INNER JOIN (
SELECT guid FROM metadata_items AS mdItems
WHERE mdItems.metadata_type=1
GROUP BY mdItems.guid
HAVING COUNT(*) > 1
) duplicates ON duplicates.guid=metadata.guid
ORDER BY title, metadata.guid;

Edit: Also added Guid and library name to the output

I appreciate your efforts, dane22. Sorry that I’m not explaining myself very well. I kept your suggestion of including the library in the output but I dropped the Guid.

I’m even more confused now then when I started this. Why do some files appear in the output but not others? I am now excluding only ‘CD1’ for testing purposes, see below:

SELECT
title AS Title,
parts.file AS Path,
library.name AS Library
FROM (
SELECT
file,
media_item_id
FROM media_parts
WHERE file NOT LIKE ‘%CD1%’
)
AS parts
INNER JOIN
media_items AS items
ON items.id = parts.media_item_id
INNER JOIN metadata_items AS metadata
ON metadata.id=items.metadata_item_id
INNER JOIN library_sections as library
ON metadata.library_section_id = library.id
INNER JOIN (
SELECT guid FROM metadata_items AS mdItems
WHERE mdItems.metadata_type=1
GROUP BY mdItems.guid
HAVING COUNT(*) > 1
) duplicates ON duplicates.guid=metadata.guid
ORDER BY title, metadata.guid;

I was expecting to find the below in my output because we are ignoring CD1, but this movie is a 3-parter and CD2 and CD3 should still be in the report but they are not listed. Even if I remove the line with NOT LIKE ‘CD1’, none of these appear in my output:

M:\Movies\An Evening with Kevin Smith (2002)\An Evening with Kevin Smith (2002) CD1.avi
M:\Movies\An Evening with Kevin Smith (2002)\An Evening with Kevin Smith (2002) CD2.avi
M:\Movies\An Evening with Kevin Smith (2002)\An Evening with Kevin Smith (2002) CD3.avi

Another example, this is in my output when I exclude CD1, but shouldn’t the count not be greater than 1 if I am ignoring CD1?

The Lord of the Rings The Two Towers (2002)_4K.atmos_new.CD2.mkv"

If I do a ‘get info’ in plex web for the above two films I am not seeing any obvious differences - all CD# files are listed and the run time is the sum of all CD# files.

I also experimented with WHERE file NOT LIKE ‘%CD1%’ AND file NOT LIKE ‘M:\3D%’

But I still get 68 results which are all 2d versions of movies that I also have in 3D, but I have aprox 200 3D movies so this result is not making sense to me. The Martian (2015) is another example where I have a 2D and a 3D version, also not listed in the output.

Scratching my head,
~TJ

And for the record, your naming std. is not optimal for plex!

Above should be:
M:\Movies\Alita Battle Angel (2019) {imdb-tt0437086}\Alita Battle Angel (2019) {imdb-tt0437086} [4K.atmos].mkv

That’s cool and all but not within scope of my issue here. Latest docs from plex are still saying the ID# is optional.

I am also using Kodi and so there has been a bit of a balancing act getting my media to scan into both 100%. Kodi and Plex don’t always agree on naming standards, even some of my 2013 films (example) won’t scan into Plex because it’s expecting 2014 - or vice versa.

In the early days 4K and atmos items were not being identified correctly and so I am using the _4K and .atmos. parts of the filename as a workaround, maybe now in 2024 the scanning is better but I have learned to leave well enough alone. I can test that out the next time I get a 4K/atmos/dts-x item.

I do not know Kodi, nor how their database is structured

What I gave you, is sadly all I got, and I did show you how to change a join into a select

When said, you do know that Plex in the web interface has an option to filter on duplicates, right?

Thanks @dane22,

I do check for duplicates in the web interface from time to time, but I have these and other SQL running weekly to avoid some of that manual work. A powershell script will send me an email if I have duplicate Movies or TV episodes in Kodi; I would like to eventually expand into checking for duplicate TV Episodes in Plex (something the web interface cannot do) but the SQL results so far in plex are not 100% as expected.

With your suggestions I have reduced the number of false-positives to half but I think there is still room for improvement. Are there any DB experts over at plex who check the forums?

Kodi vs Plex DBs - not much commonality unfortunately, what I learn in one doesn’t carry over to the other.

I ended up with the below which is mostly working, still some false positives but I’ve cut that # in half.

SELECT metadata.title AS Title, parts.file AS Path
FROM media_parts parts
INNER JOIN media_items items ON items.id = parts.media_item_id
INNER JOIN metadata_items metadata ON metadata.id = items.metadata_item_id
WHERE parts.file NOT LIKE ‘%CD1%’
AND parts.file NOT LIKE ‘%CD2%’
AND parts.file NOT LIKE ‘M:\3D%’
AND metadata.guid IN (
SELECT guid
FROM metadata_items
WHERE metadata_type = 1
GROUP BY guid
HAVING COUNT(*) > 1
)
ORDER BY title, metadata.guid;