How to join metadata_items and metadata_item_views?

As mentioned in a few previous posts, I’m building a reporting solution to track my music listening habits. I’m reporting on data from the Plex SQLite database, and for the most part I’m figuring out the schema through exploration, but I’m currently stuck.

How to join metadata_items and metadata_item_views?

The simple answer is that I join them on metadata_items.guid = metadata_item_views.guid, but metadata_items.guid isn’t unique. I have many tracks where Plex has assigned the same guid value, which means that a single listen can show up as many listens when I join these tables.

Is there a better / correct way to do this?

Thanks in advance!

This should be unique, are you sure it’s not.

Tracks can have the same id if it’s the same track / recording but included on multiple albums. Plex uses track associations at musicbrainz for this to know it’s the same track.

Pretty sure.

WITH DuplicateGuidCTE 

AS

(

  SELECT guid

        ,COUNT (DISTINCT ID) AS id_count

    FROM metadata_items 

   WHERE guid LIKE ('plex://track/%')

GROUP BY guid

  HAVING id_count > 1

)

  SELECT * 

    FROM metadata_items WHERE guid IN (SELECT guid FROM DuplicateGuidCTE)

ORDER BY guid ASC

Produces these results:

Most or all (I haven’t looked exhaustively) duplicates come from the same song being represented on multiple albums. Examples include a single that’s also on an LP, or tracks from an LP also being included on a compilation.

I could disambiguate by including the parent metadata item in the join, but metadata_item_views doesn’t include a parent_guid column - it only includes guid and grandparent_guid.

I’m looking for a way to make it so that if I listen to the LP I can say in a query “I listened to this track on this LP” and if I listen to a compilation I can say “I listened to this track on this compilation.”

In theory I can do something hacky to always assume I listened to the most recently added instance of a track (or whatever winner I choose) but I’d prefer to have an approach that represents my actual listening habits if possible.

Any ideas or tips?

I don’t think this is possible but will let someone from plex answer.

I seem to remember asking this exact question years ago when I started adding music to my server. I noticed duplicate played history on the dashboard when I played a track and it was due to the fact that tracks are linked as they are same recording (based on musicbrainz musicbrainz_trackid) and when plex queries it, plex returns based on the guid and as the track appears on multiple releases its shown multiple times.

EDIT: here is the thread Dashboard reporting play history tracks from different albums to what was actually played (user anon5074910 is my old account).

I don’t have music on my Plex server so I am only going by what I have with TV shows (which would be the “closest similarity” to music). Couldn’t you do something like the manager/employee setup (where you have an employee table and the manager field holds an employee id) to do something where you can append the duplicate titles/“entries” to have hold the title from parent_id (I am guessing that is the album title)? That should allow you to differentiate the albums. As an example, if the track “This Song is Awesome!” appears on 2 albums, “Awesome Album 1” and “Some Awesome Songs,” then have the track played appear as “Awesome Album 1 - This Song is Awesome!” and “Some Awesome Songs - This Song is Awesome!” to give you distinct tracks.

I am not the best with SQL when it gets to more complicated joins (and this might not even be complicated, I just struggle some times with it), but I make do. I feel like something like that should be possible and I think accomplish what you are trying to go for.

Edit: Showing code of what I mean from the employee/manager example (got this from Google search asking about the employee/manager example).

Per the result, this assumes a table named Employee with columns EmployeeID, Name and ManagerID.

SELECT
    e.Name AS EmployeeName,
    m.Name AS ManagerName
FROM
    Employees e
INNER JOIN
    Employees m ON e.ManagerID = m.EmployeeID;

More edits!

OK, here is an edit with an example from my database using the first episode for the series ALF. I did not join to the metadata_item_views table, but I am hoping this is beneficial/helpful in understanding what I was saying. Also, I am hoping it relates to music as I do not know how the parent/grandparent thing works for that (for shows parent = season, grandparent = series title). (Side note, I do not normally format my SQL exactly like below, but since it was how the above one was formatted from the Google search and kind of how @MatthewRoche formatted their SQL, I attempted to do similar).

SELECT
	episode.title as "Episode Title",
	season.title as "Season",
	series.title as "Series",
	series.title || ' - ' || season.title || ' - ' || episode.title as "Combined Title"
FROM
	metadata_items episode
INNER JOIN
	metadata_items season ON season.id = episode.parent_id
INNER JOIN
	metadata_items series ON series.id = season.parent_id
WHERE
	--episode.title = 'A.L.F.';
	series.title = 'ALF';

(last minute change to filter on the series instead of a single episode, but left that part in).
Screenshot showing some of the results:

Again, hope that can actually be of use.

-Shark2k

Thank you - that link strongly suggests I need to go down the hacky path to eliminate duplicates.

Closing the loop on this thread: This is the hacky approach I implemented in my reporting solution:

WITH NoDupGuidsCTE

AS

(

-- For each Plex metadata item GUID, find the latest ID value

  SELECT guid

        ,MAX (id)   AS max_id

    FROM metadata_items

GROUP BY guid    

),

DeDuplicatedItemsCTE

AS

(

-- Replace metadata_items, making guid unique

  SELECT i.*

    FROM metadata_items i

         RIGHT OUTER JOIN NoDupGuidsCTE n

            ON i.id = n.max_id

)  

  SELECT * FROM DeDuplicatedItemsCTE

This isn’t ideal because now I can’t tell which version of a track I listened to (was it the single, the EP, or the LP? who knows?) but now the listen times in my reporting are no longer skewed.

:woman_shrugging:

@MatthewRoche Just wondering if there is any reason my answer will not work for you? I ended up creating a test music library on my test server so that I can get some artist, album and track data into that DB.

I used “Weird Al” Yankovic as my test and loaded the albums Even Worse, UHF, Off the Deep End and The Food Album specifically because there are duplicate tracks between these albums (specifically there are 5 songs that appear on 2 different albums each, so a good spread).

Anyway, I updated my SQL from my answer above to the below and based on what you provided in your last post and the fact that you are only looking at metadata_items from that, I believe it would help accomplish what you are trying to do.

SELECT
	artist.title as "Artist",
	album.title as "Album",
	track.title as "Track Title",
	artist.title || ' - ' || album.title || ' - ' || track.title as "Combined Title",
	track.guid as "Episode GUID",
	album.guid as "Album GUID",
	artist.guid as "Artist GUID"	
FROM
	metadata_items track
INNER JOIN
	metadata_items album ON album.id = track.parent_id
INNER JOIN
	metadata_items artist ON artist.id = album.parent_id
WHERE
	artist.title = '“Weird Al” Yankovic';

Note that WHERE clause in this case is specifically to narrow my results and all the fields in the select aside from the “Combined Title” field were just for confirmation and also showing the individual field.

Screenshot of the results from above query:

Screenshot of the results of above query filtering on specific tracks:

I realize the GUID fields do not actually shown anything in the screenshots with the way DB Browser for SQLite shows the results, but you run the query on your DB and check those out.

Additionally, I do not know if you are looking at the metadata_item_views table at all, but I just took a look and noticed that for music, in that table you have the artist, album and track information, so you should be able to use that from there to determine which version was played.

I am not sure if I am missing/not understanding something, but it seems to me that with the above query, you should be able to differentiate the version you are listening too.

-Shark2k

As far as I can tell, your response isn’t related to the specific scenario and issue I’m asking about. Unless you’re proposing a fix for multiple tracks in metadata_items that each have the same guid value that allows them to be joined deterministically to metadata_item_views, it’s not relevant to my question.

I appreciate the time you’ve put into your responses, but neither of the queries you’ve shared gives me what I need.

Again, unless I am misunderstanding what you are trying to accomplish I thought it was clear that I was proposing a fix for multiple tracks in metadata_items that each have the same guid value that allows them to be joined deterministically to metadata_item_views. I thought I was giving you building blocks to do that, but since it appears I was incorrect, here is a query that I think is doing what you are trying to accomplish (I have no idea how you are actually using this, so I do not know if not using the guid matters or not).

WITH UniqueMetadataItemsCTE
AS
(
-- create unique metadata_item "index" for entries with the same GUID across multiple albums
	SELECT
		artist.title || ' - ' || album.title || ' - ' || track.title as "combined_title", track.guid
	FROM
		metadata_items track
	INNER JOIN
		metadata_items album ON album.id = track.parent_id
	INNER JOIN
		metadata_items artist ON artist.id = album.parent_id
	WHERE
		artist.title = '“Weird Al” Yankovic'
)
SELECT
	u.combined_title,  COUNT(u.combined_title) as "plays"
FROM
	UniqueMetadataItemsCTE u INNER JOIN
		metadata_item_views miv ON u.guid = miv.guid
WHERE
	miv.grandparent_title || ' - ' || miv.parent_title || ' - ' || miv.title = u.combined_title
GROUP BY (miv.grandparent_title || ' - ' || miv.parent_title || ' - ' || miv.title)

(Again, the WHERE clause for the artist is just for my testing purposes to only include audio tracks, that could be changed to track.library_section_id = 15 [with 15 being your library_section_id for music], or just removed if you only have a music library).

Screenshot of the results from the above query (added the guid just for validation purposes):

As I created this library just for testing this, I can confirm that I did in fact play each of those songs the number of times shown in the result. I specifically made sure to do multiple plays to be able to show the result.

-Shark2k

@MatthewRoche
Slight update to my previous reply as I created a second CTE just cause I am not sure if that might be more efficient then what I did previously.

WITH UniqueMetadataItemsCTE
AS
(
-- create unique metadata_item "index" for entries with the same GUID across multiple albums
	SELECT
		artist.title || ' - ' || album.title || ' - ' || track.title as "combined_title", track.guid
	FROM
		metadata_items track
			INNER JOIN metadata_items album ON album.id = track.parent_id
			INNER JOIN metadata_items artist ON artist.id = album.parent_id
	WHERE
		-- can filter on artist if you just want to see play counts for a particular artist
		-- artist.title = '“Weird Al” Yankovic'
		-- can filter on the library_section_id here so you are only including the music library, use track.library_section_id in (#,#,#) if you have multiple music libraries
		track.library_section_id = 15
),
CustomUniqueIdentifierCTE
AS
(
-- create same unique "index" from above CTE for the metadata_item_views data
	SELECT
		miv.grandparent_title || ' - ' || miv.parent_title || ' - ' || miv.title as "custom_guid", miv.guid as "original_guid"
	FROM
		metadata_item_views miv
	WHERE
		-- filter for the music library at this stage to reduce the amount of data being processed in the final select
		miv.library_section_id = 15
)
SELECT
	u.combined_title,  COUNT(u.combined_title) as "plays", c.original_guid
FROM
	UniqueMetadataItemsCTE u
		INNER JOIN CustomUniqueIdentifierCTE c ON u.combined_title = c.custom_guid
GROUP BY
	c.custom_guid
ORDER BY c.original_guid

Pretty sure the above accomplishes what you want (you can remove the artist from the combined_title/custom_guid as it probably is not necessary, I just added because, no real reason).

With artist:

Without artist:

-Shark2k

I appreciate the effort, but this only works if the combination of titles is unique, and that is not the case. In my library I have multiple examples where a track appears on a single and an album that both have the same title, and they’re both by the same artist.

Your approach is flawed. You are only looking at the id, which as you’ve found is not unique. The id only tracks the song. You need to change and use a combination of fields so you can track the song and album. You’re not going to get what you want with just the id.