Cast & Crew + Show, Season & Episode info
Thank you very much SolarPlex for the brilliant thread. I just wanted to share my experience about pulling metadata from Plex SQLite db.
It seems that db design has improved over the past 2 years to accommodate multiple values joined by the '|' operator in metadata_items.tags_xxx fields, as in 'Black-and-white|Drama', my understanding is that for common tags such as director, country etc... tags and taggings tables are now used to store user manual input 'for some time', then on some occasion, these values are stored back into metadata_items.tags_xxx fields.
If this may be of some help, below is the query I use to populate an Excel catalog of movies :
[EDIT 12 FEB 2014] : the last field, i.hints holds TV Show specific data : show, season and episode
I use VBA code to expand the field onto 3 columns; feel free to PM me if you need more details.
SELECT
d.id AS id, MAX(p.id) AS part, p.media_item_id AS item,
CASE WHEN ifnull(d.tags_country,'') <> '' THEN d.tags_country ELSE COALESCE((SELECT t.tag FROM taggings AS g JOIN tags AS t ON g.tag_id = t.id WHERE g.metadata_item_id = d.id AND t.tag_type = 8),'') END AS country,
CASE WHEN ifnull(d.tags_director,'') <> '' THEN d.tags_director ELSE COALESCE((SELECT t.tag FROM taggings AS g JOIN tags AS t ON g.tag_id = t.id WHERE g.metadata_item_id = d.id AND t.tag_type = 4),'') END AS director,
CASE WHEN ifnull(d.tags_star,'') <> '' THEN d.tags_star ELSE COALESCE((SELECT t.tag FROM taggings AS g JOIN tags AS t ON g.tag_id = t.id WHERE g.metadata_item_id = d.id AND t.tag_type = 6),'') END AS actor,
CASE WHEN ifnull(d.tags_genre,'') <> '' THEN d.tags_genre ELSE COALESCE((SELECT t.tag FROM taggings AS g JOIN tags AS t ON g.tag_id = t.id WHERE g.metadata_item_id = d.id AND t.tag_type = 1),'') END AS genre,
CASE WHEN ifnull(d.tags_collection,'') <> '' THEN d.tags_collection ELSE COALESCE((SELECT t.tag FROM taggings AS g JOIN tags AS t ON g.tag_id = t.id WHERE g.metadata_item_id = d.id AND t.tag_type = 2),'') END AS collection,
d.title || CASE WHEN ifnull(d.original_title,'') = '' OR ifnull(d.original_title,'') = d.title THEN '' ELSE ' (' || d.original_title || ')' END AS title,
d.[year], p.file,
i.hints
FROM
metadata_items AS d, media_items AS i, media_parts AS p, library_sections AS l
WHERE
d.id = i.metadata_item_id AND i.id = p.media_item_id AND l.id = d.library_section_id AND
l.[name] = 'movies' AND d.deleted_at IS NULL
GROUP BY p.media_item_id;
Part II
The above is, of course, far from complete. One thing I have not mentioned so far is how "cast & crew" information is handled. The database layout is similar to what we see on the Internet Movie Database web site. The fields "tags_director", "tags_writer" and "tags_star" appear on top of the page and are stored in the "metadata_items" table in PLEX, but if you dig deeper IMDB also has an extra "Cast & Crew" information that contains a more detailed list of all the people involved in making a particular movie - partly repeating the same information.
PLEX does exactly the same, it uses an extra table for this type of information and it may be a little tricky to understand how it works.
Three tables are involved:
- 'metadata_items':
This is the table we already know with all the general info on a movie, where each movie has exactly one 'id'
- 'tags':
Think of the table "tags" as a people database for a moment. It contains the name of a person or category as 'text', the function of a person or category as 'tag_type'. Each person or category with one specific function has exactly one 'id' here. Some people can appear twice or even thrice with different IDs because they may appear in movies aswriters, actors and directors. Clint Eastwood would be a prominent example. Clint Eastwood as director would have tag_type=4, the actor would be tag_type=6,
the writer would be tag_type=5
[*]'taggings':
this one brings movies and people together. Whenever Clint Eastwood appears in a movie and entry in taggings is generated (or should be) that consists of the ID from metadata_items (metadata_items_id), the ID from tags (tag_id) an optional text (text) that may contain the role he plays in that movie, and an index that gives the position in the cast & crew list.
table "taggings"
----------------------------------------------------
text = character played in movie
tag_id = id from table "tags"
metadata_item_id = id from table "metadata_items"
index = Position in list
table “tags”
tag = name of person
tag_type = 1=genre,2=collection,4=director,5=writer,6=actor,8=country
id = tag_id from table “taggings”
Complicated? Well, not really, is it? What is the benefit of all this, why do we need this cast & crew list when we already have tags_director, tags_star and tags_writer? The reason is that the
people’s tags in “metadata_items” are limited: There is just one single row for each movie, but there may be more than one piece of information, i.e. multiple actors, multiple writers, multiple genres, etc. A database, however, stores one piece of information once in one place and refer to that same information from all other places. Actually, the information about actors, directors, writers, genres, collection in our table metadata_items are derived from other tables to speed up some often needed queries where we do not need the complete information.
If we want to know all movies that have Angelina Jolie in it, we can search all of the tags_star fields in the table metadata_items for that string (as discussed above). But the proper and more reliable way is to look up tag=“Angelina Jolie” where tag_type=6 in the table “tags” and feed the found ids to the table “taggings” that returns the ids in the metadata_items. We could do it like this (clumsy):
SELECT title FROM metadata_items JOIN taggings JOIN tags WHERE tags.tag=“Angelina Jolie” AND tags.tag_type=6 AND tags.id=taggings.tag_id AND taggings.metadata_item_id=metadata_items.id;
Or much more elegantly with nested SELECT commands and the keyword IN that you may already know from shell scripts, C, Python or whatever:
SELECT title FROM metadata_items WHERE id IN (SELECT metadata_item_id FROM taggings WHERE tag_id IN (SELECT id FROM tags WHERE tag=“Angelina Jolie”));
“IN” is pretty cool. You can use it very often with nested commands where you pass on results from one table to query a second one.
That was the database READING perspective.