[REL] TGC (The Great Courses) Metadata Agent

bubonic314bubonic314 Members Posts: 5 ✭✭

Hello-

If you're like me and have a large collection of The Great Courses that you have downloaded, you would like there to be a metadata agent for these. Well, a while back I wrote code using urllib, BeautifulSoup and dryscrape to pull series summary, lecture titles, lecture descriptions, lecturer/professor, episode dates, ratings, fanart and poster art and was manually copying and pasting the data into my series. After a few times of doing this, I figured that I should spend the time to code a PLEX Agent for TGC. So I did. This has been tested only on Ubuntu 15.10 running the PLEX media server. No information on other systems is currently available.

The TGC.bundle (https://github.com/bubonic/TGC.bundle) that I ended up with doesn't use the PLEX HTMLparser API or the JSON API; instead, I did a quick hack of the code I had already written using urllib and BeautifulSoup and inserted it into my Agent. Full compatibility is therefore still a ways off, so we will call this a BETA release of the TGC Agent. It pulls everything but the rating, and it assigns incremental dates to the lectures, starting from the day the series is added, so that after a lecture is watched the next one will appear in the On Deck portion of PLEX.
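
A minimal sketch of the incremental-date idea, assuming a step of one day per lecture. The helper name `incremental_air_dates` and the one-day step are illustrative assumptions, not taken from the bundle; in an agent, each date would presumably be assigned to the episode's air-date field.

```python
from datetime import date, timedelta


def incremental_air_dates(added_on, lecture_count):
    """Return one date per lecture: lecture 1 gets added_on, lecture 2 the next day, and so on."""
    return [added_on + timedelta(days=n) for n in range(lecture_count)]


# Hypothetical usage: a three-lecture course added on 1 May 2017.
dates = incremental_air_dates(date(2017, 5, 1), 3)
```

Because each lecture gets a later date than the one before it, the lectures sort in order.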

Even though my agent pulls lecturer/professor metadata from TGC website, I am still having issues populating episode.writers and episode.directors. I tried various means of populating this data to no avail. So if anyone has any insight as to what I need to do to get this populated please chime in. Right now I have:

    episode.directors.clear()
    episode.writers.clear()
    episode.directors.add(lecturer)
    episode.writers.add(lecturer)

The old documentation I found said this was a Set object, which I interpreted in various ways (i.e., as a Set object, as a string, as a list, so on and so forth). I even set these values to different types of objects containing the lecturer/professor data, however nothing seemed to work.

The rating was pulled using dryscrape which I believe is incompatible with the restricted PLEX python library. So that code has been commented out for now. Please carefully read the instructions on how to get the TGC Agent working for you. I will include the README.md from the git repository here for completeness.

TGC.bundle

Requirements

For now, this PLEX Agent requires urllib/urllib2 and BeautifulSoup to run. Please have these installed in your python set-up.

INSTALL

Download the complete contents of this git project (git clone) and copy the bundle (TGC.bundle) to your PLEX plug-ins directory.

The appropriate plug-ins directory can be found on the PLEX website (https://support.plex.tv/hc/en-us/articles/201106098-How-do-I-find-the-Plug-Ins-folder-)

Be sure to restart the PLEX server after copying the Agent plug-in.

Usage

Rename your course to filenames of the type:

FULL COURSE NAME S01E## Some Text.ext

where E## is the lecture number. The Full Course name should be taken from TGC website.
The files must be renamed properly for this Agent to pull the data from TGC website.

Example:

Games People Play: Game Theory in Life, Business, and Beyond S01E01 The World of Game Theory.mp4

or

Change and Motion: Calculus Made Clear, 2nd Edition S01E10 Blah.avi
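
As a rough illustration of the naming scheme above, the following sketch splits a filename into course name, lecture number, and trailing text. The regular expression and the function name `parse_lecture_filename` are assumptions made for illustration; the Agent's own parsing may differ.

```python
import re

# COURSE NAME, then S##E##, then optional text, then the extension.
LECTURE_RE = re.compile(
    r"^(?P<course>.+?)\s+S(?P<season>\d{2})E(?P<lecture>\d{2,3})"
    r"(?:\s+(?P<title>.*?))?\.(?P<ext>[^.]+)$",
    re.IGNORECASE,
)


def parse_lecture_filename(filename):
    """Return the parts of a 'COURSE S01E## Text.ext' filename, or None if it does not fit."""
    m = LECTURE_RE.match(filename)
    if m is None:
        return None
    return {
        "course": m.group("course"),
        "season": int(m.group("season")),
        "lecture": int(m.group("lecture")),
        "title": m.group("title") or "",
        "ext": m.group("ext"),
    }


info = parse_lecture_filename(
    "Change and Motion: Calculus Made Clear, 2nd Edition S01E10 Blah.avi"
)
```

A filename that lacks the S01E## marker returns None, which is an easy way to spot files that still need renaming.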

Next:

Create a TV Shows Library in PLEX and set the primary agent to TGC.

Add directories individually (not directories of directories) to your library.

That should do the trick.

There remain a few minor bugs in this code, which made me reluctant to release it, but no code is perfect. I'm very busy with multiple projects, but I will try to keep this updated whenever things change. Also, I have a TODO list for this Agent in the git repository, and I do have ambitions to see the items through, time permitting.

With that said:

Enjoy!
-bubonic

Comments

  • ZeroQIZeroQI Members Posts: 1,157 ✭✭✭

    Yeah, it changed and is undocumented. The only way is to analyse the recent agent code...
    I have code to read values and only update if needed, should you require it; if not, this should suffice:

    episode.directors.clear()
    meta_role = episode.directors.new()
    meta_role.name = 'name' # person's name (e.g. the lecturer)
    meta_role.role = None # role/character name
    meta_role.photo = None #url of actor photo
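
    A small sketch of how the `.new()` pattern above could be wrapped in a helper. `FakePeople` and `FakePerson` are hypothetical stand-ins so the snippet runs outside PLEX; inside an agent you would pass `episode.directors` or `episode.writers` instead. The helper name `set_people` is also illustrative.

```python
class FakePerson(object):
    """Stand-in for the object returned by .new() (name/role/photo attributes)."""

    def __init__(self):
        self.name = None
        self.role = None
        self.photo = None


class FakePeople(list):
    """Stand-in for episode.directors / episode.writers, for testing outside PLEX."""

    def clear(self):
        del self[:]

    def new(self):
        person = FakePerson()
        self.append(person)
        return person


def set_people(container, names):
    """Replace the container's entries with one new entry per name."""
    container.clear()
    for name in names:
        person = container.new()
        person.name = name


# Hypothetical usage with a placeholder lecturer name.
directors = FakePeople()
set_people(directors, ["Professor Example"])
```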

  • bubonic314bubonic314 Members Posts: 5 ✭✭

    Beside the old documentation, I was looking at a git clone of TheTVDB Agent; must have been older code. Anyway, that did the trick. How exciting. Thank you very much!

  • ZeroQIZeroQI Members Posts: 1,157 ✭✭✭

    https://github.com/ZeroQI/Hama.bundle/releases/download/Beta/Hama.bundle.2017-05-02.01h37.zip
    Have a look at the code if you wish, really funky stuff in there ;)
    Am glad it solved for you

  • ZeroQIZeroQI Members Posts: 1,157 ✭✭✭
    edited May 2

    https://github.com/ZeroQI/Hama.bundle/releases/download/Beta/Hama.bundle.2017-05-02.01h37.zip
    Have a look, you might find some nice functions in there... Like the nested dict functions, which allowed me to write shorter code with fewer tests... Let me know if you are missing other info or stuff I may be able to help with...
    Bubonic314... Hence the black pi ;)

  • bubonic314bubonic314 Members Posts: 5 ✭✭

    Niice! That code is pretty dense and might take me a little while to go through it all. However, at first glance there is a lot I can use in here that is really high quality code. UpdateMeta and other things in common.py should provide me with the tools for future releases. Not to mention your GetMetadata functions. This is a treasure for anyone developing a PLEX agent.

  • tramp78tramp78 Members Posts: 11 ✭✭

    First off, I can't thank you enough for this. I even signed up for Reddit just to thank you THERE. Bravo, Kudos, seriously, geebus this is great.

    I've been going through this all day against a library of about 700 lectures and this is what I have found out so far.

    I don't have urllib/urllib2 and BeautifulSoup and if I do I am unaware of it. Just a standard PLEX server install.
    **It works just fine.**

    The files themselves can be named:
    S01E01 - text
    S01E02 - text
    You don't need the name of the lecture in front of the S0xE0x and once you see below you will be glad.

    You DO need the name of the lecture as the folder name and it needs to be the exact text as it reads in the URL, not the lecture name. For example...

    African Experience from "Lucy" to Mandela - won't work
    African Experience from Lucy to Mandela - won't work
    African Experience from quot Lucy quot to Mandela - works just fine. This is how the lecture is written out in the URL.

    http://www.thegreatcourses.com/courses/african-experience-from-quot-lucy-quot-to-mandela.html

    Other examples include:
    Experiencing America A Smithsonian Tour through American History - won't work despite the fact that it is the title
    Experiencing America A Smithsonian Tour through History - does work because that is what the URL is.

    Some lectures have words like "World's" spelled "worlds" in the URL. Remove the apostrophe and it works.
    Upper and lower case words don't seem to matter. But if you put all lower case as the URL says, you will get a lower case name in PLEX and will need to either change the folder name or edit the metadata title. I prefer the folder. Do it once and that's it.
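
    The folder-name rule described above can be sketched as follows, assuming the folder name already uses the wording from the URL (for example "quot" in place of quotation marks, and no apostrophes). The function name `course_url_from_folder` is illustrative, and the URL pattern is inferred from the example link above.

```python
BASE_URL = "http://www.thegreatcourses.com/courses/"


def course_url_from_folder(folder_name):
    """Lower-case the folder name and join its words with hyphens to form the course URL."""
    slug = "-".join(folder_name.lower().split())
    return BASE_URL + slug + ".html"


url = course_url_from_folder("African Experience from quot Lucy quot to Mandela")
```

    This only mirrors the observation that case does not matter and that spaces become hyphens; it does not guess at titles whose URL wording differs from the course title.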

    Best suggestion is to give it a go and see what it can't find. They will fall into four types.

    1. Audio lectures. If you have any OLD lectures that either came out before TTC started producing video versions (yeah, I've been collecting them that long) or if you have any audio versions because you haven't been able to track down the video versions, you will have to load these in as Music and add your own art and descriptions for now until bubonic314 in his amazing and generous wisdom decides to do a version for music as well. Not that big a need since a LOT of the audio only lectures no longer exist.
    2. Discontinued Video lectures. If you have any of the old VHS based lectures that TTC discontinued years ago, you will have to add your own art and metadata.
    3. Single lecture demos. TTC sends out demos from time to time to judge the marketability of a topic and the skill of the professor. Some eventually become real lectures. Most don't. Same as above, you will have to come up with your own art and metadata for these. Or don't include them. I don't honestly know why I do.
    4. Stuff that has the wrong name for the lecture title in the folder. I have yet to find a lecture that it won't locate as long as you give it the specific text of the title in the URL. Really.

    If you have recent stuff - stuff that is all currently for sale and on their website - you can get it loaded up just fine if you do as I did above.

    Again, thanks to bubonic314, whoever you are. So far I have loaded in history, art, music, and better living, and am working on literature/language, and it is going just fine. That's 357 lectures so far and all's well.

  • ZeroQIZeroQI Members Posts: 1,157 ✭✭✭

    @bubonic314 it will replace my normal agent code on github soon. I went overkill but didn't want to update a field unless needed, and I am now coding a log per series in the agent data folder that includes scanner logs... I love the dict functions that don't throw errors if the fields don't exist... Thanks for the kind words, I really appreciate that as a coder... It took ages to get it to this point, but somebody could just create a module file and slightly modify the init.py to get started and have great logs :D

  • bubonic314bubonic314 Members Posts: 5 ✭✭

    Updated Code

    I have updated TGC.bundle in the git repository. The new update includes:

    • Course description is now the full course description found on TGC website and formatted accordingly.
    • Fuzzy lecture course matching via the SearchCourse() method when urllib receives a 404 Not Found from the initial course URL formatting.

    As mentioned by @tramp78 if you name your directories (or files) specific to what shows up in the course URL, it will always match the lectures exactly and pull all relevant data for that course. However, I like the presentation of the course names found on the TGC website to show up on PLEX, which contains commas, colons and quotes and the like. The new update allows for you to name your directories or files with these course names as it will filter out the unnecessary data and find a match to the course URL on TGC website.

    My new SearchCourse() method allows for fuzzy matching of courses. For an example consider the course:

    Conquest of the Americas

    If your directory or lecture file is named, for whatever reason, Conquest of America S01E01.mkv, the following log shows what the new method does:

    2017-05-12 20:42:53,373 (7f1f89ffb700) : INFO (__init__:386) - metadata.title: Conquest of America
    2017-05-12 20:42:53,373 (7f1f89ffb700) : INFO (__init__:398) - update() CourseURL: http://www.thegreatcourses.com/courses/conquest-of-america.html
    2017-05-12 20:42:54,420 (7f1f89ffb700) : INFO (__init__:417) - courseURL not found... Searching for related courses: Conquest of America
    2017-05-12 20:43:00,253 (7f1f89ffb700) : INFO (__init__:324) - Title: 1066: The Year That Changed Everything
    2017-05-12 20:43:00,254 (7f1f89ffb700) : INFO (__init__:324) - Title: Conquest of the Americas
    2017-05-12 20:43:00,254 (7f1f89ffb700) : INFO (__init__:327) - Match found for: Conquest of America
    2017-05-12 20:43:00,254 (7f1f89ffb700) : INFO (__init__:328) - Title found is: Conquest of the Americas
    2017-05-12 20:43:00,255 (7f1f89ffb700) : INFO (__init__:329) - Link is: http://www.thegreatcourses.com/courses/conquest-of-the-americas.html
    2017-05-12 20:43:00,279 (7f1f89ffb700) : INFO (__init__:334) - Finding best match...
    2017-05-12 20:43:00,279 (7f1f89ffb700) : INFO (__init__:337) - Span length for is: 23
    2017-05-12 20:43:00,279 (7f1f89ffb700) : INFO (__init__:342) - CourseTitle is: Conquest of the Americas
    2017-05-12 20:43:00,280 (7f1f89ffb700) : INFO (__init__:343) - CourseURL is: http://www.thegreatcourses.com/courses/conquest-of-the-americas.html

    Or, in an example from @tramp78, something that contains an apostrophe (and is dear to me >:) ), i.e.,

    The Black Death: The World's Most Devastating Plague S01e17 Plague Saints And Popular Religion.m4v

    The course URL is: http://www.thegreatcourses.com/courses/the-black-death-the-worlds-most-devastating-plague.html
    A lot of times the course URL will place World's as world-s- and my formatting usually edits it that way. Let's see what the log looks like:

    2017-05-13 00:51:59,642 (7f7f93fff700) : INFO (__init__:386) - metadata.title: The Black Death: The World's Most Devastating Plague
    2017-05-13 00:51:59,642 (7f7f93fff700) : INFO (__init__:398) - update() CourseURL: http://www.thegreatcourses.com/courses/the-black-death-the-world-s-most-devastating-plague.html
    2017-05-13 00:52:02,322 (7f7f93fff700) : INFO (__init__:417) - courseURL not found... Searching for related courses: The Black Death: The World's Most Devastating Plague
    2017-05-13 00:52:13,928 (7f7f93fff700) : INFO (__init__:320) - Locating search results...
    2017-05-13 00:52:13,928 (7f7f93fff700) : INFO (__init__:324) - Title: The Black Death: The World's Most Devastating Plague
    2017-05-13 00:52:13,928 (7f7f93fff700) : INFO (__init__:327) - Match found for: The Black Death: The World's Most Devastating Plague
    2017-05-13 00:52:13,929 (7f7f93fff700) : INFO (__init__:328) - Title found is: The Black Death: The World's Most Devastating Plague
    2017-05-13 00:52:13,929 (7f7f93fff700) : INFO (__init__:329) - Link is: http://www.thegreatcourses.com/courses/the-black-death-the-worlds-most-devastating-plague.html
    2017-05-13 00:52:13,929 (7f7f93fff700) : INFO (__init__:324) - Title: An Economic History of the World since 1400
    2017-05-13 00:52:13,930 (7f7f93fff700) : INFO (__init__:324) - Title: (Set) The Black Death & Late Middle Ages
    2017-05-13 00:52:13,930 (7f7f93fff700) : INFO (__init__:324) - Title: (Set) The Black Death & Medieval World
    2017-05-13 00:52:13,931 (7f7f93fff700) : INFO (__init__:324) - Title: (Set) The Black Death & The Guide to Essential Italy
    2017-05-13 00:52:13,931 (7f7f93fff700) : INFO (__init__:324) - Title: (Set) The Black Death & The Great Tours: Experiencing Medieval Europe
    2017-05-13 00:52:13,931 (7f7f93fff700) : INFO (__init__:324) - Title: (Set) The Black Death & Story of Medieval England
    2017-05-13 00:52:13,932 (7f7f93fff700) : INFO (__init__:334) - Finding best match...
    2017-05-13 00:52:13,932 (7f7f93fff700) : INFO (__init__:337) - Span length for is: 52
    2017-05-13 00:52:13,932 (7f7f93fff700) : INFO (__init__:342) - CourseTitle is: The Black Death: The World's Most Devastating Plague
    2017-05-13 00:52:13,932 (7f7f93fff700) : INFO (__init__:343) - CourseURL is: http://www.thegreatcourses.com/courses/the-black-death-the-worlds-most-devastating-plague.html

    As you can see it found the correct URL.
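
    For readers who want to experiment with the idea, here is a rough sketch of picking the best title from a list of search results. It uses Python's standard `difflib` module as a stand-in and is not the matching logic inside SearchCourse(); the function name `best_course_match` and the 0.6 cutoff are assumptions.

```python
import difflib


def best_course_match(query, candidate_titles, cutoff=0.6):
    """Return the candidate title most similar to query, or None if nothing is close enough."""
    # Compare case-insensitively, but hand back the title with its original casing.
    lowered = {title.lower(): title for title in candidate_titles}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Titles taken from the first log above.
results = ["1066: The Year That Changed Everything", "Conquest of the Americas"]
match = best_course_match("Conquest of America", results)
```

    With the titles from the first log, `match` is "Conquest of the Americas". A query that resembles none of the candidates returns None.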

    In short, you don't have to follow the naming scheme of the course URL anymore; TGC.bundle now does all that based on the full course name. Thanks to @tramp78 for recognizing that if you follow the URL naming scheme all is well. That is the foolproof way of making sure this works, but it is nice to have the formatting of the full course name as your "Show" name in PLEX.

    Thanks to @tramp78 and @ZeroQI for their input and help.

    Download

    TGC.bundle

  • tramp78tramp78 Members Posts: 11 ✭✭

    Seems to work fine. Analyze each library to re-download the newly formatted Course Descriptions. I haven't tried changing the folder names, but what I have done is leave the folder names the same as the URL, edit the name in PLEX, and lock it. Kind of defeats the purpose of having your agent do all the work, but over time I'm sure it will all get worked out. Any way to locate backgrounds and banner art? Not sure where on the website that is.

  • tramp78tramp78 Members Posts: 11 ✭✭

    Strange thing. Some lectures pick up the formatting on the course description and some just refuse to do so. Specifically:
    Classical Mythology
    Books That Have Made History Books That Can Change Your Life

    There may be more but I just can't get these guys to update.

  • bubonic314bubonic314 Members Posts: 5 ✭✭

    @tramp78 It does pull background and banner art. The Agent uses BeautifulSoup to pull those links into the metadata object. If you're not seeing art, it could be that you don't have BeautifulSoup installed. Here is a snippet of the code:

        Log("Downloading Art")
        @parallelize
        def DownloadArt(html=html):
            Log("DownloadArt()")
            art = []
            Art = {}
            soup = BeautifulSoup(html)
            for link in soup.findAll("a", "cloud-zoom-gallery lightbox-group"):
                art.append(link.get('href'))
            Art['fanart'] = art[0]
            Art['poster'] = art[1]
            Log("Fanart URL: %s" % Art['fanart'])
            Log("Poster URL: %s" % Art['poster'])
            if Art['poster'] not in metadata.posters:
    

    and here is a screenshot from my server:

  • tramp78tramp78 Members Posts: 11 ✭✭

    My bad, I do see the banner and background stuff show up. It pops up between the time you hit play and the video starts.
