[REL] TGC (The Great Courses) Metadata Agent

#10

Seems to work fine. Run Analyze on each library to re-download the newly formatted course descriptions. I haven't tried changing the folder names; instead, I've left the folder names the same as the URL, then edited the name in Plex and locked it. Kind of defeats the purpose of having your agent do all the work, but over time I'm sure it will all get worked out. Any way to locate backgrounds and banner art? Not sure where on the website that is.


#11

Strange thing. Some lectures pick up the formatting on the course description and some just refuse to. Specifically:
Classical Mythology
Books That Have Made History: Books That Can Change Your Life

There may be more but I just can't get these guys to update.


#12

@tramp78 It does pull background and banner art. The Agent uses BeautifulSoup to pull those links into the metadata object. If you're not seeing art, it could be that you don't have BeautifulSoup installed. Here is a snippet of the code:

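    Log("Downloading Art")

    @parallelize
    def DownloadArt(html=html):
        Log("DownloadArt()")
        art = []
        soup = BeautifulSoup(html)
        # Each gallery link on the course page points at a full-size image:
        # the first is used as fanart, the second as the poster.
        for link in soup.findAll("a", "cloud-zoom-gallery lightbox-group"):
            art.append(link.get('href'))
        if len(art) < 2:
            Log("Expected at least 2 gallery images, found %d" % len(art))
            return
        Art = {'fanart': art[0], 'poster': art[1]}
        Log("Fanart URL: %s" % Art['fanart'])
        Log("Poster URL: %s" % Art['poster'])
        # Only fetch the poster if Plex doesn't already have it. The body of this
        # branch was cut off in the post; the line below assumes the usual Plex
        # agent idiom rather than reproducing the original code.
        if Art['poster'] not in metadata.posters:
            metadata.posters[Art['poster']] = Proxy.Media(HTTP.Request(Art['poster']).content)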

and here is a screenshot from my server:


#13

My bad, I do see the banner and background stuff show up. It pops up between the time you hit play and the video starts.


#14

New Update

The TGCAgent now pulls the Professor/Lecturer photo into the cast section of the series:

Also, at the suggestion of a user on GitHub, I have added a method for using the Course Number found on the TGC website. It has to be added to the file name in the following manner:

The Black Death: The World's Most Devastating Plague (TGC8241) S01E01.mp4

Even with poorly named files (or directories), if the TGC#### is included, it should now find an exact match for the course. This should resolve the discrepancies of courses not being matched. Here is an example where the file names are not quite the name of the course:
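For anyone curious how a course number can be pulled out of a name like the one above, here is a minimal standalone Python 3 sketch. It assumes the number always appears as "(TGC" followed by digits and ")"; `course_number` is a hypothetical helper, not a function from TGC.bundle, and the agent's real pattern may differ.

```python
import re
from typing import Optional

# Assumed pattern: "(TGC####)" anywhere in the file or folder name, any case.
COURSE_NUMBER_RE = re.compile(r"\(TGC(\d+)\)", re.IGNORECASE)


def course_number(name: str) -> Optional[str]:
    """Return the TGC course number embedded in a file or folder name, if any."""
    match = COURSE_NUMBER_RE.search(name)
    return match.group(1) if match else None


print(course_number("The Black Death: The World's Most Devastating Plague (TGC8241) S01E01.mp4"))
# prints 8241
print(course_number("Great Masters Haydn S01E01.mp4"))
# prints None
```

When a number is found, the lookup can skip fuzzy title matching entirely, which is why including TGC#### makes matching so much more reliable.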

And the Log Results:

    2017-06-10 14:18:15,054 (7f6d32e7d700) : INFO (__init__:325) - Title: (Set) How to Listen to and Understand Great Music, 3rd Edition & Concerto
    2017-06-10 14:18:15,054 (7f6d32e7d700) : INFO (__init__:325) - Title: (Set) How to Listen to and Understand Great Music, 3rd Edition; The Symphony & The Concerto
    2017-06-10 14:18:15,054 (7f6d32e7d700) : INFO (__init__:325) - Title: (Set) How to Listen to Great Music & Great Masters: Mozart
    2017-06-10 14:18:15,055 (7f6d32e7d700) : INFO (__init__:325) - Title: (Set) 30 Greatest Orchestral Works & Great Masters: Mozart
    2017-06-10 14:18:15,055 (7f6d32e7d700) : INFO (__init__:325) - Title: (Set) Best of Robert Greenberg
    2017-06-10 14:18:15,055 (7f6d32e7d700) : INFO (__init__:325) - Title: (Set) The Everyday Guide to Wine & How to Listen to and Understand Great Music
    2017-06-10 14:18:15,056 (7f6d32e7d700) : INFO (__init__:335) - Finding best match...
    2017-06-10 14:18:15,056 (7f6d32e7d700) : INFO (__init__:338) - Span length for is: 39
    2017-06-10 14:18:15,056 (7f6d32e7d700) : INFO (__init__:345) - CourseTitle is: Great Masters: Haydn-His Life and Music
    2017-06-10 14:18:15,061 (7f6d32e7d700) : INFO (__init__:346) - CourseURL is: http://www.thegreatcourses.com/courses/great-masters-haydn-his-life-and-music.html
    2017-06-10 14:18:20,483 (7f6d32e7d700) : INFO (__init__:357) - Course Number Search: 751
    2017-06-10 14:18:20,483 (7f6d32e7d700) : INFO (__init__:358) - Course Number Found: 751
    2017-06-10 14:18:20,483 (7f6d32e7d700) : INFO (__init__:452) - Course found, URL: http://www.thegreatcourses.com/courses/great-masters-haydn-his-life-and-music.html
    2017-06-10 14:18:23,667 (7f6d32e7d700) : INFO (__init__:462) - Adding metadata summary
    2017-06-10 14:18:23,668 (7f6d32e7d700) : INFO (__init__:467) - Calling MyLDESCParser()

Enjoy!
-bubonic

TGC.bundle


#15

Life just keeps getting better and better. Any chance you could port this to Music so I can aim it at the audio lectures? If it would take too long, don't bother. I'm not sure how many of my audio lectures are still on their website, so I'm not sure how much good it would do.


#16

The goal is to eventually support the audio lectures too. Right now, I'm not sure how to get the agent to support multiple media types. It might be easier to create a separate agent for the audio lectures, but that could turn out to be more work than needed. Stay tuned for updates, and hopefully one day you'll see it support audio lectures as well.

Thanks for your ongoing interest.


#17

I agree it might not be worth the time. I think it would take a second agent, and I'm not sure how many of my audio lectures are still around as audio-only lectures. I might have to check some day. Until then, thanks again for what you have done.


#18

Thought I would let you know about a few things I'm seeing. Two courses keep getting labeled as the set they are part of instead of as themselves. I can't see a need to ever label a course as a set. The two are:
The Long 19th Century: European History from 1789 to 1917
History of the Bible: The Making of the New Testament Canon

I've tried putting (TGCXXXX) in the folder name and the episode name.
I've tried renaming the folder to the text in the URL.

But it still picks up the set description, which throws off the art and descriptions.

Just an FYI, since you are such a wizard at keeping this thing up to date.


#19

You can add "Apocalypse: Controversies and Meaning in Western History" to the courses that get matched to a set, specifically "(Set) Apocalypse: Controversies and Meaning in Western History & History of Christian Theology". Not sure how to make it pick the right one.


#20

Sorry it's been a while since I checked this. I was just thinking today that I should remove the (Set) listings from the search results, as my algorithm might match the set instead of the individual course. I've been a little busy with other projects, but I should be able to get to this tomorrow. I'll let you know when it's all updated and ready to go.


#21

Update

There have been some changes to TGC.bundle:

  • Excludes (Set) listings from matching when searching for a course.
  • Adds a rating value for the course, extracted from the course website.
  • Adds roles for multiple lecturers, i.e., Professor 2, Professor 3, etc.
  • Sets the metadata studio to TGC.
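The set exclusion can be pictured with a small Python 3 sketch. The agent's real scoring isn't shown in this thread (the log only mentions a "Span length"), so this swaps in difflib's `SequenceMatcher` as a stand-in similarity measure; `best_course_match` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def best_course_match(query: str, titles: List[str]) -> Optional[str]:
    """Pick the search result most similar to `query`, ignoring "(Set)" bundles."""
    # Drop set listings first so a bundle can never outscore the single course.
    courses = [t for t in titles if not t.startswith("(Set)")]
    if not courses:
        return None
    # SequenceMatcher.ratio() gives a similarity score between 0.0 and 1.0.
    return max(courses, key=lambda t: SequenceMatcher(None, query.lower(), t.lower()).ratio())


results = [
    "(Set) Apocalypse: Controversies and Meaning in Western History & History of Christian Theology",
    "Apocalypse: Controversies and Meaning in Western History",
    "History of Christian Theology",
]
print(best_course_match("Apocalypse Controversies and Meaning in Western History", results))
# prints Apocalypse: Controversies and Meaning in Western History
```

Filtering before scoring, rather than penalizing sets afterwards, keeps the matching logic unchanged and simply removes the bad candidates.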

Screenshots

TODO

Well, the thing that has been eating up my nights is adding genres to the courses. This has proved rather elusive based on the primary_subject value in the course web pages. Courses are filed under multiple genres, and I've been trying to decode which category each primary_subject code stands for. There is a category section in the course web pages, but unfortunately it's only populated when the referring page is a search or category listing. So far this is what I think I've decoded, but I'm not entirely sure and the code is not yet complete:

    product_category = {
        '901': 'Economics & Finance',
        '902': 'High School',
        '904': 'Fine-Arts',
        '905': 'Literature & Language',
        '907': 'Philosophy, Intellectual History',
        '909': 'Religion',
        '910': 'Mathematics',
        '918': 'History',
        '926': 'Science',
        '927': 'Better Living',
    }

Based on what I've seen so far, this list might be fairly accurate. Of course, there is the brute-force way: pull every course along with its primary_subject code (which is in the course web page) and attribute the categories accordingly. That might be the only way to get accurate genres. Anyway, it's on the TODO list.
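Once a course's primary_subject codes are in hand, turning them into genres is a plain dictionary lookup. A minimal Python 3 sketch, assuming the codes have already been extracted from the page as strings (how they appear in the HTML isn't shown here) and using the partial, unconfirmed table above; `genres_for` is a hypothetical helper.

```python
from typing import Iterable, List

# Partial, unconfirmed mapping copied from the post above.
PRODUCT_CATEGORY = {
    "901": "Economics & Finance",
    "902": "High School",
    "904": "Fine-Arts",
    "905": "Literature & Language",
    "907": "Philosophy, Intellectual History",
    "909": "Religion",
    "910": "Mathematics",
    "918": "History",
    "926": "Science",
    "927": "Better Living",
}


def genres_for(codes: Iterable[str]) -> List[str]:
    """Translate primary_subject codes to genre names, skipping unknown codes."""
    genres = []
    for code in codes:
        name = PRODUCT_CATEGORY.get(code.strip())
        if name and name not in genres:  # keep first-seen order, drop duplicates
            genres.append(name)
    return genres


print(genres_for(["918", "909", "999", "918"]))
# prints ['History', 'Religion']
```

Silently skipping unknown codes means an incomplete table only loses genres instead of breaking the metadata update.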

Enjoy!
-bubonic

TGC.bundle


#22

VERY cool. Yup, that fixed the Set issue.
Regarding the audio lectures, I looked through my library and compared it to the website, and it looks like only 40 courses that are still for sale are audio only. Not that many. Here is the list if you are interested. LOL. Not sure this is worth it compared to the other things you are working on.

1066 (The Conquest)
20th Century American Fiction
36 Big Ideas
A Day’s Read
Abraham Lincoln: In His Own Words
Aeneid Of Virgil
American Religion History
Americas in the Revolutionary Era
Business Law - Contracts
Business Law - Negligence and Torts
China, India, and the United States: The Future of Economic Supremacy
Espionage And Covert Operations: A Global History
Ethics of Aristotle
European Thought and Culture in the 20th Century
Explaining Social Deviance
Exploring Metaphysics
Francis of Assisi
History of the US Economy in the 20th Century
How the Crusades Changed History
How to Read and Understand Poetry
Language A to Z
Legacies of Great Economists
Life and Writings of C. S. Lewis
Life and Writings of Geoffrey Chaucer
Literary Modernism
Modern British Drama
Moral Decision Making: How to Approach Everyday Ethics
Plato, Socrates and the Dialogues
Practical Philosophy: The Greco-Roman Moralists
Quest for Meaning: Values, Ethics, and the Modern Experience
Rights of Man: Great Thinkers and Great Movements
Skepticism 101: How to Think like a Scientist
The Art of War
The First Amendment and You
The Greatest Controversies of Early Christian History
The Skeptic's Guide To The Great Books
The Soul and the City: Art, Literature, and Urban Living
Turning Points in Medieval History
Understanding Literature and Life
Lives and Works of the English Romantic Poets


#23

Update

(v 0.4.1)

TGC.bundle

A few updates have been added:
  • Adds lecture thumbnails for courses available on the TGC+ website
  • Cleans up some of the code

We have a new addition. For courses found on the TGC+ (https://www.thegreatcoursesplus.com/) website, the TGC Agent now pulls the lecture thumbnails from the respective course webpage on TGC+. This was a suggestion from a friendly user on GitHub.

Screenshot

It should find most of the TGC courses that have a counterpart TGC+ course page, but not all courses are available on TGC+. There might be a few straggling courses on TGC+ that TGC.bundle doesn't quite pull, so if you find any, PLEASE LET ME KNOW and I'll adjust the code to find the TGC+ course page. Unfortunately, I had to use a brute-force method to find the courses on TGC+, because the search results are rendered with JavaScript and TGC.bundle primarily parses HTML.
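One common way to brute-force a course page without a usable search is to build candidate URLs straight from the title. This Python 3 sketch infers the slug rule from the single TGC URL in the log earlier in this thread, so treat it as an assumption: titles with apostrophes (e.g. "World's") would become "world-s", which may not match the site's real slugs. The same idea could be tried against TGC+ pages, whose URL scheme isn't shown in this thread, so it is not assumed here.

```python
import re

# Base URL taken from the log excerpt earlier in this thread.
TGC_COURSE_BASE = "http://www.thegreatcourses.com/courses/"


def course_slug(title: str) -> str:
    """Lowercase the title and collapse every run of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def tgc_course_url(title: str) -> str:
    """Build a candidate course URL (hypothetical helper; the page may not exist)."""
    return TGC_COURSE_BASE + course_slug(title) + ".html"


print(tgc_course_url("Great Masters: Haydn-His Life and Music"))
# prints http://www.thegreatcourses.com/courses/great-masters-haydn-his-life-and-music.html
```

Each candidate still has to be fetched and checked, since a generated URL is only a guess.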

TODO

  • Make the course description identical to the TGC course site. (Lots of HTML parsing! Halfway done.)
  • Add genres.
  • Clean up some of the code.
  • Add more try/except error checking.
  • Add checks for existing metadata so it's not updating the metadata every time.
  • Make the code less demanding on the PLEX server.
  • Add compatibility for audio lectures.

Download

TGC.bundle

Enjoy!
-bubonic


#24

Quick question on your latest TTC plug-in. I'm trying to get the episode thumbnails to work and need to know if there is a naming convention. Most of my lectures are named:
S01E01 - name
S01E02 - name
And none of those actually get the thumbs.
But I just added one that had the following names

TGC course# - S01E01 - name
TGC course# - S01E02 - name
And that one did. Is the key putting the TGC course number before the episode numbers? If so, I guess I may need to track all those down somewhere.....


#25

Follow-up: just adding those values didn't do it after a rescan. But if you unmatch and rematch it (or just use Fix Match), it works regardless of whether you put the course number in.


#26

File Renamer

I have written a little script to rename the TGC course files. It gathers data from the directory name of the course (which usually contains part or most of the TGC course name) and matches it with a course from a database file. It then renames all the multimedia files to names the TGC.bundle Plex Agent can use to collect metadata for the course.

THIS IS FOR LINUX ONLY.

About

If you're like me and working on a headless machine with a Plex server installed, you've probably had to run a bunch of commands like:

    for i in *; do echo "FILE: $i" && mv "$i" "$(echo "$i" | sed 's/.*lecture-//g')"; done

again and again, until the TGC course video files were named correctly. Well, I got tired of doing this, so I wrote a shell script and a Perl script to do all the work for us.

These three files:

  • AllTGC_TItlesandCNonly.csv - All TGC course names and numbers separated by '['.
  • levenstein.pl - A fuzzy-matching (Levenshtein distance) script written in Perl.
  • TGCrename.sh - The brain (or heart) of it all.

will use the data in the directory name to find the best course match and rename all the multimedia files to names that the TGC.bundle Agent can then use to collect metadata.
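The matching step can be illustrated in Python 3 (the real script is Perl and isn't reproduced here). This sketch computes the classic dynamic-programming Levenshtein distance and picks the database row closest to the directory name. The row layout "title[number" is an assumption about AllTGC_TItlesandCNonly.csv (the post only says the fields are separated by '['), and `parse_db`/`best_match` are hypothetical helpers.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def parse_db(lines: List[str]) -> List[Tuple[str, str]]:
    """Split assumed 'title[number' rows into (title, number) pairs."""
    rows = []
    for line in lines:
        title, _, number = line.strip().partition("[")
        if title and number:
            rows.append((title, number))
    return rows


def best_match(dir_name: str, rows: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return the (title, number) row closest to the directory name (rows must be non-empty)."""
    key = dir_name.lower()
    return min(rows, key=lambda r: levenshtein(key, r[0].lower()))


rows = parse_db([
    "The Black Death: The World's Most Devastating Plague[8241",
    "Great Masters: Haydn-His Life and Music[751",
])
print(best_match("great masters haydn his life and music", rows))
# prints ('Great Masters: Haydn-His Life and Music', '751')
```

Raw edit distance penalizes length differences, so a directory name holding only part of a title can drift toward a shorter, unrelated title; that is one reason a confirmation prompt (the -y mode) is a sensible default.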

Usage

First, copy all files in this repository to the course directory:

    cp AllTGC_TItlesandCNonly.csv levenstein.pl TGCrename.sh directory_of_course_containing_lecture_files

Then cd to the course directory and run:

    ./TGCrename.sh [option]

There are two options that can be used:

-y: Trust TGCrename to rename all the files automatically, but prompt for acceptance of the matching course.

-yy: Fully trust TGCrename to do everything for you, i.e., find the correct course and name all the files correctly.

If you just run:

    ./TGCrename.sh

within the course directory, it will prompt for acceptance of the course name and then for acceptance of each file rename.

If you run:

    ./TGCrename.sh -y

it will prompt for acceptance of the course name and then use that data to rename the files automatically.

If you run:

    ./TGCrename.sh -yy

it won't prompt you for anything and will use the data collected to rename the files automatically. This means you are fully trusting the program. (It works almost all the time, but it's up to you what level of trust you would like.)

Download

TGCrename

Companion Files

I found a list on Reddit of all the courses, with some basic data about each. It took some time, but I was able to convert it to PDF, CSV, ODS, and XLSX. It can be useful for researching the data and finding basic course information. All the files can be found here:

All The Great Courses Lists


#27

Note - Introduction to Archaeology (TGC193) is now Out of Print.

Also, I have a large collection of sample and/or audition lectures. Do you want a list of those? For example, these were one-off lectures:
Olympics From Ancient Greece to Athens 2004
Christmas Traditions in Victorian Britain and America
Fiction In The Da Vinci Code
Papal Elections
Einstein’s 100th 'Anniversary'
St. Patrick- The Patron Saint of Ireland
Holiday Music

These were lectures that were sent out due to current events in most cases.

There were also a bunch of lectures that were sent out as auditions for lectures that were never made. Stuff like:

Book of Ruth
The Religion of our Founding Fathers
The Science of Memory
Voting - Determining The Will Of The People
Great Leaders - Abraham Lincoln And Winston Churchill


#28

@tramp78 I would love a list of those, anything to complete my collection. I wasn't even aware of those lectures. It'd be cool to add them too. Thanks!

On another note, TGCrename doesn't always find the right match based on the directory name, but most of the time it does. One thing to be aware of: the database that TGCrename uses still contains non-ASCII (Unicode) characters, and the agent doesn't handle those yet. I'm in the process of removing those characters and fixing some of the titles that don't exactly match the course name on the page. Still, TGCrename is saving me a lot of time. I usually run TGCrename.sh -y just in case.
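Cleaning the database could look something like this Python 3 sketch. It maps the typographic quotes and dashes seen in these course lists to ASCII, strips accents via Unicode NFKD decomposition, and drops anything that still isn't ASCII. The replacement table is an assumption, not the author's actual cleanup, and `to_ascii` is a hypothetical helper.

```python
import unicodedata

# Typographic characters with an obvious ASCII twin (assumed list, extend as needed).
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}


def to_ascii(text: str) -> str:
    """Map quotes/dashes to ASCII, strip accents, and drop anything left over."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # NFKD splits accented letters into base letter + combining mark,
    # so encoding with errors="ignore" keeps the base letter.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("A Day\u2019s Read"))
# prints A Day's Read
```

Mapping the quotes before normalizing matters: NFKD leaves a curly apostrophe unchanged, so without the table it would simply be dropped ("A Days Read").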


#29

Here is my list of video auditions. A few of us who put together an XBMC NFO agent pack just threw all the auditions into one season, as follows. I'm still using the NFO solution for out-of-print materials, but thanks again for all your work on this agent! Also, FYI Tramp: at least half of those current audio-only courses have also been sold as video; they just aren't being sold any more.

Auditions S01E01 Phillip Braun - A Brief Introduction to Portfolio Diversification.avi
Auditions S01E02 Philip Brown - The Chinese Economic Miracle.avi
Auditions S01E03 Stuart Sutherland - The Death of the Dinosaurs.avi
Auditions S01E04 Rick Roderick - The Emancipatory Challenge of Critical Theory.avi
Auditions S01E05 William Fitch - The Evolution of Speech.avi
Auditions S01E06 Paul Pasles - Great Mathematical Ideas - The Polygon and Leonard Euler.avi
Auditions S01E07 William Newman - Isaac Newton and 17th-Century Alchemy.avi
Auditions S01E08 Donald G. Saari - A Mathematical Look at Elections.avi
Auditions S01E09 David Kung - Mathematics and Music.avi
Auditions S01E10 Nathan Harshman - Measuring Gravity.avi
Auditions S01E11 Bruce H. Edwards - Prime Numbers - Pure Math That Is Useful.avi
Auditions S01E12 Patrick Bahls - Prime Numbers and the Mathematics of Cryptography.avi
Auditions S01E13 Grace Spatafora - The Rise and Fall of Antibiotics.avi
Auditions S01E14 Bruce E. Fleury - The Rise of Germ Theory.avi
Auditions S01E15 Kevin Ryan - The Science of Cooking - Secrets of Sugar.avi
Auditions S01E16 Frank Summers - Things that Go Kaboom in the Night.avi
Auditions S01E17 David J. Helfand - Your Universal Insignificance.avi
Auditions S01E18 Lee Branstetter - Will China and India Dominate the 21st-Century Global Economy.m4v