Developing a method for automatic TV Episode name identification

As we all know, ripping TV shows from DVD/Blu-ray can be a real pain, especially when the show doesn't display an episode name as part of its intro. I've therefore been looking into ways of automating this process.

I've been looking into the Python dejavu library, which uses audio fingerprinting, and wondering if it might be a possible solution: fingerprint the primary audio track of each file and use that to identify the season and episode number, based on (for example) the first 5 minutes of each file.
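For what it's worth, the core of dejavu-style matching can be sketched with toy data. The string "hashes" and the episode database below are made up for illustration (real fingerprints are hashes of spectrogram peak pairs), and this is not dejavu's actual API; it just shows how hashes voting on a consistent (episode, time offset) pair identify a clip:

```python
from collections import Counter

# Toy fingerprint database: hash -> list of (episode, offset_in_seconds).
# Placeholder string hashes; real ones come from spectrogram peaks.
DB = {
    "h1": [("S01E01", 10)],
    "h2": [("S01E01", 12), ("S01E02", 40)],
    "h3": [("S01E01", 15)],
    "h4": [("S01E02", 5)],
}

def identify(sample):
    """sample: list of (hash, offset_in_sample) pairs.

    Each matching hash votes for an (episode, time delta) pair; a clip
    from an episode produces many votes with the SAME delta, so the
    top vote-getter is the match."""
    votes = Counter()
    for h, sample_offset in sample:
        for episode, db_offset in DB.get(h, []):
            votes[(episode, db_offset - sample_offset)] += 1
    if not votes:
        return None
    (episode, _delta), count = votes.most_common(1)[0]
    return episode, count

# A clip whose hashes all line up with S01E01 at a constant 10 s shift.
print(identify([("h1", 0), ("h2", 2), ("h3", 5)]))  # → ('S01E01', 3)
```

The nice property is that the delta voting makes the match robust to where in the episode your 5-minute sample starts.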

Current issues I see are:

  • Fingerprint size. It seems one needs quite a lot of storage to hold the fingerprints.
  • Sharing fingerprints. One needs some way of sharing the fingerprints easily, which is again difficult because of their size.

Currently I'm wondering if a possible solution might be that, instead of a single shared fingerprint database, one could download a database dump just for the show one is ripping. Identification would then go quicker, since there are fewer fingerprints to compare against, and the error rate should be much lower because you've already identified the show. It shouldn't be too much of a hassle for people to download one file per show either; certainly quicker than having to identify each episode manually.

Before I waste any more time on this route, I was wondering if anyone has experimented with this idea before? One could host the fingerprints as one file per TV show in a GitHub repo. The script would ask you to identify the show using (for instance) its id from themoviedb, and would then download the relevant fingerprints from the repo. For example, to run identification on a folder of MKVs ripped from The Big Bang Theory Blu-rays you would simply run:

python3 identify.py . -id 1418

where 1418 is the themoviedb id for the show.
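A rough sketch of what that identify.py entry point might look like. The repo name, URL layout (one fingerprint file per show, keyed by TMDB id), and file extension are all assumptions for illustration:

```python
import argparse

# Hypothetical GitHub raw-content URL; the repo and its layout are made up.
REPO_URL = "https://raw.githubusercontent.com/someuser/tv-fingerprints/main"

def fingerprint_url(tmdb_id):
    """Build the download URL for a show's fingerprint file."""
    return f"{REPO_URL}/{tmdb_id}.fp"

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Identify ripped episodes by audio fingerprint")
    parser.add_argument("folder", help="folder of MKV files to identify")
    parser.add_argument("-id", dest="tmdb_id", type=int, required=True,
                        help="themoviedb id of the show")
    return parser.parse_args(argv)

# Equivalent of: python3 identify.py . -id 1418
args = parse_args([".", "-id", "1418"])
print(f"Would fetch {fingerprint_url(args.tmdb_id)} and scan {args.folder}")
```

The actual download could then be a single `urllib.request.urlretrieve` call, so the user never manages fingerprint files by hand.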

Any thoughts from fellow programmers?

I haven't ripped a Blu-ray, but programmatically I'm thinking of inspecting the metadata on the files and on the disc. If I can't consistently identify the episode from those, then I would move in the direction you've taken. My gut tells me I'd need three keyframes near minutes 5, 10, and 20; if those match, I've identified the episode.
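That keyframe comparison could use a perceptual average hash, which tolerates the small pixel differences compression introduces, unlike an exact byte compare. This toy version works on tiny grayscale pixel grids; decoding and downscaling real frames (e.g. with ffmpeg) is assumed away here:

```python
def average_hash(pixels):
    """Perceptual average hash of a small grayscale image given as a
    list of rows of 0-255 values: each bit is 1 if that pixel is
    brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(a, b):
    """Number of differing bits; small distance = likely the same frame."""
    return sum(x != y for x, y in zip(a, b))

# Two versions of the "same" 2x4 frame, the second with slight
# compression-like noise, plus a genuinely different frame.
frame_ref   = [[200, 200, 30, 30], [200, 200, 30, 30]]
frame_rip   = [[198, 203, 28, 33], [201, 197, 31, 29]]
frame_other = [[30, 200, 200, 30], [200, 30, 30, 200]]

print(hamming(average_hash(frame_ref), average_hash(frame_rip)))    # → 0
print(hamming(average_hash(frame_ref), average_hash(frame_other)))  # → 4
```

In practice you'd hash a downscaled 8x8 or 16x16 frame and accept matches below some small Hamming-distance threshold rather than requiring 0.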

Or it might be small and easy enough to sample 30 seconds of audio instead.

What have you learned so far?

I remember around 2001 a guy at Stanford mentioning to me that he was working on a project to identify, by fingerprint, the files being passed around the campus network to stop copyright violations. We got into a passionate argument, because from a science point of view it seemed ridiculous when the whole point of a computer was to store and reproduce exact copies of data, and the internet was going to contain the sum total of all human knowledge. But Napster was making a lot of waves on campus at the time.

So I think the software is out there or has been attempted, right?

As far as a no-storage-required solution goes, I'd probably consider using a headless browser to search subtitle text on Google, then use an algorithm to pick the titles out of the results. You could also download third-party subtitles for each episode and run a comparison against the Blu-ray subtitles to find matches; timing differences would be irrelevant, as you'd be comparing the text content. That's probably the most practical solution.

Otherwise, you'd be looking at really complex keyframe or fingerprint comparisons across different compression algorithms. If rips weren't timed perfectly, you'd end up extracting a lot of keyframes to find relative matches, then algorithmically deciding which color differences were the result of compression. You could also use machine learning to detect matches, but that seems like a lot of work compared to just matching text.
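The subtitle comparison could be as simple as stripping the SRT timing lines and comparing bags of words. A minimal sketch, where the SRT handling and the use of Jaccard similarity are my own simplifying assumptions:

```python
import re

def normalize(subtitle_text):
    """Drop SRT cue numbers and timestamp lines, lowercase the rest,
    and return the words -- so timing differences drop out entirely."""
    words = []
    for line in subtitle_text.splitlines():
        line = line.strip()
        # Skip cue indices and "00:00:05,000 --> 00:00:07,000" lines.
        if not line or line.isdigit() or "-->" in line:
            continue
        words += re.findall(r"[a-z']+", line.lower())
    return words

def similarity(a, b):
    """Jaccard similarity of the word sets of two subtitle files."""
    sa, sb = set(normalize(a)), set(normalize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Same dialogue, shifted a minute and with different punctuation.
rip = "1\n00:00:05,000 --> 00:00:07,000\nOur whole universe was in a hot dense state\n"
ref = "1\n00:01:05,000 --> 00:01:07,000\nour whole universe was in a hot, dense state\n"
print(similarity(rip, ref))  # → 1.0
```

The episode with the highest similarity against your ripped subtitles would be the match; even a fairly low threshold should work, since different episodes share little exact dialogue.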