New series scanners

Matthew_Wilkes · November 30, 2010, 9:19pm

Hi all,

A while ago I wrote a new series scanner (I posted it here before this subforum opened) and I’d like some feedback from people if possible.

I was very frightened by the default scanners: they were quite densely coded and the regexes made them hard to read. In addition, they are marked as all rights reserved, so I didn’t want to step on anyone’s toes by hacking them around.

I’ve made the basis of a new series scanner, with the following properties:

Lightning fast
Can scan dump directories

Makes mistakes with titles much more often
Pickier about naming formats, but handles everything I have

I’d be interested in hearing from people if they like this scanner, and if so, what mistakes it makes on their files. I’m also working on a scanner for sport (as the Ashes is on ;)), so if anyone would like to collaborate on a metadata provider for that, let me know.

The file is at: http://dl.dropbox.com/u/10004672/WilkesTV.py

I hope it would be a good base for people writing their own scanners, too; I think it’s clearer than the ones shipped by default.

elan · December 2, 2010, 9:49am

Matthew, very nicely done, it’s awesome to see people taking advantage of the new architecture like this!

You are permitted to fix/hack/enhance the shipping scanners, just to be clear.

Matthew_Wilkes · December 2, 2010, 8:56pm

Thanks elan,

The main motivation wasn’t the legal issue (I was about 95% sure you’d say ‘do what you like’ anyway). It was more that the hacks I’d seen for making the scanners handle dump directories were very slow for me, as my unsorted-in directory has lots and lots of stuff in it. The default scanner with a mod to allow top-level episodes was taking about 20 minutes to scan it; this does it in a few seconds.

The main speedup is the SystemFilter decorator, which throws out subdirs based on a call out to the system’s find command. This lets me find all directories that contain something that looks like a media file, and throw away the subtrees that don’t. That’s much faster than iterating through them.
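
A minimal sketch of that idea, not the actual WilkesTV.py code: a decorator that prunes `subdirs` in place before the wrapped scanner runs. The names `system_filter`, `dirs_with_media`, and `MEDIA_EXTS` are hypothetical, the `(path, files, mediaList, subdirs)` argument order is assumed from the `NormalScanner.Scan` call later in this thread, and `os.walk` stands in for the call to `find` so the example stays portable.

```python
import functools
import os

# Hypothetical extension list; the real scanner takes these from VideoFiles.
MEDIA_EXTS = (".avi", ".mkv", ".mp4")


def dirs_with_media(root):
    """Return every directory under root whose subtree holds a media file."""
    keep = set()
    root = os.path.abspath(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name.lower().endswith(MEDIA_EXTS) for name in filenames):
            # Mark this directory and all of its ancestors up to root.
            current = dirpath
            while True:
                keep.add(current)
                if current == root:
                    break
                current = os.path.dirname(current)
    return keep


def system_filter(scan):
    """Decorator: drop subdirs with no media below them, then call scan."""
    @functools.wraps(scan)
    def wrapper(path, files, media_list, subdirs):
        keep = dirs_with_media(path)
        # Slice assignment mutates the caller's list, as the scanner API expects.
        subdirs[:] = [d for d in subdirs if os.path.abspath(d) in keep]
        return scan(path, files, media_list, subdirs)
    return wrapper
```

Applied as `@system_filter` above a scanner function, this leaves the scanner itself unchanged; it simply never sees the subtrees that hold no media.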

Writing this did give me some thoughts on the API for scanners though, especially in comparison to WSGI, which has a similar pipeline process.

It would be nice to just be able to return an iterable (i.e. also support generators) rather than relying on the fact that lists are mutable
I’d like to be given control of the process earlier than currently, i.e. just called with a path, without files and subdirs

The current behaviour would look something like this under that model:


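    import itertools

    def wrapNormalScanner(path):
        # getSubDirs, filesInFolder and NormalScanner are placeholders.
        output = []
        subdirs = getSubDirs(path)
        # The existing API fills `output` and prunes `subdirs` in place.
        NormalScanner.Scan(path, filesInFolder(path), output, subdirs)
        found = iter(output)
        # Recurse into whatever subdirs the scanner left in the list.
        for subdir in subdirs:
            found = itertools.chain(found, wrapNormalScanner(subdir))
        return found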

Pseudo-python, obviously, but it shows how the API design goes to some lengths to provide all the information to the scanners instead of trusting us to go find it ourselves. The big win over the real design is that things like VideoFiles.Scan can be implemented with a simple filter callable rather than in-place manipulation of the lists.
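
To make the "simple filter callable" point concrete, here is a sketch of the proposed model, not anything Plex provides: the scanner is called with just a path and returns an iterable, and a filter is an ordinary predicate applied with the built-in `filter`. `scan_path` and `is_video` are hypothetical names, as is the extension list.

```python
import os

# Hypothetical extension list standing in for VideoFiles.video_exts.
VIDEO_EXTS = (".avi", ".mkv", ".mp4")


def is_video(filename):
    """Filter callable: keep only names that look like video files."""
    return filename.lower().endswith(VIDEO_EXTS)


def scan_path(path, keep=is_video):
    """Lazily yield every file under path that passes the keep predicate."""
    entries = sorted(os.listdir(path))
    full_paths = [os.path.join(path, name) for name in entries]
    # Files in this directory first, run through the filter callable.
    yield from filter(keep, (p for p in full_paths if os.path.isfile(p)))
    # Then recurse; the scanner finds its own subdirs instead of being handed them.
    for p in full_paths:
        if os.path.isdir(p):
            yield from scan_path(p, keep)
```

Because the result is a generator, a caller can chain or filter it further without any list being mutated in place.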

Not very relevant, I know, as you've already settled on a format, but thought I'd give you my feedback from using it as an outsider to the devteam.

friedflix · February 4, 2011, 2:15am

Matthew,

From which site does this ‘scan’? Is this an alternative to TheTVDB?

EDIT: Guess I have the terminology wrong… I am guessing I would need to create a different Metadata Agent if I wanted the scanner to work for TVRage…

ube · October 29, 2011, 5:09pm

Did a small update to the scanner; it’s gone from 12+ min to 2-3 min to scan my library. The main problem with the old scanner was that it did a "find | grep" for each path and subdirectory. With my alteration it just searches the current path for files, not its subdirectories.


    extensions = "|".join(".*"+ext+"$" for ext in VideoFiles.video_exts)
    command = """find \"%s\" -maxdepth 1 -type f -iregex \"%s\"""" % (os.path.split(base)[0], extensions)

Nothing fancy or anything, but it does its job.
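
For comparison, roughly the same selection can be made without shelling out. This is a sketch, not part of the patch above: `video_files_in` is a hypothetical helper, and the extension tuple stands in for `VideoFiles.video_exts`, whose exact format (with or without a leading dot) is an assumption here.

```python
import os

# Stand-in for VideoFiles.video_exts; assumed here to be bare extensions.
VIDEO_EXTS = ("avi", "mkv", "mp4")


def video_files_in(directory, exts=VIDEO_EXTS):
    """List video files directly inside directory, without descending."""
    suffixes = tuple("." + ext.lower() for ext in exts)
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Like `-maxdepth 1 -type f`: regular files at this level only.
            if entry.is_file(follow_symlinks=False) and entry.name.lower().endswith(suffixes):
                matches.append(entry.path)
    return sorted(matches)
```

Unlike the `-iregex` pattern, this version requires a dot before the extension, so it is slightly stricter about what counts as a match.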

alon.albert · November 15, 2011, 3:47am

Have you ever completed this sports scanner?

system · December 20, 2019, 8:53pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
