New series scanners

Hi all,



A while ago I wrote a new series scanner (I posted it here before this subforum opened) and I’d like some feedback from people if possible.



I was rather put off by the default scanners: they are quite densely coded and the regexes make them hard to read. In addition, they are marked as all rights reserved, so I didn’t want to step on anyone’s toes by hacking them around.



I’ve made the basis of a new series scanner, with the following properties:


  • Lightning fast
  • Can scan dump directories
  • Makes mistakes with titles much more often
  • Pickier about naming formats, but handles everything I have



I’d be interested in hearing from people whether they like this scanner, and if so, what mistakes it makes on their files. I’m also working on a scanner for sport (as the Ashes is on ;)), so if anyone wants to collaborate on a metadata provider for that, let me know.



The file is at: http://dl.dropbox.com/u/10004672/WilkesTV.py



I hope it will be a good base for people writing their own scanners, too. I think it’s clearer than the ones shipped by default.

Matthew, very nicely done, it’s awesome to see people taking advantage of the new architecture like this!



You are permitted to fix/hack/enhance the shipping scanners, just to be clear.

Thanks elan,



The main motivation wasn’t the legal issue (I was about 95% sure you’d say ‘do what you like’ anyway). It was more that the hacks I’d seen for making the scanners handle dump directories were very slow for me: my unsorted-in directory has lots and lots of stuff in it, and the default scanner with a mod to allow top-level episodes was taking about 20 minutes to scan it. This does it in a few seconds.



The main speedup is the SystemFilter decorator, which throws out subdirs based on a system call to find. This lets me find all directories that have something that looks like a media file in, and throw away those subtrees that don’t. That’s much faster than iterating through them.
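To give a rough idea of the pruning step, here is a minimal sketch. It is not the code from WilkesTV.py: the function names and the `VIDEO_EXTS` tuple are made up for illustration, and it is written for Python 3. It asks `find` once for every file that looks like a video, works out which directories lie on the way to one of those files, and drops every other subdirectory from the list in place.

```python
import os
import subprocess

# Assumption: a stand-in for the extension list the real scanner uses.
VIDEO_EXTS = ("avi", "mkv", "mp4")


def find_media_files(root):
    """Run find(1) once and return every file under root with a video extension."""
    command = ["find", root, "-type", "f", "("]
    for index, ext in enumerate(VIDEO_EXTS):
        if index:
            command.append("-o")
        command += ["-iname", "*." + ext]
    command.append(")")
    output = subprocess.check_output(command, universal_newlines=True)
    return [line for line in output.splitlines() if line]


def dirs_worth_scanning(root, media_paths):
    """Return every directory, from root downwards, whose subtree holds a media file."""
    root = os.path.normpath(root)
    keep = set()
    for path in media_paths:
        directory = os.path.dirname(os.path.normpath(path))
        while True:
            keep.add(directory)
            parent = os.path.dirname(directory)
            # Stop at the scan root, or at the filesystem root as a safety net.
            if directory == root or parent == directory:
                break
            directory = parent
    return keep


def prune_subdirs(subdirs, keep):
    """Remove, in place, the subdirectories that have no media beneath them."""
    subdirs[:] = [d for d in subdirs if os.path.normpath(d) in keep]
```

With something like this, the expensive part is a single `find` call per library rather than a Python-level walk of every subtree, which is where the time saving would come from.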



Writing this did give me some thoughts on the API for scanners though, especially in comparison to WSGI that has a similar pipe-line process.


  1. It would be nice to just be able to return an iterable (i.e. also support generators) rather than relying on the fact lists are mutable
  2. I’d like to be given control of the process earlier than currently, i.e. just called with a path, without files and subdirs



The current behaviour would look something like this under that model:



    import itertools

    def wrapNormalScanner(path):
        # getSubDirs and filesInFolder are hypothetical helpers, not real API.
        output = []
        subdirs = getSubDirs(path)
        # The scanner appends matches to output and may prune subdirs in place.
        NormalScanner.Scan(path, filesInFolder(path), output, subdirs)
        found = iter(output)
        # Recurse into whatever subdirectories the scanner left in the list.
        while subdirs:
            found = itertools.chain(found, wrapNormalScanner(subdirs.pop()))
        return found




Pseudo-python, obviously, but it shows how the API design goes to some lengths to provide all the information to the scanners instead of trusting us to go find it ourselves. The big win over the real design is that things like VideoFiles.Scan can be implemented with a simple filter callable rather than in-place manipulation of the lists.
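As a rough illustration of the "simple filter callable" idea, here is a sketch in plain Python 3. None of these names exist in the shipped scanners: `walk_files`, `is_video`, `video_scanner` and the `VIDEO_EXTS` set are all invented for the example.

```python
import os

# Assumption: a stand-in for the real list of video extensions.
VIDEO_EXTS = {".avi", ".mkv", ".mp4"}


def walk_files(path):
    """Lazily yield every file path below path."""
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            yield os.path.join(dirpath, name)


def is_video(path):
    """Filter callable: true when the file extension looks like a video."""
    return os.path.splitext(path)[1].lower() in VIDEO_EXTS


def video_scanner(paths):
    """One pipeline stage: takes any iterable of paths and returns a lazy iterable."""
    return filter(is_video, paths)
```

A caller would chain stages, e.g. `video_scanner(walk_files(path))`, and nothing has to mutate a shared list. Each stage only sees an iterable and hands back an iterable, much like WSGI middleware.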

Not very relevant, I know, as you've already settled on a format, but thought I'd give you my feedback from using it as an outsider to the devteam.

Matthew,

Which site does this ‘scan’ from? Is it an alternative to TheTVDB?



EDIT: Guess I have the terminology wrong… I am guessing I would need to create a different Metadata Agent if I wanted the scanner to work with TVRage…

I did a small update to the scanner, and it’s gone from 12+ min to 2-3 min to scan my library. The main problem with the old scanner was that it did a "find | grep" for each path and every subdirectory beneath it. With my alteration it just searches the current path for files, not its subdirectories.



    extensions = "|".join(".*"+ext+"$" for ext in VideoFiles.video_exts)
    command = """find \"%s\" -maxdepth 1 -type f -iregex \"%s\"""" % (os.path.split(base)[0], extensions)




Nothing fancy or anything, but it does its job.
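For anyone adapting this, here is one possible variation, offered as a sketch rather than as what the modified scanner actually does. It passes the arguments to `find` as a list, so paths containing quotes or spaces need no shell escaping, and it swaps the `-iregex` pattern for `-iname` tests. The function names are hypothetical, and it assumes Python 3.

```python
import subprocess


def build_find_command(directory, extensions):
    """Build a non-recursive find(1) command matching the given extensions."""
    command = ["find", directory, "-maxdepth", "1", "-type", "f"]
    tests = []
    for ext in extensions:
        if tests:
            tests.append("-o")
        # -iname matches the file name case-insensitively.
        tests += ["-iname", "*." + ext]
    if tests:
        command += ["("] + tests + [")"]
    return command


def list_videos(directory, extensions):
    """Run the command and return the matching paths, one per output line."""
    output = subprocess.check_output(
        build_find_command(directory, extensions), universal_newlines=True
    )
    return output.splitlines()
```

In the scanner, `directory` would be the `os.path.split(base)[0]` value and `extensions` would be `VideoFiles.video_exts` from the snippet above.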

Have you ever completed this sports scanner?
