Absolute Series Scanner (for anime mainly)

This is just a wild guess, but the reason season 2 gets picked may be that it is at the top of the list of possible options. When I do a Fix Incorrect Match, Shinryaku!? Ika Musume is at the top and just underneath is Shinryaku! Ika Musume. If it picks the first option to grab the metadata for, it would always pick the second season. Does this make sense?

If it picked the wrong series, the automatic match would be wrong, but then the title would also be wrong, and it is not.

As proof, the HAMA logs show the series labelled correctly:

2014-07-13 19:30:45,773 (80636e400) :  DEBUG (__init__:596) - parseAniDBXml - AniDB title need no change: 'Shinryaku!? Ika Musume' original title: 'Shinryaku!? Ika Musume' metadata.title 'Shinryaku!? Ika Musume'

The only reason for the two to be combined would be the same metadata ID, left over from a previously buggy agent that mapped the other series' ID; the library would need re-creating to solve that. Also, Plex could be removing the question mark, but somehow I doubt it.

If there are too many 100% matches, it could cause problems. I need to lower those 100% scores to 99% to make sure the exact match (and not the exact match after cleaning the title) gets picked. I will reproduce it and see if I am impacted.

[EDIT] You are right, both point to the new series. It was my agent that gave the new series' name to the old series, whereas I thought it was the other way around...

I did recreate the library, and when I upgraded I deleted the existing files; the logs show the old settings wouldn't be messing it up. When I do a Fix Incorrect Match on the first season, it lists both seasons with a 100 next to them, so I bet it is the multiple 100% matches messing it up. If I have time, I can try looking through the HAMA code to see if I can think of anything that might help.

Jesus, you are too quick to reply :P

Exact match 100%, exact match after cleaning 99%. ta da!!!
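A minimal sketch of that tiering, assuming a hypothetical `cleanse()` that lowercases and drops spaces and '?' (HAMA's real `cleanse_title` may do more). The aids are the ones from the log lines in this thread. With exact matches at 100 and cleansed matches at 99, the exact title always sorts first.

```python
import re


def cleanse(title):
    """Hypothetical stand-in for HAMA's cleanse_title: lowercase, drop spaces and '?'."""
    return re.sub(r"[\s?]", "", title.lower())


def score(search_title, candidate):
    """Tiered score: 100 for an exact match, 99 for a match only after cleansing, else 0."""
    if search_title == candidate:
        return 100
    if cleanse(search_title) == cleanse(candidate):
        return 99
    return 0


candidates = {7486: "Shinryaku! Ika Musume", 8294: "Shinryaku!? Ika Musume"}
ranked = sorted(candidates,
                key=lambda aid: score("Shinryaku! Ika Musume", candidates[aid]),
                reverse=True)
print(ranked[0])  # 7486
```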

Solved that, meaning series with similar sequels will no longer be buggy, and solved a crash on "AttributeError: _strptime" on first call. Attached a new version with a few...
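For reference, "AttributeError: _strptime" is a known Python 2 threading quirk: `strptime` imports the `_strptime` module lazily on its first call, and if several threads make that first call at once, one of them can see a half-initialised module. A common workaround, sketched here (not necessarily the fix used in HAMA), is to import `_strptime` once at load time, before any worker threads start.

```python
# Import _strptime eagerly in the main thread so that later strptime calls in
# worker threads never trigger the lazy first-time import (the Python 2 race).
import _strptime  # noqa: F401  (imported only for its side effect)
import datetime
import threading

results = []
lock = threading.Lock()


def parse(date_string):
    """Parse a YYYY-MM-DD date string; safe to call from many threads at once."""
    parsed = datetime.datetime.strptime(date_string, "%Y-%m-%d")
    with lock:
        results.append(parsed.year)


threads = [threading.Thread(target=parse, args=("2014-07-13",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014]
```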

Hehe, this time, huh :P

Yay, it did indeed grab the Squid show correctly. Now I will have it rescan my entire anime library and will post what happens when it is finished.

Slightly revised version

There are now several shows that the agent doesn't grab the metadata for. 

Elfen Lied: 41
Excel Saga: 29
Kishin Heidan: 31
Nekomonogatari (Kuro): 75
Samurai 7: 34

HAMA doesn't make a log when I scan the test folder, so I can only post the other two logs. The numbers to the right of each name are the scores Plex shows when I do a Fix Incorrect Match on them. I will look at it more later.

Did you use the "Slightly revised version" or the other one?

I can see something scanned in the scanner log but not in the custom log :/

I can't do much without logs... Force a refresh and it should create HAMA logs; there must be an error in them...

Yeah I used the slightly revised version. I will mess with it more when I get home from work.

Okay, I deleted the test library and added it again, and it seems to have made the logs this time. I don't know why it wouldn't yesterday. Here they are.

When it crashes, it sometimes doesn't create logs...

In the meantime, I searched the full log you posted days ago, recreated the problematic folders, and could reproduce the issue...

I found a major flaw in the matching flow (it picked the first matching title of a series, even if incomplete, although the series had many titles) that made matches with no spaces rank higher than the exact same title...
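A simplified reconstruction of that flaw and its fix, using `xml.etree.ElementTree` over a tiny hypothetical fragment laid out like anime-titles.xml (`<anime aid=...>` elements holding `<title type=...>` children, the layout the HAMA search loop walks). The aids, titles, and simplified `score()` are invented for illustration. Stopping at the first title that matches gives series 1 only 99; taking the best score over all of its titles gives 100.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like AniDB's anime-titles.xml (xml:lang attributes
# trimmed); the aids and titles are invented for illustration.
XML = """<animetitles>
  <anime aid="1">
    <title type="main">FooBar</title>
    <title type="syn">Foo Bar</title>
  </anime>
  <anime aid="2">
    <title type="main">Foo Bar 2</title>
  </anime>
</animetitles>"""


def score(search, title):
    """Simplified tiers: 100 exact, 99 equal once spaces/case are ignored, else containment %."""
    if search == title:
        return 100
    if search.replace(" ", "").lower() == title.replace(" ", "").lower():
        return 99
    if search in title:
        return 100 * len(search) // len(title)
    return 0


def first_match_per_series(root, search):
    """The flawed flow: keep the score of the first title that matches at all."""
    result = {}
    for anime in root.iter("anime"):
        for title in anime.iter("title"):
            s = score(search, title.text)
            if s:
                result[anime.get("aid")] = s
                break
    return result


def best_match_per_series(root, search):
    """The fix: keep the best score over ALL titles of each series."""
    result = {}
    for anime in root.iter("anime"):
        top = max((score(search, t.text) for t in anime.iter("title")), default=0)
        if top:
            result[anime.get("aid")] = top
    return result


root = ET.fromstring(XML)
print(first_match_per_series(root, "Foo Bar"))  # {'1': 99, '2': 77}
print(best_match_per_series(root, "Foo Bar"))   # {'1': 100, '2': 77}
```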

Everything maps except "Kishin Corps", as it only partially matches one title, and that title is not an official, short, or synonym title: http://anidb.net/perl-bin/animedb.pl?show=anime&aid=1655

SearchByName: Local exact search - Strict / Contained in match 'Kishin Corps' matched title: 'Geo-Armor: Kishin Corps' with aid: '1655' score: '52'
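The score of 52 follows from the containment rule in the HAMA search code quoted in this thread (`100*len(origTitle)/len(element.text)`, integer division under Python 2): 'Kishin Corps' covers 12 of the 23 characters of 'Geo-Armor: Kishin Corps'. A quick check, using `//` to reproduce the Python 2 integer division:

```python
def containment_score(search_title, anidb_title):
    """Share of the AniDB title covered by the search title, as an integer 0-100."""
    if search_title not in anidb_title:
        return 0
    return 100 * len(search_title) // len(anidb_title)


print(len("Kishin Corps"), len("Geo-Armor: Kishin Corps"))           # 12 23
print(containment_score("Kishin Corps", "Geo-Armor: Kishin Corps"))  # 52
```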

Now the scores are on a logical 0 to 100 scale, so it should be plenty accurate.

Okay, I renamed Kishin Heidan and it's fine now. It still doesn't grab Nekomonogatari (Kuro), which is the name on AniDB, and Ika Musume is combined again. Here are the logs, as always.

 

Let's look at the second series' entry in anime-titles.xml from AniDB (the title database):


ikamusume2
Návrat Sépijky!
Shinryaku!? Ika Musume
Shinryaku! Ika Musume 2
Shinryaku! Ika Musume II
Squid Girl 2
侵略!? イカ娘
侵略!?イカ娘

The HAMA agent logs show the first series being taken for the second one:

Title: 'Shinryaku! Ika Musume'
match 'Shinryaku! Ika Musume' matched title: 'Shinryaku! Ika Musume' with aid: '7486' score: '100'
match 'Shinryaku! Ika Musume' matched title: 'shinryaku!ikamusume' with aid: '8294' score: '100'
getMainTitle - LANGUAGE titles: ['Shinryaku!? Ika Musume', 'Shinryaku!? Ika Musume', '', '']

The agent issue came back after I re-coded the search function... So here is the code now:

    ### Local exact search ###
    elements      = list(AniDB_title_tree.iterdescendants())
    cleansedTitle = self.cleanse_title (origTitle)
    Log.Debug( "SearchByName - exact search - checking title: " + repr(origTitle) )
    match = [ None, None, 0 ]
    for element in elements:
      if element.get('aid'):
        if match[2]: #only when match found and it skipped to next serie in file, then add
          Log.Debug("SearchByName: Local exact search - Strict / Contained in match '%s' matched title: '%s' with aid: '%s' score: '%d'" % (origTitle, match[1], aid, match[2]))
          langTitle, mainTitle = self.getMainTitle(match[0], SERIE_LANGUAGE_PRIORITY)
          results.Append(MetadataSearchResult(id=aid, name=langTitle, year=None, lang=Locale.Language.English, score=match[2]))
          match = [ None, None, 0 ]
        aid = element.get('aid')
      elif element.get('type') in ('main', 'official', 'syn', 'short'): #element.get('{http://www.w3.org/XML/1998/namespace}lang') in SERIE_LANGUAGE_PRIORITY or element.get('type') == 'main'):
        show = element.text.encode('utf-8')
        if    origTitle     == show:                                        match = [element.getparent(), show, 100]
        elif  cleansedTitle == self.cleanse_title (show) and 99 > match[2]: match = [element.getparent(), cleansedTitle, 100]
        elif  origTitle in element.text and 99 > match[2]:                  match = [element.getparent(), element.text, 100*len(origTitle)/len(element.text)]
        else:  continue #no match 
    if match[2]: #last serie detected
      Log.Debug("SearchByName: Local exact search - Strict / Contained in match '%s' matched title: '%s' with aid: '%s' score: '%d'" % (origTitle, match[1], aid, match[2]))
      langTitle, mainTitle = self.getMainTitle(match[0], SERIE_LANGUAGE_PRIORITY)
      results.Append(MetadataSearchResult(id=aid, name=langTitle, year=None, lang=Locale.Language.English, score=match[2]))
    if len(results)>=1:
      Log.Debug("=== searchByName - End - =================================================================================================")
      if LOCK is not None: LOCK.release()
      return

The cleansing function did strip the '?', so the titles matched...

"elif cleansedTitle == self.cleanse_title (show) and 99 > match[2]: match = [element.getparent(), cleansedTitle, 100]" should score 99, not 100.

I can either code a proper percentage or, since only a couple of characters will differ, just say it's a 99 instead of 100... Good enough unless a proper match is there. I will go for the 99 for now, and will upload a proper archive and edit this post when I get home.
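The "proper percentage" alternative could use a generic string-similarity ratio. A sketch using Python's standard `difflib.SequenceMatcher`; this is a swapped-in technique for illustration, not what HAMA does. Here the one extra '?' costs three points instead of one.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Similarity of two titles as an integer percentage (difflib ratio, truncated)."""
    return int(100 * SequenceMatcher(None, a, b).ratio())


print(similarity_percent("Shinryaku! Ika Musume", "Shinryaku! Ika Musume"))   # 100
print(similarity_percent("Shinryaku! Ika Musume", "Shinryaku!? Ika Musume"))  # 97
```

A fixed 99 is simpler and keeps every cleansed match just below an exact one, whereas a ratio can drop cleansed matches below unrelated partial matches, so the ranking would need re-tuning.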

Ika Musume is fixed

Nekomonogatari (Kuro) poses a problem for me, as I remove everything in brackets and everything in parentheses that isn't a year. So either I wrongly keep everything in parentheses just for a couple of series, or for that couple of series we have to match manually...
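To see why "(Kuro)" gets lost, here is a rough sketch of the cleansing rule described above: drop anything in square brackets, and anything in parentheses unless it is a four-digit year. The regexes are an assumption about the behaviour, not HAMA's actual code.

```python
import re

YEAR_IN_PARENS = re.compile(r"\((?:19|20)\d{2}\)")  # assumed year form, e.g. (2012)
BRACKETS = re.compile(r"\[[^\]]*\]")                # [anything]
PARENS = re.compile(r"\([^)]*\)")                   # (anything)


def strip_extras(title):
    """Remove bracketed text and non-year parenthesised text, then tidy spaces."""
    title = BRACKETS.sub("", title)
    title = PARENS.sub(
        lambda m: m.group(0) if YEAR_IN_PARENS.fullmatch(m.group(0)) else "", title)
    return " ".join(title.split())


print(strip_extras("Nekomonogatari (Kuro)"))     # Nekomonogatari
print(strip_extras("Sankarea (2012)"))           # Sankarea (2012)
print(strip_extras("[Group] Fate/Zero (2012)"))  # Fate/Zero (2012)
```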

Everything scanned in correctly except, of course, Nekomonogatari. It is still great to grab 329 out of 330 correctly and only have to apply a quick fix to one show.

Is there anything else you wanted me to do for you?

That should be it. Thanks a lot; as you can see, there aren't many people here, and this allowed me to at least make it work for your squicky (spelling unknown) clean sorting... HAMA, with a proven 99.69% efficacy on your data :D

I need to figure out what to work on next (feel free to suggest):

To Do
=====

   . endings need to recover their titles from AniDB

   . improve threading support

   . download a new series XML only when an episode is present locally but not in the XML

   . scanner: logs working with no change (need custom logs from many people)

   . make cache really optional
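The third To Do item boils down to a set difference: refetch a series XML only when some episode on disk is missing from the cached copy. A minimal sketch; the function name and the use of plain absolute episode numbers are assumptions.

```python
def needs_refresh(local_episodes, cached_xml_episodes):
    """Return True when at least one episode on disk is absent from the cached XML.

    Both arguments are iterables of absolute episode numbers (hypothetical input format).
    """
    return bool(set(local_episodes) - set(cached_xml_episodes))


print(needs_refresh([1, 2, 3], [1, 2, 3, 4]))      # False: cache covers every local file
print(needs_refresh([1, 2, 3, 13], range(1, 13)))  # True: episode 13 is new
```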

Possibilities:
==========

better collection support: for example, if a series is nested one folder deeper than usual, use that folder as the collection name

Download through RSS, but how to select resolution and group???

   . NZB: http://fanzub.com/rs...0p horriblesubs

   . Torrent: Nyaa.se: http://www.nyaa.se/?...orriblesubs 720

   . Torrent animesuki: http://animesuki.rut...s.php?style=alt (can't give proper example, blocked at work)

Optional seasons like TVDB, but keeping absolute numbering???

Please let me know of possible improvements...

Squeaky clean is the spelling. Threading I don't know much about. I know that new episodes are sometimes not in the XML file yet, so you might need to temporarily title them something like "Episode XX". Sometimes the title for a brand new episode isn't on AniDB, and the series XML just lists it as "Episode XX" for a few days. It would be neat if the scanner could go back and get the updated titles for those new episodes.
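One way to come back for those titles is to treat an empty title, or one that only reads "Episode NN", as a placeholder and keep refreshing until a real title appears. A sketch, assuming that placeholder wording (AniDB's exact text may differ); the episode titles below are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"^Episode \d+$", re.IGNORECASE)  # assumed placeholder wording


def is_placeholder(title):
    """True when a title is missing or is only a generic 'Episode NN' label."""
    return not title or bool(PLACEHOLDER.match(title.strip()))


# Hypothetical episode titles keyed by absolute episode number.
episodes = {12: "Some Real Title", 13: "Episode 13", 14: ""}
to_refresh = sorted(n for n, t in episodes.items() if is_placeholder(t))
print(to_refresh)  # [13, 14]
```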

An RSS downloader would be cool, but that is probably separate from the scanner. If you did make one as a separate program, it would probably be best to download the .torrent files to the watch folder of whatever torrent client the user has. I really like uTorrent's interface for managing RSS downloads; I think it would be a good place to look for ideas on how to make one.

You can currently download torrents via RSS with rtorrent/ruTorrent, or even Transmission & FlexGet. ruTorrent is easier than FlexGet because it has a web interface you can configure the RSS scanner from, rather than a complex config file like FlexGet.

I don't know about keeping absolute numbering and organizing them into seasons; having both combined seems difficult. What I think would be neat is if the scanner detected a season folder structure for the show and maybe handed it off to the TVDB agent for processing. I don't even know if that is possible, though. This would be useful for the way I set up the Dragon Ball series, as it only scans in the first season and ignores the rest.
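Detecting a season folder layout could be as simple as checking whether a show folder has subfolders named like "Season 1". A hypothetical sketch working on folder names only; the naming pattern is an assumption, and handing the show to the TVDB agent afterwards is not shown, since it is unclear whether Plex allows that.

```python
import re

SEASON_DIR = re.compile(r"^season[ ._-]*(\d+)$", re.IGNORECASE)  # e.g. "Season 1", "season_02"


def season_numbers(subfolder_names):
    """Return the sorted season numbers found among a show's subfolder names."""
    found = []
    for name in subfolder_names:
        m = SEASON_DIR.match(name.strip())
        if m:
            found.append(int(m.group(1)))
    return sorted(found)


def uses_season_layout(subfolder_names):
    """Treat a show as season-organised when it has at least one 'Season N' subfolder."""
    return bool(season_numbers(subfolder_names))


print(season_numbers(["Season 2", "Season 1", "Extras"]))  # [1, 2]
print(uses_season_layout(["Extras"]))                      # False
```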

I have a show called Inferno Cop that I keep in my Anime folder because I wanted it separate from my television shows. When you search for it on AniDB, it brings up this page: http://anidb.net/perl-bin/animedb.pl?show=anime&aid=4552. I don't know how it decides to pull this up, since there is nothing in the anime-titles.xml file, but if you can figure it out, it would be neat if you could also pass these shows to the TVDB agent to grab them.

These are all the ideas I have. I will see if anyone I know is willing to help you test, but the only guys I can think of are probably too busy to find the time.

If it shows up in Plex, it was scanned by the scanner.

If it shows metadata, it was scraped by the agent (metadata scraping).

Allowing both TVDB and AniDB numbering is an extremely good idea, and after checking, the code for most of it is already in place, since I already build a table of TVDB episode summaries... In the meantime, you can use TheTVDB by doing a custom search, and that should work.

Hi, sorry in advance if this is just some stupid mistake on my part. I just updated my agent and scanner using the latest "HAMA + Absolute Series Scanner.zip" that you posted a few posts back. Now some of the anime that was previously working fine shows up blank: no metadata, no posters, not even a name. The anime section only shows a blank show, and if I click on it I can see the number of episodes; I then need to play one of the episodes to actually see what the show is.
 
The one that I have been trying to get to work is called Slap Up Party: Arad Senki.

Plex Media Scanner.log: I see nothing in there apart from Bleach episode 334 being scanned.

Plex Media Scanner Custom.log: shows Infinite Stratos only.

I can't see the following part:

os.uname():   'LinuxNAS3.2.40#15 SMP Sun Nov 17 00:44:13 CET 2013x86_64'
Sys.platform: 'linux2'
os.name:      'posix'
os.getcwd:    '/volume1/homes/plex'
root folder:  '/volumeSATA2/satashare2-1/Anime/■■■ Sub En/0-9'
Did you write the log path by hand? If not, what OS are you on, and where was it saved?
 
Plex Media Scanner Custom - Skipped files.log: Infinite Stratos OVA skipped.
 
com.plexapp.agents.hama.log: better, I see things, especially the titles; nothing but the main title is found...
 
parseAniDBXml - AniDB title changed: '' original title: 'Gintama`'
parseAniDBXml - AniDB title changed: '' original title: 'Initial D Battle Stage 2'
parseAniDBXml - AniDB title changed: '' original title: 'Initial D Extra Stage 2'
parseAniDBXml - AniDB title changed: '' original title: 'Initial D Fifth Stage'
parseAniDBXml - AniDB title changed: '' original title: 'Kakumeiki Valvrave (2013)'
parseAniDBXml - AniDB title changed: '' original title: 'Princess Lover!'
parseAniDBXml - AniDB title changed: '' original title: 'Sankarea (2012)'
parseAniDBXml - AniDB title changed: '' original title: 'Slap Up Party: Arad Senki'
parseAniDBXml - AniDB title changed: '' original title: 'Kimi ga Aruji de Shitsuji ga Ore de: They Are My Noble Masters'
parseAniDBXml - AniDB title changed: '' original title: 'To Love-Ru: Trouble (2009)'
parseAniDBXml - AniDB title changed: '' original title: 'Ar-Tonelico Sekai no Owari de Utai Tsuzukeru Shoujo'
parseAniDBXml - AniDB title changed: '' original title: 'Bokusatsu Tenshi Dokuro-chan Second'
parseAniDBXml - AniDB title changed: '' original title: 'Fate/Zero (2012)'
[...]
 
What language selection do you have in the agent settings? The default main, romaji, english?
When you upgrade the agent, you should create a new library pointing to the same folder and select Absolute Series Scanner + HAMA, so you can compare the results and avoid this. That said, I am surprised...
 
I'm coding a failsafe so that empty strings don't update the metadata, and working on the title function to figure out why the last string is always empty.
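Such a failsafe can be as small as a guard that refuses to overwrite a stored value with an empty or whitespace-only string. A sketch, with a plain dict standing in for the Plex metadata object (the real agent would set attributes such as `metadata.title` instead); `safe_update` is a hypothetical name.

```python
def safe_update(record, field, new_value):
    """Set record[field] only when new_value is a non-empty string; return True if updated."""
    if new_value is None or not str(new_value).strip():
        return False  # keep the existing value instead of blanking it
    record[field] = new_value
    return True


show = {"title": "Slap Up Party: Arad Senki"}
safe_update(show, "title", "")  # ignored: empty string
print(show["title"])  # Slap Up Party: Arad Senki
```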

I had another idea for a feature. There could be a plugin that would install and update the HAMA agent and this scanner script. It could also have a little config menu, like HAMA does, to turn off updates, enable debugging, and select a folder for the logs.