[adult] data18.com Metadata Agent - Release

Hi, 

I just ran the agent again with the latest version you posted and it did not find a single match :( Could it be that the naming convention is not correct? The previous bundles were called data18.bundle and now it is called data18-content.bundle? Thanks a lot for the help.

Hey Spanish,
The data18-content.bundle is for the web-video part of the site: clips from places like x-art etc. If you look at the first post, there is still a data18.bundle. That is the one you want to use for movies. Cheers

So does the data18.bundle agent support scenes, or is it strictly for full movies? If it does support scenes then what would the naming convention be?

Data18.bundle is for full movies, i.e. pages like http://www.data18.com/movies/NNNNNNNN/ where NNNNNNNN is the movie id.
Data18-Content.bundle is for data18 content from websites, i.e. pages like http://www.data18.com/content/update_NNNNN.html where NNNNN is the content id.

Both of these are easily searchable on the data18 website. We use the Movie or Video scanner to choose the title from the filename, and then the agent searches the same way you would if you went to the site.  The scanners are far from perfect, but if you name your file exactly the same as the title on data18 it will match.

Scenes (split scenes) do not seem to be searchable on data18. They are linked from within a movie.

I would need to build a custom scanner that would decipher the movie title and scene number from your filename, and then build an agent to search for the movie title, and then search for the scene number (kinda like how the TV scanners and agents work).
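
As a rough illustration of the scanner half of that idea, here is a minimal Python sketch that pulls a movie title and scene number out of a filename. The "Title - Scene N" naming convention, the pattern, and the function name parse_scene_filename are all assumptions made up for this example; nothing like it exists in either bundle, and a real Plex scanner would need more than this.

```python
import os
import re

# Hypothetical naming convention: "<Movie Title> - Scene <N>.<ext>"
SCENE_PATTERN = re.compile(
    r"^(?P<title>.+?)\s*-\s*scene\s*(?P<scene>\d+)$",
    re.IGNORECASE,
)


def parse_scene_filename(filename):
    """Return (movie_title, scene_number), or None if the name does not match."""
    # Drop the file extension before matching
    stem, _extension = os.path.splitext(filename)
    match = SCENE_PATTERN.match(stem.strip())
    if match is None:
        return None
    return match.group("title").strip(), int(match.group("scene"))
```

An agent could then search for the returned title first and use the scene number to pick the linked scene from the movie page.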

It's very doable if one has the time to spend figuring it out.

Edit: I mistakenly had the word scene where it should have said movie.  I edited this to stop any confusion.

Gotcha, thanks chidychi! 

Update to the Data18-Content bundle.

New version. Changed the search match to look for "content" instead of "update" in the URL of search results. This provides a fix for when Data18.com hides videos in "Gallery" updates.

I've made some modifications to the search so that it uses Google instead, which is much more reliable.

Here are the changes, use them if you want:

Line 9:    EXC_SEARCH_MOVIES = 'http://www.google.com/search?q=%s+site:data18.com/content'
Line 48:   for movie in searchResults.xpath('//div[@class="rc"]//h3[@class="r"]//a[contains(@href,"content")]'):

It finds pretty much anything you throw at it without any special requirements for naming.

The only downside is that new content won't be found until the google crawler has indexed it.

EDIT: Well so apparently google detects that this is not human activity when using this method. I know the XBMC version of this agent uses this method, so I'll try to figure out what they do differently. Perhaps a short delay between calls.
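
A short delay between calls could be sketched like this. It is standalone Python 3 rather than Plex framework code, the Throttle class is hypothetical, and whether any particular delay keeps Google from flagging the requests is not something this thread establishes.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous wait(), None before the first call

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling throttle.wait() right before each search request would space the requests out by at least min_interval seconds.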

Hi halkon,

I set up the github.

https://github.com/chidychi/Plex-Data18-Agent-Adult

Cheers!

Thank you very much! I'll fork the repo and hopefully we can both learn and make the agent better. 

I've forked Data18-Content.bundle, and done a full rewrite. You can find it here https://github.com/mvestergaard/Data18-Content.bundle

There are quite a few changes/improvements:

  • First of all, code is cleaned up, and should be much easier to follow.
  • Searches are now Google searches restricted to site:data18.com/content
      - First a request will be made to the Google JSON API
      - If that search yields no results, a normal search will be done, and the HTML will be parsed.
      - (For some reason the results from these two services are not equal, which is why there is a fallback)
  • Thumbs will be included in search results when doing manual searches (not really sure how they're used, but why not?)
  • Posters and fan art will be populated equally using all images from the data18 viewer page. When no images are available, it will use the image from the main page as a placeholder. (That image is added if the other images are there too, but it's lower resolution, so it is given the least priority)
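
The search order described in the list (API first, HTML as a fallback, only data18.com/content results kept) can be sketched as below. This is not the code from the repository; api_search and html_search are hypothetical callables standing in for the two Google requests, and each is assumed to return a list of result URLs.

```python
def build_query(title):
    """Restrict the search to the content section of data18.com."""
    return "%s site:data18.com/content" % title


def keep_content_urls(urls):
    """Keep only result URLs that point at a content page."""
    return [url for url in urls if "/content/" in url]


def search_with_fallback(title, api_search, html_search):
    """Try the API search first; fall back to the HTML search if it finds nothing."""
    query = build_query(title)
    results = keep_content_urls(api_search(query))
    if not results:
        results = keep_content_urls(html_search(query))
    return results
```

Passing the two searches in as arguments keeps the fallback logic testable without making any network requests.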

Notes:

  • Because the Google search is so good at finding results, the scores calculated using the Util.LevenshteinDistance function are generally quite low unless you wrote the name pretty much dead on. To make up for that, I made a score rebasing function which is used when searching automatically. It bumps up the scores so that the best match has a score of 85 (which makes Plex auto-select it). This seems to cause some incorrect matches, but it's better than having to match everything manually.
  • There seems to be some anti-robot functionality in the Google HTML search. So I've experienced some 503 Service Unavailable errors, because of a Captcha being forced. Perhaps the delay on the requests should be increased from 0.5 to maybe 1, or more?
  • Currently, the first poster/fan art is always selected. If possible it would be great if it tried to find the first image that is bigger on the Y axis than X axis, and use that as a poster, and the other way around for the fan art.
  • The search needs a bit of work. I've run it on a collection of about 200 clips, and almost everything was matched. Some things were incorrectly grouped together. This can either be due to the need of a bit of a naming convention, or maybe the score rebasing being a little too aggressive.
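
For anyone wondering what the score rebasing amounts to, here is a minimal sketch. It assumes a simple shift (every score raised by the same amount so that the best one lands on 85); the actual function in the repository may scale or cap scores differently, and the name rebase_scores is made up for this example.

```python
AUTO_MATCH_SCORE = 85  # the best-match score mentioned in the notes above


def rebase_scores(scores, target=AUTO_MATCH_SCORE):
    """Shift all scores up so that the highest one equals `target`.

    Scores are returned unchanged if the best one already reaches the target.
    """
    if not scores:
        return []
    offset = target - max(scores)
    if offset <= 0:
        return list(scores)
    return [score + offset for score in scores]
```

Because the best candidate always ends up at 85 no matter how poor it was to begin with, a shift like this would also explain the incorrect automatic matches mentioned in the notes.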

Known bugs and a few things I'm still working on, when I have time:

  • With certain scenes, the Google API search doesn't provide the correct result, and the same goes for the other method. I'm going to try to combine the results from both the API search and normal search, and take the best matches.
  • Might also add a fallback to a Bing search if the Captcha issue with the google search isn't resolved with a longer delay.
  • Releases from sites like Naughty America don't have thumbnails on the image viewer page, and therefore my method for getting all the images does not work for those scenes (it still gets the player image, and the first of the gallery images from the main page though). I'll need to make an extra fallback for these situations.
  • I can imagine I'm missing some error handling for unexpected data. I'll need time to find some edge cases to test on.

I hope you'll take a look. In my opinion continued development of this agent should be based on this version going forward.

This is the first python code I've ever touched, so some things can probably be improved, but I'm quite unfamiliar with the language and tools available.

mkv, great job on using the google search api, almost everything now matches better (given that the data18 search is very very picky).

However, I am having to go through the link it finds to select the cover art. As of now it doesn't automatically try to gather the images from the data18 site. Any ideas?

MKV has rewritten data18.bundle

This code is more efficient and makes use of a proxy that must be configured.

Here are the notes MKV made for this bundle:

Data18 metadata agent

This metadata agent retrieves data from Data18.com.

To get the best results, follow the standard Plex naming convention for movies.

A few options are available from the agent configuration:

  • Output debugging info in logs - This is for debugging purposes. Off by default.
  • Allow alternate poster source if Data18 poster is not split in front/back - Some of the posters provided on Data18 seem to be a combination of front/back, which is not very good for our purposes, so enabling this setting will allow the agent to go to alternate sources if they are found on the movie page. This can either be AEBN or the Data18 store, where AEBN is preferred.
  • Fan art scene image count (-1 = none, 0 = all) - The agent can follow links to scenes from the movie page, and download scene images to use as fan art. By default this is set to 0, meaning that it will get all images. Setting it to -1 will disable the functionality, and setting it to any positive value will limit the number of images from each scene to the given number.
  • Image proxy url - The images on Data18 cannot be retrieved without a proper referer in the request. This means that we cannot use the preview functionality that Plex provides, so we cannot use thumbnail previews. This setting specifies a URL to a proxy that can be used to add a referer to the request. If this URL is not specified, the images will be downloaded without thumbnails (note that this is a lot slower, and will take up more hard disk space). See below for more information.
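
The three cases of the fan art scene image count setting can be written out as a small sketch. The function name limit_scene_images is hypothetical, and treating every negative value like -1 is an assumption, since the notes only describe -1, 0, and positive values.

```python
def limit_scene_images(images, count_setting):
    """Apply the 'fan art scene image count' rule to one scene's image list.

    -1 -> no images, 0 -> all images, N > 0 -> at most the first N images.
    """
    if count_setting < 0:
        # Assumption: any negative value disables scene images, like -1
        return []
    if count_setting == 0:
        return list(images)
    return list(images[:count_setting])
```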

Image proxy

The included referer-proxy.py can be used to allow getting images where a referer is required, by providing it in the URL. The proxy is built using CherryProxy and Requests. See those pages for installation instructions.

Once the prerequisites are installed, run the proxy with this command:

python referer-proxy.py -a HOST -p PORT

Replace HOST and PORT with appropriate values (e.g. HOST = 0.0.0.0, PORT = 1234).

The proxy will not run as a daemon. I'll let people more familiar with python assist with that part.
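
The referer requirement that the proxy works around can be illustrated with plain Python 3 and the standard library. This is only a sketch of the idea, not the code in referer-proxy.py; the function name build_image_request is hypothetical.

```python
import urllib.request


def build_image_request(image_url, referer):
    """Build a request for image_url that carries a Referer header."""
    return urllib.request.Request(image_url, headers={"Referer": referer})
```

Passing the result to urllib.request.urlopen would perform the actual download. Which referer value Data18 accepts is not stated in the notes, so it is left as a parameter here.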

Tried MKVs rewritten Bundle.

Unfortunately a lot of my files are being identified wrong and matched automatically with wrong results.

Also, quite often when I search for the exact filename from Data18, it doesn't show up in the results.

When I had this problem earlier, I went for "Update xxxxx" and got the proper result.

The new rewrite unfortunately doesn't work that way, so I'll stick to the old one.

Still thanks for your effort...

When using the agents from me, make sure to read the README on github, and follow the naming conventions described.

I've had about a 99% match success rate with both the movie and content agents in my own tests.

My priorities will be elsewhere, at least for a while, so if anyone wants to continue development, feel free to do so.

Links to my repositories. Feel free to fork them, but make pull requests to chidychi's repositories and not mine.

Data18.bundle - https://github.com/mvestergaard/Data18.bundle

Data18-Content.bundle - https://github.com/mvestergaard/Data18-Content.bundle

Is it somehow possible to use this plugin add-on with Plex server running on OpenELEC XBMC?

Hi There, 

thanks to all the contributors for this great agent. I just ran it again after upgrading to the latest version of PMS and now I am getting a slightly strange result. It might be solved with the proxy, but I would like to understand it a bit better. The movies are matched correctly, but I am getting a poster (and only that one) that looks like the folded-out cover of the movie, not the nice front cover. Any ideas how to fix this?

blaresteep created a fix and I pulled in the merge this morning. Thanks!

Please update using the Unsupported Appstore or go to the repository and get the latest version.

Cheers

Hi, 

I downloaded the latest version from GitHub. I get the correct match and metadata, but the poster is not downloaded. Any suggestions?

Some more: there seems to be a problem with the matching as well. Items are matched, but each one needs an individual Fix Incorrect Match.

The cover problems seem to be mainly with Evil Angel and Jules Jordan

Hey.
I noticed the same thing.
Rebooting my computer helped.

Followed your steps but the problem persists, mainly with Evil Angel & Jules Jordan. Does anybody have any further ideas?

Does anybody else experience problems with the cover art for certain labels?