Weird HTTP request hijinks.

Hey guys,

 

One of my channels broke on its own overnight.  Now, whenever I make an HTML.ElementFromURL request to that domain, I get back only the following element:


Eg:

2013-12-24 09:01:27,261 (-4f863490) :  DEBUG (networking:172) - Requesting 'http://www.animeplus.tv/popular-anime/1'
2013-12-24 09:01:28,138 (-4f863490) :  INFO (__init__:57) - 
2013-12-24 09:01:28,139 (-4f863490) :  INFO (__init__:77) - No shows found! Check xpath queries.

Previously, the same code worked fine.  Since this error appeared overnight and the meta tag seemingly redirects me to nothing, could the website be trying to block my HTTP requests?  Or is it something else?  Is there any way around this?  The actual source page doesn't include that particular meta tag, so I'm not even sure where it's coming from.  Is there a tool I can use to see what's actually going on when we make these requests?

 

Channel code here.  Thanks in advance.  

The URL in a browser returned a list of Anime to me, so maybe you are blocked somehow.....

The only thing I can think of to dig into this is low-level analysis, meaning Wireshark.

But that will still only tell you what is going on in your case, not the back-end logic.

/Tommy
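Short of a packet capture, a few lines of standalone Python can show the status, headers (including any Set-Cookie) and the start of the body for a single request. This is only a rough sketch, assuming Python 3 run outside Plex; `summarize` and `inspect_url` are made-up names and the User-Agent value is just a placeholder.

```python
from urllib.request import Request, urlopen


def summarize(status, headers, body, preview=200):
    """Build a readable summary from a status code, (name, value) header pairs and body text."""
    lines = ["status: %s" % status]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    lines.append("body[:%d]: %r" % (preview, body[:preview]))
    return "\n".join(lines)


def inspect_url(url, user_agent="Mozilla/5.0"):
    """Fetch url once, without a cookie jar, and summarize what came back."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        body = response.read().decode("utf-8", errors="replace")
        return summarize(response.status, response.headers.items(), body)
```

Calling `print(inspect_url("http://www.animeplus.tv/popular-anime/1"))` should show whatever the server sends to a client that has no cookies yet, which can then be compared with what the browser receives.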

Thanks Tommy, yeah, the URL works fine in a browser for me as well.  It's only when using the HTTP request from Plex that it fails, which is what I find strange.  The log excerpt above is the result of me calling HTML.ElementFromURL, then using HTML.StringFromElement on that element, and finally logging the resulting string.  It appears that all I'm getting from the initial request is the quoted line above.  I'll check out Wireshark.  Thanks for the link.

OK, I'm completely stumped here.  Would someone mind taking a look at my plugin (in the original post) to see if they can provide any insight?  As far as I can tell, I'm doing everything right, yet I'm not getting anything back from my HTTP requests.  Other channels/websites work fine using the same code.  Is it possible that the devs behind the website can identify and block/redirect requests specifically from the Plex API?

Seems like the following is happening (logged from Safari):

  1. The URL is requested.
  2. The website returns the above HTML together with a Set-Cookie header.
  3. The URL is requested again, but now with the cookie set from step 2.
  4. The website returns a redirect (302) together with a new cookie (session id).
  5. The redirected URL is requested with both cookies set from steps 2 and 4.
  6. The "real" webpage is returned.

Using cURL gives the same result as the Plex API. I guess you'll have to build your own logic around this; it looks like a DoS (denial-of-service) protection mechanism.
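The browser sequence in the numbered steps can be roughly reproduced outside Plex with the standard library. This is a minimal sketch, assuming standalone Python 3; `make_fetcher`, `fetch_past_refresh`, `META_REFRESH` and `max_attempts` are hypothetical names, not anything from the Plex API. The cookie jar keeps the cookies from steps 2 and 4, and urllib follows the 302 on its own.

```python
import re
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

# Matches a <meta ... http-equiv="refresh"> tag, case-insensitively.
META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh', re.IGNORECASE)


def make_fetcher():
    """Return a fetch(url) function that keeps cookies between calls."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))

    def fetch(url):
        with opener.open(url, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")

    return fetch


def fetch_past_refresh(url, fetch, max_attempts=3):
    """Request url again while the body still contains a meta-refresh tag."""
    body = fetch(url)
    for _ in range(max_attempts - 1):
        if not META_REFRESH.search(body):
            break
        # Same URL again; the fetcher now sends the cookie it was given.
        body = fetch(url)
    return body
```

Something like `fetch_past_refresh(url, make_fetcher())` would be the typical call. Passing `fetch` in as a parameter also makes it easy to try the retry logic against canned responses without touching the site.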

Thanks for the quick reply!  That would make sense.  However, I was under the impression that the Plex API would handle cookies for me.  Could you recommend another plugin that uses session id cookies for me to experiment with?

Yes, you're right, the Plex API will handle the cookie part (I must have remembered it wrong). The problem is that the HTML code is telling the browser to refresh, and therefore we must do the same in the plugin.

I added the following routine to __init__.py:

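def HTMLElementFromURL(url):
    # First request: the site may answer with only a meta-refresh page (plus a cookie).
    request = HTTP.Request(url)
    if 'http-equiv="refresh"' in request.content:
        # Ask again, bypassing the cache, as a browser would after the refresh.
        request = HTTP.Request(url, cacheTime = 0)

    return HTML.ElementFromString(request.content)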

and replaced all calls to the standard 'HTML.ElementFromURL' with the above.

Not the nicest piece of code but it did work at least  :)
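On the earlier question about the tag "redirecting to nothing": a refresh tag whose content holds only a delay and no URL simply means "reload this same page", which is why requesting the same URL a second time works. Here is a small sketch for checking what a given refresh tag actually asks for, assuming standalone Python 3; `MetaRefreshFinder` and `parse_meta_refresh` are made-up helper names.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Record the content attribute of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "meta" and self.content is None
                and (attributes.get("http-equiv") or "").lower() == "refresh"):
            self.content = attributes.get("content") or ""


def parse_meta_refresh(html):
    """Return (delay_seconds, target_url_or_None), or None if there is no refresh tag."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.content is None:
        return None
    delay, _, rest = finder.content.partition(";")
    rest = rest.strip()
    # A target, when present, follows the delay as "url=...".
    target = rest[4:].strip("'\" ") if rest.lower().startswith("url=") else None
    try:
        seconds = float(delay.strip() or 0)
    except ValueError:
        seconds = 0.0
    return seconds, target or None
```

A result with `None` as the target means the page wants the same URL requested again; anything else is the address to request next.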

It seems like their server (whoever is hosting that site) is actually not configured correctly ... they are returning raw PHP code within that request when I do a direct cURL of it ... 

$ curl "http://www.animeplus.tv/popular-anime/" 

Thanks guys; awesome work Meo.  Works great.  You're right, it's not pretty, but it'll do.  Thanks a million.
