Debugging 4oD

Hi,



I’m trying to debug why the 4oD plug-in hasn’t been working for the last month (since 18th Feb, I think).



In doing so I’ve found that the handler for the plug-in tries to use a list from the dictionary to create the programme list. However, the main handler crashes because the list is null. The list is meant to be populated by the function UpdateCache(), but this function is never called (I’ve added a log call at the start of it to check, and it’s never written).



The sparse documentation I’ve found seems to indicate that UpdateCache() is a framework function that is called when a plug-in is started and periodically afterwards.



Does anyone know if this behaviour has changed in Plex or why the UpdateCache() function might not be getting called?



Cheers.

4oD isn’t working because Channel 4 changed the layout of the site. This means that the XPath expressions used to identify the relevant HTML tags in the pages (used for extracting information) now return nothing, and in some cases the page URLs are no longer valid. Nothing has changed on the Plex side. UpdateCache() would definitely get called eventually, about 5-10 minutes after starting (many plug-ins rely on this behaviour), but it’s only called immediately after startup the first time a plug-in runs; otherwise you’d end up with every plug-in you’ve installed trying to update itself simultaneously when Plex started.
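The scheduling behaviour described above can be sketched in plain Python. This is purely illustrative, not Plex framework code; all names here are hypothetical:

```python
import threading

calls = []

def update_cache():
    calls.append("UpdateCache")

def start_plugin(first_run, interval):
    # On a plug-in's very first run, update immediately; on later starts
    # only schedule the periodic update, so every installed plug-in doesn't
    # try to refresh itself simultaneously when Plex starts.
    if first_run:
        update_cache()
    return threading.Timer(interval, update_cache)  # caller would .start() this

timer = start_plugin(first_run=True, interval=300.0)
print(calls)  # prints ['UpdateCache']
```

On subsequent starts (`first_run=False`) only the timer is created, which matches the "5-10 minutes after starting" delay described above.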

Jam,



Thanks for clarifying UpdateCache() - that makes perfect sense, and I didn’t really expect that the behaviour of Plex would have changed.



By reinstalling, I’m able to get the log from when UpdateCache() fails. It shows the failure occurring on the following line:



fsets = XML.ElementFromURL(URL_ROOT, isHTML=True, cacheTime=CACHE_1HOUR).xpath("//li[@class='fourOnDemandSet']")



where URL_ROOT is http://www.channel4.com/programmes/4od



Now, if I browse to that URL and check the XPath from the function parameter with an XPath checker, it appears to be correct: it displays what I would expect to be passed to the XML object above. This is the bit that I don’t get, and why I don’t think that the site layout has actually changed.
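The expression can also be sanity-checked offline against a hypothetical snippet of markup (the real page would come from URL_ROOT; the plug-in itself uses lxml via XML.ElementFromURL, but the predicate syntax is the same, and stdlib ElementTree's limited XPath support handles it):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the 4oD listing page.
SAMPLE = """
<html><body>
  <ul>
    <li class="fourOnDemandSet"><a href="/programmes/a">Show A</a></li>
    <li class="somethingElse"><a href="/programmes/b">Show B</a></li>
    <li class="fourOnDemandSet"><a href="/programmes/c">Show C</a></li>
  </ul>
</body></html>
"""

root = ET.fromstring(SAMPLE)
# Same predicate as the plug-in's expression, with ElementTree's ".//" prefix.
fsets = root.findall(".//li[@class='fourOnDemandSet']")
print(len(fsets))  # prints 2
```

If the expression matches here but returns nothing against the live page, the problem is in fetching the page, not in the XPath.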



I’d like to test changing the user agent to see if Channel 4 is denying access to the site based on the user agent presented by Plex (grasping at straws, really). However, if I change the site configuration and restart the plugin, is there any way I can force UpdateCache() to run? Reinstalling the plugin causes the site configuration to be overwritten.



Thanks for any insight you can offer.

Hmm, never mind the user agent request. I changed the user agent in the site configuration, restarted the app, and left it long enough for UpdateCache() to run. It did, and it errored on the same function call.



Any ideas, anyone? Jam, could you have a look when you get a chance and verify what I said about the XPath? It seems correct to me…



Cheers.

I have checked all the XPaths and they all seem fine… the only one I wasn’t sure about was where the XPath references h3 - I wasn’t sure if it should be h3 with a class. Could someone confirm what is correct?

@41John



I’m now convinced that the XPaths are correct - the problem is not with the site layout. The problem is that when XML.ElementFromURL makes a call to httplib to get the HTML before parsing, the site returns a response that causes httplib to throw a BadStatusLine exception.



The question is: why would the HTTP server do this? My only guess at the moment is that a cookie missing from the site configuration is now required. Has anyone else come across anything like this with any other plugins? The last time 4oD was broken it was a change to the cookie for age verification, so this could be something to go on…
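For anyone unfamiliar with the exception: BadStatusLine is raised when the first line of the server's response isn't a valid HTTP status line. A standalone sketch reproduces it against a throwaway local server that returns garbage instead of "HTTP/1.1 200 OK" (cross-version shim included: httplib is http.client on Python 3, and Python 2 only raises here when strict=True, which is what the earlier test script uses):

```python
import socket
import threading

try:
    import httplib                      # Python 2
    def make_conn(host, port):
        return httplib.HTTPConnection(host, port, strict=True)
except ImportError:
    import http.client as httplib       # Python 3
    def make_conn(host, port):
        return httplib.HTTPConnection(host, port)

def bad_server(listener):
    # Accept one connection, ignore the request, send a malformed response.
    conn, _ = listener.accept()
    conn.recv(1024)
    conn.sendall(b"GARBAGE INSTEAD OF A STATUS LINE\r\n\r\n")
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=bad_server, args=(listener,)).start()

conn = make_conn("127.0.0.1", port)
conn.request("GET", "/")
try:
    conn.getresponse()
    failed = False
except httplib.BadStatusLine:
    failed = True
print(failed)  # prints True
```

So whatever Channel 4's server is sending back to Plex, the first line of it doesn't look like an HTTP status line to httplib.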



Cheers.




Yeah, I think you are on the right path to fixing this, but I too am stuck on how to get any further, as this is my first attempt at looking into problems with a Plex plugin.

Just a further update on this:



To test the httplib call I created the following script:



# -*- coding: utf-8 -*-
import httplib

conn = httplib.HTTPConnection("www.channel4.com", strict=True)
conn.request("GET", "/programmes/4od")
r1 = conn.getresponse()
print r1.status, r1.reason
print r1.getheaders()
print r1.msg




This enforces strict HTTP status-line parsing just to try to make it fail, but it returns successfully with 200 OK and the headers as expected.



Which tells me it isn’t a problem with the site layout or the Python lib, but more likely with lxml or Plex’s XML lib.



Will keep you posted if I find anything conclusive…

Unfortunately I can offer no help, but I’d just like to say that I really appreciate the effort you’re putting in to make this work again.



So thanks :)

@suive: Something on the site has to have changed. There have been no updates to Plex or the framework code, and these things don’t just randomly break after working perfectly for months on end :)

I think I’m going to go back to the drawing board and think about this logically.

  • From what I remember, if you have cached show listings, those shows still work. Surely this rules out a site layout change in how it finds the programmes - if the cached information were incorrect, the cached shows wouldn’t work either?

  • I can’t see anything wrong with the site configuration file.

  • It can’t be anything to do with Plex or the framework code, as they haven’t been updated since the problems with 4oD started.

  • It’s clearly something in how it updates its cache. I have checked nearly all the XPath expressions manually and, from what I can see, they are all correct. However, I don’t understand the bit of code below - could this be the problem?


# Caches programme metadata for all 4oD programmes on a given page.
# Kept in a subfunction so it can be called recursively.
def CacheProgsOnPage(url, tag):
    page = XML.ElementFromURL(url, isHTML=True, cacheTime=CACHE_1WEEK)

    # Find all shows on 4oD on this page
    for item in page.xpath("//div[@class='cf']/ul/li/ul/li[@class='catchUp']/../.."):
        href = item.xpath("h3/a")[0].get("href")
        if href[-1] == "/": href = href[:-1]
        progtag = SubstituteTag(href.split("/")[-1])



Where would there be an li with a class called catchUp? I can’t find one on the 4oD site.

Actually, cached shows still working indicates that the site has changed, as episode links gathered by the plug-in prior to the change still work.



The plug-in parses the 4oD HTML to create a cache of where the episode pages are located. It doesn’t store the HTML itself; it stores a record of the episode links contained within the HTML. The actual pages containing the Flash player may not have changed, which is why cached shows still work. The part that’s changed is the pages containing links to the episodes (things like genres, tags, schedules, etc.) - it no longer knows how to find new episode links, which is why the cache isn’t being updated. The lack of a ‘catchUp’ element proves this - it used to be there, now it isn’t.
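The caching scheme described above can be sketched like this. All names here are hypothetical, not the plug-in’s actual data structures; the point is that extracted links, not HTML, are what's stored:

```python
# Map of programme tag -> set of episode page links already discovered.
episode_cache = {}

def cache_episode_links(prog_tag, hrefs):
    # Record where a programme's episode pages live.
    episode_cache.setdefault(prog_tag, set()).update(hrefs)

def cached_episodes(prog_tag):
    # Old entries survive a site-layout change: only *discovering new*
    # links breaks, not the links already recorded here.
    return sorted(episode_cache.get(prog_tag, ()))

cache_episode_links("peep-show", ["/programmes/peep-show/4od#3001",
                                  "/programmes/peep-show/4od#3002"])
print(len(cached_episodes("peep-show")))  # prints 2
```

A layout change only stops `cache_episode_links` from being fed new hrefs; everything already in `episode_cache` keeps resolving to working pages.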



Yeah, that makes sense. I really need to edit my post, as I was ruling things out as I went along. I added the last bit about an XPath expression that I can see is wrong (or that I don’t understand) after questioning how the cached stuff works and whether it would prove that there wasn’t a site change. I contradicted myself, but you’re right that there seems to be a layout change affecting the code I listed in my previous post. The only problem is I’m not sure what the code should be.


To clarify my previous posts - whether the actual site layout has or hasn't changed is irrelevant.

The XPath first requested by UpdateCache() is correct - I've verified this. When the call is made to Python's httplib to request the HTML content, the call fails, throwing a BadStatusLine error. This indicates that the server responded with a status line that httplib didn't like.

I have also verified that Python 2.5 and 2.6 on Linux and the stock Python install on OS X 10.6 do not throw this error with the code snippet in my post above.

The next step in debugging this is to determine exactly what is being passed to httplib in this context, insert it into the code snippet, and see if the same error is thrown.
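One low-effort way to see exactly what urllib2 hands to httplib is an opener built with `HTTPHandler(debuglevel=1)`, which makes httplib echo the outgoing request line, headers, and raw response status as they cross the wire (urllib2 is urllib.request on Python 3; the URL here is the one from earlier in the thread):

```python
try:
    import urllib2 as urlreq            # Python 2
except ImportError:
    import urllib.request as urlreq     # Python 3

# debuglevel=1 makes the underlying httplib connection print each request
# line and header, and the raw status line it reads back.
handler = urlreq.HTTPHandler(debuglevel=1)
opener = urlreq.build_opener(handler)

# opener.open("http://www.channel4.com/programmes/4od") would now show the
# exact headers being sent - the ones to compare against the standalone
# httplib snippet above. (Not called here to keep this sketch offline.)
```

Comparing that output with the working standalone script should pinpoint which header triggers the bad response.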

Unfortunately, I have very little opportunity to debug this, as my only Mac is a mini attached to my living-room TV - so perhaps, Jam, you could determine and test this?

Incidentally, the class catchUp does exist - it appears on the categories pages, so that proves nothing.


So it does - my bad.

This is the full traceback for the UpdateCache() exception:


Traceback (most recent call last):
  File "/Users/billyjoe/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/Plugin.py", line 640, in __call
    return function(*args, **kwargs)
  File "/Users/billyjoe/Library/Application Support/Plex Media Server/Plug-ins/4oD.bundle/Contents/Code/__init__.py", line 33, in UpdateCache
    fsets = XML.ElementFromURL(URL_ROOT, isHTML=True, cacheTime=CACHE_1HOUR).xpath("//li[@class='fourOnDemandSet']")
  File "/Users/billyjoe/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/XML.py", line 23, in ElementFromURL
    return ElementFromString(HTTP.Request(url, values, headers, cacheTime, autoUpdate, encoding, errors), isHTML)
  File "/Users/billyjoe/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/HTTP.py", line 133, in Request
    f = urllib2.urlopen(request)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 374, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 392, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 1100, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 1073, in do_open
    r = h.getresponse()
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 924, in getresponse
    response.begin()
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 385, in begin
    version, status, reason = self._read_status()
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 349, in _read_status
    raise BadStatusLine(line)
BadStatusLine


It’s definitely bad headers: if I edit the framework’s urllib2 request call to remove the headers, the plugin works.

The framework’s default headers are __headers = {"User-agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/530.1+ (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1", "Accept-encoding": "gzip"}. Now I just need to find ones that will work for 4oD.

{"User-agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20091218 Firefox 3.6", "Accept-encoding": "gzip"} works.
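For anyone wanting to verify a header set locally before patching the framework, the replacement headers can be attached to a urllib2 Request without hitting the network (urllib2 is urllib.request on Python 3):

```python
try:
    import urllib2 as urlreq            # Python 2
except ImportError:
    import urllib.request as urlreq     # Python 3

# The working header set quoted above.
HEADERS = {
    "User-agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; "
                  "rv:1.9.2) Gecko/20091218 Firefox 3.6",
    "Accept-encoding": "gzip",
}

req = urlreq.Request("http://www.channel4.com/programmes/4od", headers=HEADERS)
print(req.get_header("User-agent"))  # the Firefox 3.6 string above
```

Passing `req` to `urlopen` would then issue the request with exactly these headers, which is effectively what the framework does with its `__headers` dict.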



Here’s the plugin’s __init__.py with new headers for all the HTTP requests. It seems to be working, but not being in the UK I can’t be 100% sure.



Edit: make sure you delete ~/Library/Application Support/Plex Media Server/Plug-in Support/Data/com.plexapp.plugins.4oD/ and then restart the media server.



Cheers Billy Joe, testing now. Note: there's an extra "," in the import statement after "re". Attached is an updated __init__.py.

Yeah, and I broke something else too. I’ve got to put the cache timer back - gimme a sec.