I am currently trying to develop a plugin for the TV Land site and am having some problems with XPath. I have developed a rough site config file that plays the videos from TV Land, and I have also written the code for the first menu of the plugin (I was able to get the information from the www.tvland.com/fullepisodes page using XPath). However, the issue I am running into is that when I try to get the paths for the individual episodes, thumbs and info, the Python script doesn't seem to be able to find them. The following is a generic example of the issue. On the episodes page for Everybody Loves Raymond, using XPath Checker, I try a generic XPath of //img, which should return all of the images on the page (see screenshot), and it does. However, when using the same expression in the Python script:
for indshows in XML.ElementFromURL(EPISODE_PREFIX + showpath, isHTML=True, headers=HEADERS).xpath('//img'):
    Log(indshows.get('src'))
all of the thumbnails except for 4-8 are returned in the log file, 4-8 being the thumbnails for each episode. This is the same issue I have when attempting to access the video path (href). Being very new to Python and plugins, I am struggling to see why XPath cannot extract the information for the video files; obviously I am missing something here. I would greatly appreciate any input or tips to help overcome this issue. The plugin works great when I enter the paths manually; I just can't get them from the webpage. Please let me know if there is any other information I could provide to help. Thanks.
OK, so it seems it's more complicated than that… I just grabbed that from what I could see in Firebug/XPather, but it seems that bit of the DOM is all generated client-side by a nasty little bit of JavaScript, and as the plugin framework just dumps the raw HTTP response and doesn't run the scripts on the page, that bit of HTML doesn't exist when your plugin sees it. That section instead looks like:
That means grabbing it easily with XPath is out. The data you want IS still there (lucky you, that's not often the case), but you'll need to resort to regex or some other messier method to grab it.
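For anyone who hits the same thing, here is a rough sketch of the regex approach in plain Python. The sample markup, the addEpisode call, the field names and the extract_episodes helper are all made up for illustration; the real TV Land page will look different, so the pattern has to be adapted to whatever the raw response actually contains.

```python
import re

# Hypothetical raw response: the episode entries sit inside a script block
# rather than as <img>/<a> elements, so XPath on the parsed HTML cannot see
# them. This markup is invented for illustration only.
RAW_HTML = """
<html><body>
<img src="/images/logo.png">
<script type="text/javascript">
addEpisode({title: "Episode One", url: "/video/101", thumb: "/thumbs/101.jpg"});
addEpisode({title: "Episode Two", url: "/video/102", thumb: "/thumbs/102.jpg"});
</script>
</body></html>
"""

# One named group per field we want to pull out of each script call.
EPISODE_RE = re.compile(
    r'addEpisode\(\{\s*title:\s*"(?P<title>[^"]*)",\s*'
    r'url:\s*"(?P<url>[^"]*)",\s*'
    r'thumb:\s*"(?P<thumb>[^"]*)"\s*\}\)'
)


def extract_episodes(html):
    """Return one dict (title, url, thumb) per episode found in the raw text."""
    return [match.groupdict() for match in EPISODE_RE.finditer(html)]


episodes = extract_episodes(RAW_HTML)
for ep in episodes:
    print(ep["title"], ep["url"], ep["thumb"])
```

The point is to run the pattern over the raw response body as a string instead of over the parsed element tree, since the tree never contains the nodes the script would have created in a browser.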
Thanks for the help (I was really struggling with this); regex worked perfectly. Now, with a little TLC, the plugin should be ready for some initial testing.