TVO channel needs testing/advice

Lexy · March 25, 2014, 12:55am

Hi folks,

So I have written my first channel. It can be found here: https://github.com/lexy0/TVO.bundle

However, I get quite a few timeouts. Probably in part due to my noob code, but also in part due to the slowness of TVO's website. I can't do anything about the website, but if folks could offer suggestions on how to improve/speed up my code, I would be grateful.

My setup is Win7, Chrome browser, Roku 2 XS, and Apple TV 3 through PlexConnect. The slowness is particularly bad on the ATV via PlexConnect.

Thanks!

Gerk · March 25, 2014, 3:09pm

One thing that jumps out at me is this here:

https://github.com/lexy0/TVO.bundle/blob/master/Contents/Code/__init__.py#L63

epElement = HTML.ElementFromURL(epURL)

When you're looping through all of the potential episodes, you're making a new HTTP request for each individual episode. If at all possible, you should avoid this. Is it not possible to extract at least some basic information from the single page that lists all of the episodes? I'm pretty sure that would be a good start on sorting out the slowdowns. Ideally, you make a single HTTP request and extract all of the information from that page.
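A minimal sketch of the "one request, parse everything" approach, assuming a simplified, hypothetical version of the listing markup (the real TVO HTML differs). In the channel the page would be downloaded once (e.g. with HTML.ElementFromURL); here a string stands in for that single download, and Python's standard xml.etree.ElementTree is swapped in for the Plex/lxml parser. The helper names first_text and episodes_from_listing are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for ONE downloaded program page.
# The real TVO markup differs; only the idea (parse once, loop over rows) matters.
LISTING = """<ul>
  <li class="views-row views-row-1">
    <span class="ep-title">Episode 5</span>
    <span class="date">March 30, 2014</span>
    <p class="summary">This episode explores the theme of hospitality in Tudor England.</p>
  </li>
  <li class="views-row views-row-2">
    <span class="ep-title">Episode 4</span>
    <span class="vid"><a class="watch-video" href="/video/ep4">Watch Video</a></span>
    <span class="date">March 23, 2014</span>
    <p class="summary">This week the team learn to master the landscape away from the farm in order to supplement their income.</p>
  </li>
</ul>"""


def first_text(item, tag, css_class):
    """Text of the first <tag> below `item` carrying `css_class`, or None."""
    for element in item.iter(tag):
        if css_class in (element.get("class") or "").split():
            return element.text
    return None


def episodes_from_listing(page_source):
    """Build one record per episode from a single, already-downloaded page.

    No extra HTTP request is made per episode: everything comes from `page_source`.
    """
    root = ET.fromstring(page_source)
    records = []
    for item in root.findall("./li"):
        link = item.find(".//a")  # search only below this <li>
        records.append({
            "title": first_text(item, "span", "ep-title"),
            "date": first_text(item, "span", "date"),
            "summary": first_text(item, "p", "summary"),
            "url": link.get("href") if link is not None else None,
        })
    return records


for record in episodes_from_listing(LISTING):
    print(record)
```

Each record could then feed a DirectoryObject (title, summary) without a second fetch; only the episode the user actually plays needs its own request.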

Lexy · March 26, 2014, 5:14am

Thank you for your reply Gerk. I have separated the code as you suggested and found a better anchor for my xpath.

However, now I am running into another error. Unfortunately, the site lists upcoming episodes that have no video. Obviously I only care about the episodes with video, so I came up with the code below. When I run it, I get every episode (with or without video), and it seems that my code only grabs the first URL it finds and uses that for every episode.

So I'm not sure what is going wrong here. If I use epURL = item.xpath('./span[contains(@class, "vid")]') I get no episodes, and if I use epURL = item.xpath('//span[contains(@class, "vid")]') I get every episode, with or without video. Neither is what I expect.

I hope this makes sense.

To illustrate, take this example: http://ww3.tvo.org/program/201110

Using "//", I get all 6 episodes in my list, and they all point to the URL for episode #4 (the first URL on the page in this case).

Here is my code:

###################################################################################################
@route(PREFIX + '/showepisodes')
def ShowEpisodes(title, pass_url, showpg):
    oc = ObjectContainer(title2=title)
    # One HTTP request for the whole episode listing
    pageElement = HTML.ElementFromURL(pass_url)
    for item in pageElement.xpath('//li[contains(@class, "views-row views-row")]'):
        try:
            epURL = item.xpath('//span[contains(@class, "vid")]/a[contains(@class, "watch-video")]')[0].get('href')
            epTitle = item.xpath('./span[contains(@class, "ep-title")]')[0].text
            oc.add(DirectoryObject(key=Callback(PlayEpisodes, title=epTitle, pass_url=epURL), title=epTitle, summary='Add summary'))
            Log(epTitle)
            Log(epURL)
        except IndexError:
            # xpath() returned an empty list: this row lacks the expected element
            continue

    return oc

Here is a snippet from the website:

Episode 6

April 6, 2014

It’s harvest time, winter is drawing in and the Dissolution of the Monasteries is on the horizon. This week the team will be bringing in the barley and celebrating with a harvest feast, to give thanks for their bounteous crop.

Episode 5

March 30, 2014

This episode explores the theme of hospitality in Tudor England. With no provision for the poor from the state, it was down to the monasteries to provide welfare for those in need.

Episode 4

      Watch Video

March 23, 2014

This week the team learn to master the landscape away from the farm in order to supplement their income. The monasteries’ land covered a variety of landscapes, from rivers and woodlands to hills and mines, all of which would be expected to be exploited by the tenant farmer to raise income for themselves and the monastery.

Gerk · March 26, 2014, 3:32pm

In XPath, a path that starts with // searches the entire document from the top down, even when you call xpath() on a specific element, whereas ./ searches only the direct children of the node you're in (the result of the previous query). You should probably be using .//span there to get what you want. I just did a quick test on that page, and that seems to do what you're after.

So, in summary:

// searches the whole document, from the top down

./ looks only at the direct child nodes (of the previous search result)

.// searches all descendant nodes (of the previous search result)
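The three cases above can be reproduced with a small sketch. It assumes hypothetical, simplified markup in which the link sits inside a <span> under each <li>, as on the TVO page. It uses Python's standard xml.etree.ElementTree instead of lxml; ElementTree's findall() refuses paths starting with '/' on an element, so the whole-document case (what item.xpath('//a') does in lxml) is simulated by searching from the root. The function names hrefs and compare_paths are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the <a> is inside a <span>,
# so it is a descendant of the <li> but NOT a direct child.
PAGE = """<ul>
  <li class="views-row"><span class="ep-title">Episode 5</span></li>
  <li class="views-row"><span class="ep-title">Episode 4</span>
    <span class="vid"><a class="watch-video" href="/video/ep4">Watch Video</a></span>
  </li>
</ul>"""


def hrefs(elements):
    """Collect the href attribute of each matched element."""
    return [e.get("href") for e in elements]


def compare_paths(page_source):
    root = ET.fromstring(page_source)
    results = []
    for item in root.findall("./li"):
        results.append({
            "title": item.find("./span").text,
            # Like lxml's item.xpath('//a'): the search starts at the document
            # root, so `item` is ignored and every row sees the same link.
            "whole_document": hrefs(root.iter("a")),
            # './a': only <a> elements that are direct children of this <li>.
            "direct_children": hrefs(item.findall("./a")),
            # './/a': <a> elements anywhere below this <li>.
            "descendants": hrefs(item.findall(".//a")),
        })
    return results


for row in compare_paths(PAGE):
    print(row)
```

The whole-document column shows the symptom from the question (every episode gets episode 4's URL), the direct-children column comes back empty because the link is nested in a <span>, and only the descendant search gives each row its own link, or none.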

Hope this helps.

EDIT: After looking at the code, you could probably even shorten the XPath to something like this (you don't need to limit it to that particular span, as the <a> tags with the watch-video class only appear on episodes you can actually watch anyway):

epURL = item.xpath('.//a[contains(@class, "watch-video")]')[0].get('href')
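A plain-Python sketch of the loop logic with that shortened lookup, skipping upcoming episodes that have no link instead of relying on a bare except. It assumes hypothetical, simplified markup and uses the standard xml.etree.ElementTree rather than the Plex HTML/lxml objects; has_class is a hypothetical helper standing in for contains(@class, ...) (it matches whole class tokens, which is slightly stricter than a substring test). In the channel, each returned pair would become a DirectoryObject.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing: only Episode 4 has a video link so far.
PAGE = """<ul>
  <li class="views-row views-row-1"><span class="ep-title">Episode 6</span></li>
  <li class="views-row views-row-2"><span class="ep-title">Episode 5</span></li>
  <li class="views-row views-row-3"><span class="ep-title">Episode 4</span>
    <span class="vid"><a class="watch-video" href="/video/ep4">Watch Video</a></span>
  </li>
</ul>"""


def has_class(element, name):
    """True if `name` is one of the element's space-separated CSS classes."""
    return name in (element.get("class") or "").split()


def watchable_episodes(page_source):
    """Return (title, url) for each listing row that has a watch-video link."""
    root = ET.fromstring(page_source)
    episodes = []
    for item in root.iter("li"):
        if not has_class(item, "views-row"):
            continue
        # Search only below this <li> (the './/' case), never the whole page.
        link = next((a for a in item.iter("a") if has_class(a, "watch-video")), None)
        if link is None:
            continue  # upcoming episode: listed, but no video yet
        title = next((s.text for s in item.iter("span") if has_class(s, "ep-title")), "Untitled")
        episodes.append((title, link.get("href")))
    return episodes


print(watchable_episodes(PAGE))  # [('Episode 4', '/video/ep4')]
```

Testing for a missing link explicitly keeps real errors (a typo in a variable name, say) from being silently swallowed the way a bare except would.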

Harry_Rasmussen · August 5, 2014, 6:30pm

Has anyone done more development with this channel?

Lexy · January 18, 2015, 6:17pm


Indeed there has. After ignoring the channel for quite a long time, I've picked the project back up. It's working for the most part (see the README).

It can be found here: https://github.com/lexy0/TVO.bundle

Darwinner · January 19, 2015, 5:44pm

Can't wait to check it out!

system · December 21, 2019, 1:40am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
