@parallelize and @task

papalozarou · August 28, 2014, 10:29am

How do I use them?

I have a site for which I've grabbed the URLs of all the pages that contain videos (there's no metadata on the pages, hence not using a URL service). There are 18 in total, though that number could change.

At present I'm doing the below, which handles each channel in turn: it gets the title and the URL of the page the channel's video is on, then requests that URL and finds the HLS file. The title, stream URL and thumbnail for each channel are then stored in a list called CHANNEL_LIST.

This is really slow, and I'd like to do some of the requests in parallel.

################################################################################
# Gets a list of channels to iterate over
################################################################################
def GetChannelList():
    # Check to see if CHANNEL_LIST is already populated, if yes return it, if
    # no construct it.
    if CHANNEL_LIST:
        return CHANNEL_LIST

    # Construct CHANNEL_LIST_URL and grab HTML
    CHANNEL_LIST_URL        = URL_BASE + URL_MEMBERS + URL_CHANNELMENU
    CHANNEL_LIST_SOURCE     = HTML.ElementFromURL(CHANNEL_LIST_URL)

    # Find the channel links in the HTML source with xPath
    CHANNELS                = CHANNEL_LIST_SOURCE.xpath("//p/a")

    # Remove the last link from the CHANNELS list (the 'Return
    # to desktop version' links)
    CHANNELS.pop()

    # Add each channel to CHANNEL_LIST
    for CHANNEL in CHANNELS:
        # Grab the link text and convert from list to string
        # N.B. xpath ALWAYS returns a list
        CHANNEL_TITLE       = "".join(CHANNEL.xpath(".//text()"))
        CHANNEL_URL         = URL_BASE + URL_MEMBERS + "".join(CHANNEL.xpath(".//@href"))
        
        # Extracts the actual video URL for a channel. We do it inside 
        # this function so we can store it as part of CHANNEL_LIST and 
        # only do it once, not every time we hit the main menu
        CHANNEL_VIDEO       = GetChannelVideoStreamURL(CHANNEL_URL)

        # Gets the correct channel thumbnail
        CHANNEL_THUMB       = GetChannelThumb(CHANNEL_TITLE)

        # Appends the channel details to the CHANNEL_LIST
        CHANNEL_LIST.append([CHANNEL_TITLE,CHANNEL_VIDEO,CHANNEL_THUMB])
    
    CHANNEL_LIST.sort()
    
    return CHANNEL_LIST

################################################################################
# Extracts the actual video URL for a channel
################################################################################
def GetChannelVideoStreamURL(URL):
    # Grab the source from the Channel's URL - done inside here so we
    # only do it once, not every time we hit the main menu
    CHANNEL_SOURCE          = HTML.ElementFromURL(URL)

    # Gets the relevant script that has the mediaplayer info in it, by using
    # xPath to search for a script containing the string 'mediaplayer'
    CHANNEL_SCRIPT          = CHANNEL_SOURCE.xpath("//script[contains(., 'mediaplayer')]//text()")[0]

    # Grabs the video URL via regex
    CHANNEL_VIDEO           = re.findall(r'(http:\/\/[\d].*)\'',CHANNEL_SCRIPT)[0]

    return CHANNEL_VIDEO

However, I'm not really sure how to @parallelize/@task this. When I try, CHANNEL_LIST comes back empty, as if the function is returning before the parallel tasks have finished, or as if I'm not getting the information back from @task.
 
Any help? I've looked at the devour plugin mentioned in this thread but I'm none the wiser.
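The "returns before the tasks have finished" symptom can be illustrated outside the Plex framework. The sketch below does not use @parallelize/@task; it swaps in the standard library's threading module to show the same fan-out-then-wait idea. fetch_stream_url and build_channel_list are hypothetical names, and the fetch is a stub rather than a real page request.

```python
import threading


def fetch_stream_url(page_url):
    # Hypothetical stand-in for GetChannelVideoStreamURL(); a real version
    # would request page_url and extract the HLS URL from the page.
    return page_url + "/stream.m3u8"


def build_channel_list(channels):
    """channels is a list of (title, page_url) pairs.

    Returns a sorted list of [title, stream_url] rows.
    """
    results = [None] * len(channels)

    def worker(index, title, page_url):
        # Each worker writes only to its own slot, so no lock is needed.
        results[index] = [title, fetch_stream_url(page_url)]

    threads = [
        threading.Thread(target=worker, args=(i, title, url))
        for i, (title, url) in enumerate(channels)
    ]
    for thread in threads:
        thread.start()

    # Wait for every worker before reading results. Reading (or returning)
    # before this point is one way the list could come back incomplete.
    for thread in threads:
        thread.join()

    return sorted(results)
```

The key point is that results are only read after every worker has been joined, and each worker has a fixed slot to write into so nothing is lost.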

mikedm139 · August 28, 2014, 3:32pm

Is there a reason that you need to store the actual HLS URL? It would be much more efficient to just parse the page when a play request is made (whether you're using a URL Service or not). If the URLs don't change, you could always include code to check if the stored value exists and, if not, parse the HTML and store the value for the selected channel.

That being said, I use parallelization in the UnsupportedAppstore. See here for an example.
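The check-then-parse suggestion above could look roughly like the following. This is a minimal sketch in plain Python, not Plex framework code: STREAM_CACHE, parse_stream_url and get_stream_url are hypothetical names, and the parser is a stub standing in for GetChannelVideoStreamURL().

```python
# Hypothetical cache: maps a channel's page URL to its stream URL.
STREAM_CACHE = {}


def parse_stream_url(page_url):
    # Hypothetical stand-in for GetChannelVideoStreamURL(); a real version
    # would fetch the page and pull the HLS URL out of the player script.
    return page_url + "/stream.m3u8"


def get_stream_url(page_url, parse=parse_stream_url):
    """Return the stream URL for one channel, parsing the page only once."""
    if page_url not in STREAM_CACHE:
        # First request for this channel: do the slow work and remember it.
        STREAM_CACHE[page_url] = parse(page_url)
    return STREAM_CACHE[page_url]
```

With this shape, building the menu costs nothing per channel, and only the channel actually selected triggers a page request.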

papalozarou · August 28, 2014, 5:43pm

I could do it that way, though I don't really know how to. Would I have to set up a playvideo function as an indirect?

papalozarou · August 28, 2014, 6:29pm

Have managed to do it via indirect, and according to the logs the URL is only requested when playback is initiated.

Thanks for pointing me in that direction.

papalozarou · August 29, 2014, 8:44am

Scratch the above.

The channel now only works when the PMS and client are on the same machine. Every time I request the URL for the video file, it basically asks for login again (I seem to remember running into this problem when I originally tried to use a URL service when I started).

Is there a way to get round that? I've tried sending the user agent header with the request for the page containing the video, but get the same result.

mikedm139 · August 30, 2014, 5:41pm

Is there a cookie or session header which needs to be passed with the requests? That's usually the sort of thing that causes a recurring request for login. Do you have your code on GitHub?
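To make the cookie idea concrete, here is a minimal sketch of turning a login response's Set-Cookie value into a Cookie header for later requests. It is written against the Python 3 standard library rather than the Plex framework, the function name is hypothetical, and whether this particular site uses a cookie at all is an assumption.

```python
from http.cookies import SimpleCookie


def cookie_header_from_login(set_cookie_value):
    """Build a Cookie request header from a Set-Cookie response value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Keep only the name=value pairs; attributes such as Path are dropped.
    pairs = ["%s=%s" % (name, morsel.value) for name, morsel in jar.items()]
    return {"Cookie": "; ".join(pairs)}
```

The resulting dictionary would then be supplied as request headers by whatever HTTP call the channel uses for the video page.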

papalozarou · August 31, 2014, 10:23am

If I make the request for the pages containing the video streams immediately after logging in and getting the channel list page, it works fine (i.e. the original, slower way).

If I delay and make the request when a user wishes to play the video, that's when it tries to log in again.

Code is here:

https://github.com/papalozarou/SportseBooks.bundle/blob/master/Contents/Code/init.py

mikedm139 · August 31, 2014, 2:52pm

So it seems the content server uses a time-based session. I would expect the HTTP traffic to include a session header of some type. If you retrieve that with the login, then you should be able to pass the header back with each subsequent request.

Alternatively, it's not unreasonable to force the channel to execute the login logic following a play request, before retrieving the video URL.

Alternatively, you could proceed with your original plan to retrieve all the video URLs up front. You could do it in a background thread to avoid timing out.
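The second option (log in again on each play request, then fetch the video URL) can be sketched as a small wrapper. This is illustrative plain Python under stated assumptions: make_play_handler is a hypothetical name, and login and get_stream_url stand in for the channel's own login routine and GetChannelVideoStreamURL().

```python
def make_play_handler(login, get_stream_url):
    """Build a play callback that refreshes the session before each lookup.

    login and get_stream_url are hypothetical callables supplied by the
    channel; neither is part of any framework API.
    """
    def play(page_url):
        # Renew the time-limited session first, so the page request that
        # follows is made while the login is still fresh.
        login()
        return get_stream_url(page_url)

    return play
```

The cost is one extra login round trip per play request, in exchange for not having to track when the session expires.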

papalozarou · September 1, 2014, 9:59am

Thanks for this – I went with the second of the three options you've outlined above and it seems to work now.
