Best practice for populating DirectoryObjects with remote metadata?

TehCrucible · August 10, 2013, 2:23am

As per the title, what's the best practice here? For example, in the code below I use an HTML.ElementFromURL request to pull a list of show names using xpath. An "if" statement then narrows the list to the desired shows. However, the URL for each thumbnail is not available from the original HTML pull (well... it is, but it's hidden in a JavaScript tooltip and not visible to xpath).

This means that I'm making a new HTML.ElementFromURL request and xpath query for every DirectoryObject created. This seems sloppy, and it causes the request to time out when processing any significant number of DirectoryObjects (hence the limit of 10 in the if statement). However, I can't see any other way around it.

So I was just wondering if anyone had ever come up against something similar and how they handled it? Or just best practice for HTTP requests and metadata of objects in general?

Thanks in advance.

def MostPopular(title):
	oc = ObjectContainer(title1 = title)
	data = HTML.ElementFromURL("http://kissanime.com/AnimeList/MostPopular/")

	for each in data.xpath("//table[@class='listing']//td//a"):
		# Read href and text from the current link instead of re-running the query and indexing
		showurl = BASEURL + each.xpath("./@href")[0]
		showtitle = each.xpath("./text()")[0].strip()

		# Keep show pages only, capped at 10 because each show costs an extra page request
		if showurl.count("/") <= 4 and len(oc) < 10:
			thumburl = HTML.ElementFromURL(showurl).xpath("//div[@class='rightBox'][1]//div[@class='barContent']/div/img/@src")[0]
			oc.add(DirectoryObject(key = Callback(ListEpisodes, title = showtitle, url = showurl), title = showtitle, thumb = Resource.ContentsOfURLWithFallback(url = thumburl, fallback='icon-default.png')))

	if len(oc) < 1:
		return ObjectContainer(header="Error", message="Something has gone horribly wrong...")

	return oc

TehCrucible · August 10, 2013, 2:24am

Alternatively, is there a way for me to see the content of hidden JavaScript tooltips from the first request? Sorry for the double post.

mikedm139 · August 10, 2013, 6:19am

I’m not sure about the JavaScript off the top of my head. Often JS stuff loads separately from the HTML and can be a little more difficult to extract.

For the purposes of extracting thumbnails, you can actually use a callback to load them asynchronously. That way your request for the ObjectContainer doesn't time out. For example:

...
  for item in data.xpath(...):
    oc.add(DirectoryObject(key=Callback(NextMenu, arg=blah), title=title, thumb=Callback(ExtractThumb, thumb_page=some_url)))

  return oc

def ExtractThumb(thumb_page):
  data = HTML.ElementFromURL(thumb_page)
  # xpath() returns a list; take the first match
  thumb_url = data.xpath(...)[0]
  return Redirect(thumb_url)

This only works for thumbs. You can't force other metadata to load asynchronously. So in general, it's best to grab as much metadata as possible from the page that you're using to build the ObjectContainer, but if you need to make an extra request to grab a thumbnail, it's not the end of the world. It's not usually worth the trouble to start making extra requests for other pieces of metadata, though. If you implement a URL Service (which you should do), then a user can request more info about a specific video.
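The Callback trick helps because the thumbnail lookup is deferred: the menu is built from the list page alone, and a detail page is only fetched when a client actually asks for that thumbnail. A framework-free sketch of the idea, assuming a hypothetical fetch_thumb_url() in place of HTML.ElementFromURL plus xpath (it only records which pages were "requested"); it illustrates the deferral, not how the Plex Framework actually schedules thumbnail requests:

```python
from functools import partial

# Hypothetical stand-in for an HTTP fetch + xpath; records which pages were requested.
REQUESTS = []

def fetch_thumb_url(show_page):
    """Pretend to download show_page and pull the thumbnail URL out of it."""
    REQUESTS.append(show_page)
    return show_page.rstrip("/") + "/cover.jpg"

def build_menu(shows):
    """Build menu entries from the list page only; thumbnails stay unresolved."""
    menu = []
    for title, page in shows:
        menu.append({
            "title": title,
            # Deferred: nothing is fetched until someone calls this thumb.
            "thumb": partial(fetch_thumb_url, page),
        })
    return menu

shows = [("Show A", "http://example.com/Anime/Show-A"),
         ("Show B", "http://example.com/Anime/Show-B")]
menu = build_menu(shows)
print(len(REQUESTS))        # 0: building the menu needed no per-show requests
print(menu[0]["thumb"]())   # http://example.com/Anime/Show-A/cover.jpg
print(len(REQUESTS))        # 1: only the thumb that was asked for got fetched
```

The menu itself costs no per-show requests; each extra request happens only when (and if) a thumbnail is displayed.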

TehCrucible · August 10, 2013, 7:18am

Awesome. On that note: I've yet to dive into the docs regarding URL Services, but one question I do have: can I use my URL service to grab metadata for a DirectoryObject or TVShowObject, not just for a VideoClipObject?

For example, the site I am building this plugin for has a navigation tree as follows:

- List of show names (no metadata on this page)

- List of available episodes (show metadata, no video links on this page)

- Individual episode (direct video links for varying quality here)

The way I am structuring my channel navigation is like this:

- Object Container

- TVShowObjects

- VideoClipObjects

I am using xpath to pull metadata from each corresponding level of the site navigation. So are you saying I should write a URL service that I can pass the link for the show page (the second item on the list) and that returns the metadata to populate a TVShowObject, as well as a function that provides the data needed to play a video when passed the video link? Or should I be pulling metadata for my TVShowObjects in the channel, as I currently do with the thumbnail?

TehCrucible · August 10, 2013, 9:48am

Further to my example, here is some code. The first function parses a list of show names from the top-level navigation, then builds a list of TVShowObjects with the second function. Is the stuff that's in the second function the sort of thing that a URL service should handle?

def NewAnime(title):
	oc = ObjectContainer(title1 = title)
	data = HTML.ElementFromURL("http://kissanime.com/AnimeList/Newest")
	
	for each in data.xpath("//table[@class='listing']//td//a"):
		show_url = BASE_URL + each.xpath("./@href")[0]
		show_title = each.xpath("./text()")[0].strip()

		if show_url.count("/") <= 4 and len(oc) < 20:		
			oc.add(GetShow(show_title, show_url))
			
	if len(oc) < 1:
		Log ("data.xpath is empty")
		return ObjectContainer(header="Error", message="Something has gone horribly wrong...")  
	
	return oc

	
def GetShow(show_title, show_url):
	page_data = HTML.ElementFromURL(show_url)
	show_thumb = page_data.xpath("//div[@class='rightBox'][1]//div[@class='barContent']/div/img/@src")[0]
	show_ep_count = len(page_data.xpath("//table[@class='listing']//td/a"))
	show_genres = page_data.xpath("//div[@id='leftside']//p[2]/a/text()")
	show_summary = '\n\n'.join(map(str, page_data.xpath("//div[@id='leftside']//p[5]/text()")))
	if len(show_summary) < 1:
		show_summary = '\n\n'.join(map(str, page_data.xpath("//div[@class='bigBarContainer'][1]//td/text()")))
	
	show_object = TVShowObject(
		key = Callback(GetEpisodes, show_title = show_title, show_url = show_url),
		rating_key = show_title,
		title = show_title,
		thumb = Resource.ContentsOfURLWithFallback(url = show_thumb, fallback='icon-cover.png'),
		summary = show_summary,
		episode_count = show_ep_count,
		viewed_episode_count = 0,
		genres = show_genres
		)
	
	return show_object
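The `show_url.count("/") <= 4` filter in NewAnime is what keeps show pages and presumably drops deeper links (such as episode pages) from the same listing table. A quick plain-Python check, assuming BASE_URL is "http://kissanime.com" and that episode pages sit one level below their show (the thread shows neither; is_show_url() is a hypothetical helper):

```python
BASE_URL = "http://kissanime.com"  # assumed value; not shown in the thread

def is_show_url(url):
    """Same test as NewAnime: 'http:' + '//' + host + '/Anime/Show' has 4 slashes."""
    return url.count("/") <= 4

print(is_show_url(BASE_URL + "/Anime/Some-Show"))              # True
print(is_show_url(BASE_URL + "/Anime/Some-Show/Episode-001"))  # False
```

The test is cheap but brittle: a trailing slash or a change in the site's URL layout shifts the count.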

TehCrucible · August 10, 2013, 12:09pm

One more question as well: Is there an easy way to paginate results when creating objects with a loop? For example, some of these shows list 600+ episodes on a single page. It would be great to be able to break them up into groups of 20 or so.

meo · August 10, 2013, 12:53pm

Further to my example, here is some code. The first function parses a list of show names from the top level navigation, then builds a list of TVShowObjects with the second function. Is the stuff that's in the second function the sort of thing that a url service should handle?

Not usually for TV shows, but for episodes or video clips. The URL service provides both metadata (title, summary, thumb, etc.) and media data (codec, how to play the video, etc.) through two functions, MetadataObjectForURL and MediaObjectsForURL respectively.

I think of it like this:

The plugin provides the navigation all the way down to an episode/video clip. The only requirement is that the plugin provides a URL for the page the episode/video clip is located on.
If a URL service recognizes a URL (matched by the regex in ServiceInfo.plist), it returns the above data for that URL.
This gives a good layered structure, where a URL service (for example, YouTube's) can be reused by several plugins.
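The layering can be sketched without the Framework: each hypothetical service pairs a URL regex (the role ServiceInfo.plist plays) with one function for metadata and one for media, and the first matching pattern wins. The service table, the regex, and the returned dicts are illustrative assumptions, not the real Plex URL service API:

```python
import re

def episode_metadata(url):
    """Hypothetical: would scrape title/summary/thumb from the episode page."""
    return {"url": url, "title": url.rsplit("/", 1)[-1].replace("-", " ")}

def episode_media(url):
    """Hypothetical: would return the playable streams found on the episode page."""
    return [{"quality": "720p", "source": url + "#720p"}]

# (pattern, metadata function, media function); the first matching pattern wins.
# Assumed layout: episode pages look like /Anime/<show>/<episode>.
SERVICES = [
    (re.compile(r"^https?://(www\.)?kissanime\.com/Anime/[^/]+/[^/?]+"),
     episode_metadata, episode_media),
]

def find_service(url):
    """Return the (metadata_fn, media_fn) pair whose pattern matches url, else None."""
    for pattern, metadata_fn, media_fn in SERVICES:
        if pattern.match(url):
            return metadata_fn, media_fn
    return None

url = "http://kissanime.com/Anime/Some-Show/Episode-001"
metadata_fn, media_fn = find_service(url)
print(metadata_fn(url)["title"])                              # Episode 001
print(find_service("http://kissanime.com/Anime/Some-Show"))   # None
```

Note that the show page does not match: the service only claims episode URLs, while the channel keeps handling show-level navigation.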

Episodes and video clips are represented by an EpisodeObject and a VideoClipObject:

http://dev.plexapp.com/docs/api/objectkit.html#EpisodeObject

One more question as well: Is there an easy way to paginate results when creating objects with a loop? For example, some of these shows list 600+ episodes on a single page. It would be great to be able to break them up into groups of 20 or so.

Yes, there is an object called NextPageObject (not documented). Example usage:

MAX_EPISODES_PER_PAGE = 20

@route('/video/myplugin/Episodes', offset = int)
def Episodes(title, offset = 0):
    oc = ObjectContainer(title2 = title)
    
    counter = 0

    pageElement = HTML.ElementFromURL(...)
    for item in pageElement.xpath(...):
        counter = counter + 1

        if counter <= offset:
            continue

        ....

        oc.add(
            EpisodeObject(
                url = myFoundURL,
                title = myFoundTitle,
                ...
            )
        )

        if len(oc) >= MAX_EPISODES_PER_PAGE:
            oc.add(
                NextPageObject(
                    key = 
                        Callback(
                            Episodes,
                            title = title,
                            offset = counter
                        ),
                    title = "Next..."
                )
            )

            return oc
    
    return oc

Notes:

The HTTP cache is used (and relied upon), since the same URL is requested on each "page" of navigation
In order to use the parameter offset as an int, it must be stated in the route decorator
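The offset/counter logic can be checked outside Plex. This sketch mirrors meo's loop on a plain list (paginate() and PAGE_SIZE are hypothetical names): it skips the first offset items, fills a page, and reports the offset for the "Next..." entry. One deliberate tweak: it only offers a next page once a further item has actually been seen, whereas the loop above adds "Next..." as soon as the page is full, which leaves an empty last page when the total is an exact multiple of the page size.

```python
PAGE_SIZE = 20

def paginate(items, offset=0, page_size=PAGE_SIZE):
    """Return (page, next_offset); next_offset is None on the last page."""
    page = []
    counter = 0
    for item in items:
        counter += 1
        if counter <= offset:
            continue  # already shown on an earlier page
        if len(page) >= page_size:
            # Page is full and another item exists, so "Next..." makes sense.
            return page, counter - 1
        page.append(item)
    return page, None

episodes = ["Episode %d" % n for n in range(1, 46)]
page, nxt = paginate(episodes)
print("%s .. %s, next offset %s" % (page[0], page[-1], nxt))
# Episode 1 .. Episode 20, next offset 20
```

Calling it again with offset=20 and then 40 yields episodes 21 to 40 and then 41 to 45 with no further page.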

mikedm139 · August 10, 2013, 4:27pm

To add a little to meo's answer: most people (rightfully) find it confusing at first that you have to write code for grabbing metadata twice, once in the channel to populate the videos in the channel menu, then again in the URL Service. There is logic to it. The channel is responsible for populating the metadata for the lists of videos because, as you discovered with the site you're working on, it's much quicker to grab some data about each of a bunch of videos listed on the same page than it is to request a separate page for each video and build a list from those. If we used the URL Service to populate the metadata for the lists in the channel, it would take a long time and would very easily time out. Does that mean we can ignore the MetadataObjectForURL() function in the URL Service? No. It's used for all sorts of other cool stuff. A short list of some of the more visible things: populating the info/pre-play screen for clients, facilitating the Play-on client x capability, and (if the service gets added to the Services.bundle) supporting queued videos via PlexIt and myPlex.

Aside from the docs, there are some good blog posts which are worth a read at dev.plexapp.com

TehCrucible · August 10, 2013, 10:39pm

Yes, there is an object called NextPageObject (not documented)

Brilliant. Thanks for going to the effort of providing an example. Makes perfect sense.

Aside from the docs, there are some good blog posts which are worth a read at dev.plexapp.com

Thanks mate. I've checked out the blog posts. I think everything's coming together now.

Another question, then. I now know I can paginate results easily (in fact, I'd already kind of built that by using a new DirectoryObject to call the function from within itself). What if I wanted to break 600 episodes (pulled as one list) into separate DirectoryObjects, say 50 at a time? I would be presenting the end user with a menu something like this:

- Show Name

- Episodes 1 - 50

- Episodes 51 - 100

- Episodes 101 - 150

- VideoClipObject Episode 101

- VideoClipObject Episode 102

...

...and so on. Last night I was trying to come up with some sort of loop that could keep count of where it was in the list of episode URLs and present DirectoryObjects that would call back to another function that only presented episodes within a specified range as VideoClipObjects. I feel like it's definitely doable, but maybe there's an easier way?

Thanks again for all your help guys, it's been brilliant. Hopefully I'll have something to show for it in a few days. :)

mikedm139 · August 11, 2013, 1:50pm

I think that creating your Episodes ObjectContainer with an offset parameter should work pretty well. Something to the effect of:

@route(PREFIX + '/episodes', offset=int)
def EpisodeMenu(page_url, offset=0):
  oc = ObjectContainer()
  data = HTML.ElementFromURL(page_url)
  # MAX_COUNT: episodes per page, assumed to be defined elsewhere in the channel
  for episode in data.xpath('//xpath_expression_for_episodes')[offset:offset+MAX_COUNT]:
    # grab metadata and add it to the VideoClipObject
    oc.add(...)

  return oc

The NextPageObject example meo provided has a nice implementation of using a counter to manage the loop.
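The range menu from the question fits the same offset idea: one entry per block of 50, each carrying the offset that a slice like the one in EpisodeMenu would start from. A plain-Python sketch of the bookkeeping, where range_menu(), episodes_in_range() and CHUNK_SIZE are hypothetical names; in a channel, each (label, offset) pair would presumably become a DirectoryObject whose Callback passes that offset on, which is an assumption about the wiring rather than code from this thread.

```python
CHUNK_SIZE = 50

def range_menu(total, chunk_size=CHUNK_SIZE):
    """Return (label, offset) pairs such as ("Episodes 1 - 50", 0) for `total` items."""
    menu = []
    for offset in range(0, total, chunk_size):
        last = min(offset + chunk_size, total)  # the final block may be short
        menu.append(("Episodes %d - %d" % (offset + 1, last), offset))
    return menu

def episodes_in_range(episodes, offset, chunk_size=CHUNK_SIZE):
    """The items one range entry would show, like xpath(...)[offset:offset+MAX_COUNT]."""
    return episodes[offset:offset + chunk_size]

episodes = ["Episode %d" % n for n in range(1, 613)]
menu = range_menu(len(episodes))
print(len(menu))                                     # 13
print(menu[-1])                                      # ('Episodes 601 - 612', 600)
print(episodes_in_range(episodes, menu[2][1])[0])    # Episode 101
```

Because the whole episode list is parsed on each request anyway, the range menu needs nothing beyond the list's length, and the HTTP cache absorbs the repeated page fetches.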

