URL Service

Thought I would start a new topic on URL Services, since I know I will have a lot of questions on this subject and this gives me one place to ask them all.

I know I will have to create a URL Service for any channel I create. Basically, if a URL service already exists, someone has also created a channel for it, so I would not be offering anything new or better than what is already out there (though I guess a few URL services, like YouTube, are large enough that multiple channels have been created for them).

First, I am having trouble with the basic concept of what a URL Service is and how it relates to the actual channel. I even have difficulty finding the words to explain what I think the service does.

Here is what I think it does. Since there is no standard format that all websites use when offering videos, you first need to define where and how all the data you need for a channel is located on the site. The URL service maps out the locations of those pieces of information, such as the type, location, and metadata of all the videos hosted by a particular website. Is this what the URL service does?

Why does MetadataObjectForURL look a lot like what you pull when you write the channel? How is it different from the channel's __init__.py, where you define the elements to pull from the URL?

You've pretty much got it. MetadataObjectForURL is used by the bookmarklet (Plex It!) to obtain metadata about the video when it's invoked outside the context of a channel, so you're right, there may be some overlap there.

If you haven't seen it, this blog post does a pretty good job of explaining the basics and how to write one -> The Power of the URL Service

Thanks. I am looking at the very blog post you mentioned while also going through the Framework documentation and some sample code for URL Services to try to figure out how these things work. The problem is that the blog often just says a step is very basic or self-explanatory and skips on to the next one. Unfortunately, for me, those parts of the code are not basic or self-explanatory.

I am starting to wonder whether my disconnect, and why these examples don't make sense to me, comes at least in part from their being object-oriented. I learned the basics of programming years ago, before everything moved to OOP. I am looking through some basic tutorials on the concept right now to see if that helps.

But I do have one question while it is on my mind. In the ServiceCode.pys file, when the different functions refer to "(url)", where are they getting the value of url? It doesn't seem to be defined anywhere.

url is passed in by the framework from either the channel or the bookmarklet.  For example, in the latter case, it's the URL of the page that the browser is pointed to when the user hits "Plex It!"

Also, the url would come from within the main plugin when you create a VideoClipObject (or EpisodeObject, etc.) and pass url=________

Ok, now I am really confused and have a million questions.

I understand that, in theory, you want the channel or end users (with Plex It) to determine the actual page within the site that the URL Service uses. But the service has to specify the base address of the website somewhere; otherwise, how would Plex even know which website this URL service is for? I think this is the purpose of the URL pattern in the plist file, but I am still confused about how to determine the proper syntax for it and whether it has any other purposes.

And setting aside the need for the channel or end user to be able to specify any page within a site, I am still confused about how to write the code for the URL service if I am not specifying the pages. If I am telling the service how to pull metadata from a website, how do I choose the proper XPath expressions for the HTML or XML in the site's pages if I don't even know which page or pages are being accessed?

And I know this confusion comes from not completely understanding what the MetadataObjectForURL(url) function is for in the URL Service. First, looking at the ServiceCode.pys files for different URL services, the MetadataObjectForURL(url) function appears, based on the XPath expressions used, to look only at the pages for individual videos within the website. Am I correct in that observation? If so, what tells it to look only at those pages? Is there something telling this function to access only certain pages or types of pages within the website?

Secondly, since the MetadataObjectForURL(url) function seems to pull similar info to what the channel code does, do you only pull what you want picked up for Plex It? And if so, what metadata do you need for Plex It, and is there a naming format you need to use? Also, looking at different URL Services, some seem to pull info only from the head of the page, which I would think is just for Plex It, but others seem to pull more detailed information. Is that metadata for channels? Why would you choose to get that metadata here and not in the channel code?

Since it sounds like you're just trying (for now) to build a channel with some navigation that will be using the URL service directly, I would suggest you don't worry too much about MetadataObjectForURL for the time being. This may seem counter-intuitive, but it's actually not recommended to call that function from your channel in most cases, since it incurs an additional HTTP request, and in most cases you will already have loaded everything you need to create an appropriate media object, which has all the info needed to tell Plex how to play the media itself.

So, you can look at any number of channels for examples of how to build hierarchical navigation by extracting this info from the site.  It's fine and standard practice to directly set a "global" variable in the plugin.  To choose a semi-random example, the CBS plugin does this right off the bat:

CBS_LIST = 'http://www.cbs.com/video/'

From there you would make a request for that page and build up your menus. In the CBS case, there are a couple of levels of hierarchy, and eventually it returns ObjectContainers containing actual, playable EpisodeObjects or VideoClipObjects, depending on whether we're looking at full episodes or not. It's the "url" of these objects that gets passed to MediaObjectsForURL. Note that MetadataObjectForURL is not even used in this case, since we're calling it from a channel and already have all the metadata we need to fill out the EpisodeObjects and VideoClipObjects from the requests we made to build the navigation. Later on, once you've got this much working, you can go back and implement MetadataObjectForURL so the service can be incorporated into the larger Services.bundle for use with the bookmarklet.
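A rough sketch of that flow, in plain Python so it runs outside Plex: take a listing page (a hard-coded HTML string stands in for the HTTP response), pull out each video link, and wrap it in an object whose url is the only thing the URL Service later receives. The listing URL, the `video-link` class and `ClipStub` are hypothetical stand-ins, not the CBS plugin's code or the framework's VideoClipObject; inside a real channel you would use the framework's own HTTP/HTML helpers and return its objects in an ObjectContainer.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical listing page, playing the role of CBS_LIST in the CBS plugin.
LISTING_URL = "http://www.example.com/video/"


@dataclass
class ClipStub:
    """Stand-in for a framework object such as VideoClipObject (not the real class)."""
    title: str
    url: str  # the string the framework would later hand to MediaObjectsForURL


class VideoLinkParser(HTMLParser):
    """Collects (href, link text) pairs from <a class="video-link"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "video-link" in classes:
            self._href = attrs.get("href")

    def handle_data(self, data):
        # Record the first non-blank text inside a matching link as its title.
        if self._href and data.strip():
            self.links.append((self._href, data.strip()))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def build_menu(listing_html, base_url=LISTING_URL):
    """Turn one listing page into playable items; only .url matters to the URL Service."""
    parser = VideoLinkParser()
    parser.feed(listing_html)
    return [ClipStub(title=text, url=urljoin(base_url, href)) for href, text in parser.links]


if __name__ == "__main__":
    sample = (
        '<ul><li><a class="video-link" href="/video/ep-101/">Episode 101</a></li>'
        '<li><a href="/about/">About us</a></li></ul>'
    )
    for item in build_menu(sample):
        print(item.title, item.url)
```

The channel does all the page parsing for its menus here; the service only ever sees the finished url string.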

I know it's a little confusing at first, hopefully that helps clear things up at least a little bit.

I'm not an expert on this, but maybe the following will help. Please forgive me if I make this too basic, but when I was trying to figure this stuff out a week ago, it would have helped me to read what I'm about to type.

Let's take the Daily Show plugin as an example. Go to your installed plug-ins folder and look for Services.bundle/Contents/Service Sets/com.plexapp.plugins.thedailyshow/ServiceInfo.plist

In that file is this:

			<key>URLPatterns</key>
			<array>
				<string>^http://www.thedailyshow.com/(full-episodes|watch)/</string>
			</array>

The URLPatterns is a list of regular expressions. Here is a tutorial on regular expressions that I always look at in order to understand what is going on with them (they can be very complicated to understand at times):

http://www.regular-expressions.info/tutorial.html

In this particular case, the ^ character at the beginning of the regexp says "this URL must start with the following regexp". In other words, if you didn't have the ^ character there, the pattern would also match against

Hi-there-http://www.thedailyshow.com/watch/ 

which wouldn't be right.

The (full-episodes|watch) part of the regex means either one of those strings can be matched.

So this URL pattern would match strings that start with either:

http://www.thedailyshow.com/full-episodes/

or 

http://www.thedailyshow.com/watch/
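To check the behaviour described above, here is a quick experiment with Python's re module. It assumes the patterns are tested with something like re.search (which scans the whole string); I don't know exactly how the framework applies them, so treat this as an illustration of the regex itself.

```python
import re

# Pattern copied from the Daily Show ServiceInfo.plist quoted above.
ANCHORED = r"^http://www.thedailyshow.com/(full-episodes|watch)/"
# The same pattern without the leading ^, for comparison.
UNANCHORED = ANCHORED[1:]

CANDIDATES = [
    "http://www.thedailyshow.com/full-episodes/thu-february-28-2013-rachel-maddow",
    "http://www.thedailyshow.com/watch/",
    "Hi-there-http://www.thedailyshow.com/watch/",
    "http://www.thedailyshow.com/videos/",
]

for url in CANDIDATES:
    # re.search scans the whole string, so only the ^ pins the match to the start.
    anchored = re.search(ANCHORED, url)
    unanchored = re.search(UNANCHORED, url)
    # group(1) shows which branch of (full-episodes|watch) matched.
    branch = anchored.group(1) if anchored else None
    print(f"anchored={bool(anchored)!s:5} unanchored={bool(unanchored)!s:5} branch={branch} {url}")
```

The "Hi-there-" string matches only the unanchored version, and the /videos/ URL matches neither. One more subtlety: the dots in www.thedailyshow.com are unescaped, so each one technically matches any character; writing www\.thedailyshow\.com makes them literal dots.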

Now, because of this ServiceInfo.plist file, Plex automatically knows that whenever a website URL matches either of the two patterns above, it can call the URL Service for the Daily Show plugin. So suppose a VideoClipObject is added somewhere with URL =

http://www.thedailyshow.com/full-episodes/thu-february-28-2013-rachel-maddow 

Since Plex can match the start of this URL to the first string above, it will then automatically call MediaObjectsForURL(url) and the url parameter will be:

http://www.thedailyshow.com/full-episodes/thu-february-28-2013-rachel-maddow
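A toy model of that routing step, purely to illustrate the idea: a table of patterns per service, a lookup that finds the first service whose pattern matches, and a call into that service with the very same url string. The registry, the second service name and all function names are hypothetical; this is not how the Plex framework is actually implemented internally.

```python
import re

# Hypothetical registry: service identifier -> URL patterns, as each ServiceInfo.plist would list them.
SERVICE_PATTERNS = {
    "com.plexapp.plugins.thedailyshow": [r"^http://www\.thedailyshow\.com/(full-episodes|watch)/"],
    "com.example.otherservice": [r"^https?://(www\.)?example\.com/video/"],
}


def media_objects_for_url(url):
    """Stand-in for one service's MediaObjectsForURL: it only ever sees the url it is given."""
    return [{"page": url}]


# Hypothetical table of entry points; only the Daily Show stand-in is filled in here.
SERVICE_ENTRY_POINTS = {"com.plexapp.plugins.thedailyshow": media_objects_for_url}


def find_service(url):
    """Return the first service whose patterns match url, or None if nobody claims it."""
    for name, patterns in SERVICE_PATTERNS.items():
        if any(re.search(p, url) for p in patterns):
            return name
    return None


def resolve(url):
    """Route url to the matching service and call it with that same, unmodified url."""
    service = find_service(url)
    if service not in SERVICE_ENTRY_POINTS:
        raise LookupError(f"no URL service handles {url}")
    return SERVICE_ENTRY_POINTS[service](url)


if __name__ == "__main__":
    clip = "http://www.thedailyshow.com/full-episodes/thu-february-28-2013-rachel-maddow"
    print(find_service(clip), resolve(clip))
```

The point of the sketch is that the pattern only decides *which* service runs; the service code itself still receives the full page URL and has to work from that.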

And when I was working on my own plugin, I didn't place my URL service in Services.bundle; I placed it in Lifetime.bundle/Contents/Services/.

Just guessing, but I assume that the Services.bundle is reserved for approved plugins.

Thank you both for your responses.

So with the URL Service, the ServiceInfo.plist URL pattern at least provides the base URL for the channel or Plex It to use, and it can limit the scope of the values that can be sent to the URL service, based on the regular expressions used. Have I got that concept right now?

And MediaObjectsForUrl is not required in the ServiceCode.pys for a channel to work, as long as I pull all the metadata I need in my channel code? That would mean that, besides its use for Plex It, MediaObjectsForUrl is used just to pull general metadata that appears on all pages within the site, to make the channel code simpler or to provide default metadata so you do not have to program it in the channel.

Though I get the very basic concept of regular expressions, it is the syntax that is still throwing me a little. As the site you linked explains, regular expression syntax depends on the engine that processes it. For example, the site has a basic syntax reference page (http://www.regular-expressions.info/reference.html), and it says it uses Perl 5. So will the Perl 5-based syntax given on that site work for writing the regular expressions in this plist file? Will this syntax also work for any regular expressions I may need in other parts of my Plex channel code?

As I understand it, I think your summary of the URL Service at the top is correct.

I think your summary on MediaObjectsForUrl is about correct. To test that function, sander1 suggested the following in another thread:

Note that I think your plugin will naturally duplicate some code in __init__.py that will also go in MediaObjectsForUrl. See this thread:

http://forums.plexapp.com/index.php/topic/60984-url-services-metadata-and-duplication-of-effort/

As for regular expressions, I think most of the regex functionality you will use for plugin development will be the same across all programming languages. That website highlights programming language differences when appropriate.

The URL pattern is only used to match a URL (as stated above), it doesn't provide anything directly to the URL service itself in terms of an actual web path.

[strike]MediaObjectForURL is used in many places, not just for PlexIt bookmarks/queue, also for suggested videos (to friends), the pre-play screen on iOS, Roku, Android and Plex/Web, as well as if you get the "info" or "details" or whatever you want to call it within the OSX/Windows/Linux desktop client.  The metadata within your channel's code is used only for what is displayed within a menu listing in any given channel (that changes depending on what client you're using, the desktop clients display quite a bit of info there, some other clients not so much).[/strike]

Sorry ... I was talking about the completely wrong thing above, please ignore :)

Lastly for the Regex I think it would be safe to assume that the plugins/channels use the Python regex syntax.  I don't know that for sure but given that the whole framework is in python I think it's a safe assumption.   http://docs.python.org/2/library/re.html

Yep, I just realized I messed up and put MediaObjectsForURL when I meant MetadataObjectForURL, so let me ask the question again correctly:

And MetadataObjectForURL is not required in the ServiceCode.pys for a channel to work, as long as I pull all the metadata I need in my channel code? That would mean that, besides its use for Plex It, MetadataObjectForURL is used just to pull general metadata that appears on all pages within the site, to make the channel code simpler or to provide default metadata so you do not have to program it in the channel.

I am assuming what you crossed out was the answer to my corrected question. But let me know if I am wrong.

I was assuming Python would be the safe syntax. I still have to do more extensive research on that later, but I did see a note that backslashes are handled differently in Python regex than in Perl. That difference would not affect the regex in the plist file, since it is an XML document, would it?

Yes the strikethru stuff is your answer.  I'll paste it again without the strikethru and the correct naming!

MetadataObjectForURL is used in many places: not just for Plex It bookmarks/queue, but also for suggested videos (to friends), the pre-play screen on iOS, Roku, Android and Plex/Web, as well as the "info" or "details" view (or whatever you want to call it) within the OS X/Windows/Linux desktop client. The metadata within your channel's code is used only for what is displayed within a menu listing in any given channel (that changes depending on which client you're using; the desktop clients display quite a bit of info there, some other clients not so much).

Thanks Mark, I didn't actually realize that pre-play screens were using MetadataObjectForUrl, but it makes some sense.  More motivation to properly implement it then!  (and sorry for the confusion)

I agree with shopgirl284. We noobs need details. Don't assume anything. :)

I found this Python based regexp page informative too:

https://developers.google.com/edu/python/regular-expressions

Just to finish up any questions about the ServiceInfo.plist as it relates to URL services, I just want to clarify one thing.  Is this plist file using Perl5 regex or python regex?

And now back to the URL ServiceCode.pys.

I was just going to wait on MetadataObjectForURL, but since it supplies the information for pre-play screens and "more info" views, which can influence how your channel looks, I am going to get some basic info now and add more later if I want.

This section of the ServiceCode.pys makes a lot more sense to me now that I looked a little closer and realized that the variables you pull the info into are based on the attributes of the metadata object you return. (Yes, that was a DUH moment for me.) I see that most services just use VideoClipObject, while a few use EpisodeObject. Since you will pull all this info again for your channel in your __init__.py file, is there really much benefit to returning the data in an EpisodeObject rather than a VideoClipObject? I would think this depends on what is using this metadata, so...

Is there any information out there that explains what the different places that get their metadata from MetadataObjectForURL pull from the returned objects? I would think there would be some documentation to help you decide the minimum data you should collect and return here to satisfy the different places that use this info.
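For the basic, head-of-page approach mentioned earlier in the thread, a minimal sketch might read the Open Graph `<meta property="og:...">` tags that many video pages carry and map them onto title, summary and thumb. That assumes the site publishes those tags (check its page source), and `ClipMetadata` is a stand-in for the framework's VideoClipObject; returning an EpisodeObject instead would mainly add episode-specific fields such as show and season. The page HTML is passed in as an argument so the sketch runs without a network request, whereas a real MetadataObjectForURL would fetch the page from url itself.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class ClipMetadata:
    """Stand-in for the framework's VideoClipObject; fields mirror title, summary and thumb."""
    url: str
    title: str = ""
    summary: str = ""
    thumb: str = ""


class OpenGraphReader(HTMLParser):
    """Collects <meta property="og:..." content="..."> values, keyed without the og: prefix."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if prop.startswith("og:") and content is not None:
            self.og[prop[3:]] = content


def metadata_object_for_url(url, page_html):
    """Sketch of MetadataObjectForURL; page_html is passed in here instead of fetched from url."""
    reader = OpenGraphReader()
    reader.feed(page_html)
    return ClipMetadata(
        url=url,
        title=reader.og.get("title", ""),
        summary=reader.og.get("description", ""),
        thumb=reader.og.get("image", ""),
    )


if __name__ == "__main__":
    page = (
        '<html><head><meta property="og:title" content="Episode 101"/>'
        '<meta property="og:description" content="The first episode."/>'
        '<meta property="og:image" content="http://www.example.com/thumbs/101.jpg"/>'
        "</head><body></body></html>"
    )
    print(metadata_object_for_url("http://www.example.com/video/ep-101/", page))
```

Missing tags simply leave a field empty, which keeps the function usable as a fallback while you decide how much detail to add later.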

I have some general questions about networking and parsing data APIs, but I will ask those in the general thread I started since that may also apply to other areas of my channel code. But after I figure that out, I may have some more questions about the MetadataObjectforURL section.

But again, thank you guys for answering all these questions for me.

Sorry, but I am going to bug you guys with even more questions.

With my current channel project, by looking in the embed code, I found that the website offers two versions of each video, high and low quality, indicated in the video's file name. But only the high-quality one is mentioned in the actual source code of the page, so would I still make two different MediaObjects for these two qualities? Also, am I listing both versions just to tell the URL service they exist, or because I want to offer both?

Within each MediaObject(), how do you determine the values of the attributes? The tutorial (http://devblog.plexa...he-url-service/) talks about using MediaInfo to find the values, but that program only seems to provide info for videos stored locally on your machine, not videos stored remotely on websites. Also, different URL services seem to specify a variety of attributes. Which values should you define for your videos, other than those that differ for each video version?

Also, I am wondering if I need to change the extension of my videos. All the videos for this URL service are in f4v format, but in the code for each video page, right under the line that gives the video's value, there is a line that changes the format to mp4. I opened both versions in VLC using the video's HTTP address. The video plays with both the .f4v and .mp4 extensions, but it loads much faster as .mp4. So, should I use the .mp4 instead of the .f4v extension?
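One way to picture two qualities is one media entry per file, each with its own attributes and a single part pointing at that file. The classes below are stand-ins loosely mirroring the MediaObject/PartObject attribute names seen in ServiceCode.pys files, not the framework classes. The `_high`/`_low` suffixes are assumed from the file naming described above, and the codec, resolution and bitrate values are placeholders to replace with what you measure. The function also takes the file's base URL directly, whereas a real MediaObjectsForURL receives the page url and would have to work out the file address from it. Listing the higher quality first is an assumption too; check how the clients you target choose between entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartStub:
    """Stand-in for the framework's PartObject: one playable file."""
    key: str


@dataclass
class MediaStub:
    """Stand-in for the framework's MediaObject, using attribute names common in ServiceCode.pys files."""
    parts: List[PartStub]
    container: str
    video_codec: str
    audio_codec: str
    video_resolution: str
    audio_channels: int
    bitrate: int  # kilobits per second


# PLACEHOLDER values: replace with what VLC reports for your actual files.
# Each row: (file-name suffix, vertical resolution, bitrate in kbps), best quality first.
QUALITIES = [
    ("_high", "720", 1500),
    ("_low", "360", 500),
]


def media_entries_for_video(video_base):
    """Return one media entry per quality; video_base is the file URL minus suffix and extension."""
    return [
        MediaStub(
            parts=[PartStub(key=f"{video_base}{suffix}.mp4")],
            container="mp4",       # assumption: serving the .mp4 variant discussed above
            video_codec="h264",    # assumption: confirm in VLC's codec info
            audio_codec="aac",     # assumption: confirm in VLC's codec info
            video_resolution=resolution,
            audio_channels=2,      # stereo, as VLC reported
            bitrate=kbps,
        )
        for suffix, resolution, kbps in QUALITIES
    ]


if __name__ == "__main__":
    for media in media_entries_for_video("http://cdn.example.com/videos/clip123"):
        print(media.video_resolution, media.bitrate, media.parts[0].key)
```

Keeping the per-quality differences in one table means the shared attributes are written once and only the values that actually change per version vary.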

Just wanted to bump this up and see if I could get some clarification on a few things that I asked about and make sure I am on the right track. 

I am still curious whether the plist file uses Perl 5 regex or Python regex. Also, are there any standards that tell you at least the minimum metadata you need to pull in your URL service? And I am guessing that you provide both high- and low-quality videos in the MediaObjects for users who have their player set to low quality.

I think I have figured out how to get the attributes for MediaObjects. I found that I can open the videos in VLC directly from the site and see the video and audio codecs, the video resolution, and that the audio channels are stereo. So does that mean I can pull up a few videos for testing, make sure they are all the same, and just use the info I find in VLC for the attributes? I also see that a lot of URL services list the bitrate. Is it better to include the bitrate? How would I determine it? The VLC statistics show the content bitrate. Would you use the maximum, the minimum, or try to determine an average?
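For the bitrate question, one simple option (an assumption, not a documented Plex requirement) is the average over the whole file: total size in bits divided by duration. The size could come from the Content-Length of the video URL and the duration from VLC. Note that this figure includes audio and container overhead, not just video.

```python
def average_bitrate_kbps(size_bytes, duration_seconds):
    """Average overall bitrate in kbps: size in bits divided by duration, over 1000."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    bits = size_bytes * 8
    return round(bits / duration_seconds / 1000)


if __name__ == "__main__":
    # Illustrative numbers only: a 90,000,000-byte file that runs 10 minutes.
    print(average_bitrate_kbps(90_000_000, 600))
```

For example, 90,000,000 bytes * 8 / 600 s / 1000 works out to 1200 kbps, a single stable number instead of VLC's fluctuating live reading.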

There is also the question I had about formats, but that also involves the PlayVideo function, so I will get to that later.

I would assume they are python regex because Plex is python based. Have you run into a case where a regex is not working as you intended?
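On the backslash question: XML itself has no backslash escapes, so a pattern in a plist `<string>` reaches the program exactly as typed. The characters XML does care about are `&` and `<`, which must be written as `&amp;` and `&lt;`. The sketch below loads a minimal, hypothetical ServiceInfo.plist with Python's plistlib and compiles its URLPatterns with re, assuming (as suggested above) that the framework uses Python's regex engine. In Python source itself, raw strings (r"...") save you from doubling every backslash.

```python
import plistlib
import re

# Minimal, hypothetical ServiceInfo.plist fragment. rb"..." keeps the backslashes exactly as in the file.
PLIST = rb"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>URLPatterns</key>
    <array>
        <string>^http://www\.thedailyshow\.com/(full-episodes|watch)/</string>
    </array>
</dict>
</plist>
"""


def load_url_patterns(plist_bytes):
    """Read the URLPatterns array and compile each entry with Python's re module."""
    return [re.compile(p) for p in plistlib.loads(plist_bytes)["URLPatterns"]]


if __name__ == "__main__":
    for rx in load_url_patterns(PLIST):
        # The pattern text arrives with single backslashes; no XML-level escaping was needed.
        print(rx.pattern)
        print(bool(rx.search("http://www.thedailyshow.com/watch/clip")))
        print(bool(rx.search("http://wwwXthedailyshow.com/watch/clip")))
```

Because the dots are escaped in this version, the second test URL (with an X where a dot should be) no longer matches.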

As for the other questions, I'm sorry I don't know any of the answers.