Hints for first plugin

flubr · February 17, 2011, 11:34pm

Unfortunately, this code generates the same error, so I think I must be doing something wrong:

rtmpClip = content2.xpath('//div[@class="jw-player"]//param[@name="flashvars"]')[0].get('value')

File "/Users/flubr/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/bases.py", line 142, in

__getitem__ = lambda x, y: x.__getitem__(y),

IndexError: list index out of range


def EenVideos(sender, pageUrl):
    dir = MediaContainer(title2=sender.itemTitle)
    content = HTML.ElementFromURL(pageUrl, errors='ignore')
    for video in content.xpath('//div[@id="videoArea"]/div/ul/li'):
        image = video.xpath("a/img")[0].get('src')
        title = video.xpath("h5/a")[0].text
        title = title.split(" - ")[1]
        link = video.xpath("h5/a")[0].get('href')
        link = ROOT_URL + link
        Log(link)

        content2 = HTML.ElementFromURL(link)
        summary = content2.xpath("//div[@id='videoZoneContainer']/div/p/a")[0].text
        rtmpClip = content2.xpath('//div[@class="jw-player"]//param[@name="flashvars"]')[0].get('value')
        Log(rtmpClip)

        dir.Append(RTMPVideoItem(url="rtmp://vrt.flash.streampower.be/een", clip="2011/02/ogom_110216_huisdokter_Website_EEN", width=640, height=360, live=False, title=title, summary=summary, thumb=image))

    return dir

with e.g.:
pageUrl = http://www.een.be/mediatheek/tag/168
link = http://www.een.be/mediatheek/405910

My Firefox XPath tool still gives this:

So it seems that the XPath in Plex doesn't return anything? How can this be?

pierre1313 · February 17, 2011, 11:41pm

It could be that the page is modified by JavaScript.

pierre1313 · February 17, 2011, 11:46pm

try doing a Log(HTTP.Request(pageUrl).content) at the beginning of your function and see if the HTML code that plex sees is the same as what a browser would see. Not entirely sure the javascript snippets are executed in Plex…
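A standalone way to see the effect described here, outside Plex: a parser that does not run JavaScript only sees markup that is literally present in the downloaded source. The sketch below uses Python 3's standard `html.parser` as a stand-in for the framework's HTML parser. The page source, the `ParamCollector` class and `find_params` are made up for illustration, and the real een.be markup may well differ.

```python
from html.parser import HTMLParser


class ParamCollector(HTMLParser):
    """Collects the attributes of every <param> tag found in the markup."""

    def __init__(self):
        super().__init__()
        self.params = []

    def handle_starttag(self, tag, attrs):
        if tag == "param":
            self.params.append(dict(attrs))


# Hypothetical page: the <param> element only exists inside a script string,
# so it is created in the browser and never appears as real markup.
RAW_HTML = """
<html><body>
<div class="jw-player" id="player"></div>
<script type="text/javascript">
document.write('<param name="flashvars" value="file=demo_clip">');
</script>
</body></html>
"""


def find_params(raw_html):
    """Return the <param> tags a non-JavaScript parser can see."""
    collector = ParamCollector()
    collector.feed(raw_html)
    collector.close()
    return collector.params


static_params = find_params(RAW_HTML)                 # what a static parser sees
appears_in_source = 'name="flashvars"' in RAW_HTML    # plain text search
```

If the list comes back empty while the marker string is present in the raw source, the element is probably being created by a script, and the value has to be pulled out of the script text instead.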

flubr · February 19, 2011, 12:49am

Thanks, this solved my issue!

I was able to extract the RTMP clip from the JavaScript. In some cases I get redirected to the wrong video, but I think this is an error in the site’s source code, since these wrong redirections also occur in my browser. I sent an email to the webmaster and hope it will be resolved.

BTW, in the meantime I adapted the “deredactie.be” (news) plugin from Sander. I made the “Sporza” (sports news) plugin from it and submitted it to the app store. It’s also on the “unsupported plugins” page.

flubr · February 23, 2011, 11:22pm

Hi,

I’m having some trouble adding some text as a summary. I tried this command for the HTML code below:


summary = inhoud.xpath('//div[@class="text"]/p')[0].text


<div class="text">
<p><strong>Tomas liet zich vandaag inspireren door de indrukwekkende speech van Khaddafi om al zijn eisen duidelijk te maken aan het volk.</strong></p>
<p>Het staat vast dat kolonel Tomas niet zal wijken voor opstandige burgers en dat er erg repressieve maatregelen zullen volgen wanneer zijn wensen niet worden vervuld.&nbsp;</p>
</div>

However, I think the "strong" part causes some trouble. The "strong" tags only make the text bold, which isn't necessary in Plex, so I was wondering if anyone knew a way to just extract all the text. Sometimes a single word in the middle of the text is made "strong", so the structure isn't fixed.
Thanks in advance!

pierre1313 · February 23, 2011, 11:24pm

try .text_content() instead of .text

flubr · February 23, 2011, 11:31pm

Thanks, it worked!

pierre1313 · February 23, 2011, 11:35pm

nice!

jwray · February 23, 2011, 11:55pm

I wrote this up for myself, but it's probably worth putting in a forum post.

Three ways to extract text from an XML tree, all with slightly different semantics.

1.


e.xpath("a")[0].text

reads the text property of lxml.etree._Element, i.e. only the text before the element's first child (http://lxml.de/api/lxml.etree._Element-class.html#text)

2.


e.xpath("a")[0].text_content()

calls text_content() on HtmlMixin, which returns the text of the element and all of its descendants (http://lxml.de/api/lxml.html.HtmlMixin-class.html)

3.


e.xpath("a/text()")[0]

uses XPath to select the element's direct text nodes and takes the first one
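To make the difference between the first two forms concrete, here is a small sketch that runs outside Plex. It uses Python 3's standard `xml.etree.ElementTree` as a stand-in for lxml: `.text` behaves the same way, and joining `itertext()` is used as the stdlib analogue of `text_content()`. The markup and the `all_text` helper are made up for illustration. The third form needs lxml's full XPath support and is not shown, since ElementTree's limited XPath has no `text()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the summary block discussed earlier.
SNIPPET = (
    '<div class="text">'
    '<p><strong>Bold lead sentence.</strong></p>'
    '<p>Plain text with <strong>one</strong> bold word.</p>'
    '</div>'
)

root = ET.fromstring(SNIPPET)
first_p, second_p = root.findall("p")

# .text only covers the text that precedes the first child element.
lead_text = first_p.text        # nothing precedes <strong> here
partial_text = second_p.text    # stops where <strong> begins


def all_text(element):
    """Join the text of the element and all its descendants."""
    return "".join(element.itertext())


full_first = all_text(first_p)
full_second = all_text(second_p)
```

This is why `.text` fails on a paragraph that starts with a `<strong>` tag, while the descendant-walking variant keeps working wherever the bold words fall.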

Jonny

flubr · February 24, 2011, 4:02pm

Thanks, that's a clear summary.

The plug-in I’m writing right now is working almost completely, but I still have 2 questions:

1) Is it possible to ignore this kind of error:

rtmpClip = inhoud.xpath('//div[@class="embed"]/script[@type="text/javascript"]')[0].text

File "/Users/flubr/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/bases.py", line 142, in

__getitem__ = lambda x, y: x.__getitem__(y),

IndexError: list index out of range

Normally I should get a list of videos. The RTMP clips are fetched from separate pages, but sometimes one of these pages doesn't contain a video (an error in the website's construction), which results in the error message above. As a result, none of the videos are displayed in the list (e.g. if 14 videos have been fetched correctly but one failed because of an error on the site, none of the 15 videos is displayed).

2) The plugin is quite slow since there is a lot of data to be fetched. I can extract a list of URLs. These URLs lead to new pages that contain the rtmpClips. So if I want a list of 15 videos, Plex has to go through 15 pages to fetch all the rtmpClips. Is there a way to do this faster, e.g. by fetching the rtmpClip only when you open a video from the list?

Here’s my code:


import datetime, re, pickle

PLUGIN_PREFIX = "/video/stubru"
MEDIA         = "http://www.stubru.be/media/results/taxonomy_F_44"
ROOT_URL      = "http://www.stubru.be"
####################################################################################################
def Start():
  Plugin.AddPrefixHandler(PLUGIN_PREFIX, MainMenu, "Stubru", "icon-default.png", "art-default.jpg")
  Plugin.AddViewGroup("Details", viewMode="InfoList", mediaType="items")
  MediaContainer.art = R('art-default.jpg')
  MediaContainer.title1 = "Stubru"
  DirectoryItem.thumb = R("icon-default.png")


#####################################
def MainMenu():
    dir = MediaContainer()
    dir.Append(Function(DirectoryItem(Videos, title="Meest recent"), pageUrl = MEDIA))
    for item in HTML.ElementFromURL(MEDIA).xpath('//div[@id="sidebar-left-inner"]/div/div/div/div/div[2]/div/ul/li/span/a'):
        title = item.text
        link = ROOT_URL + item.get('href')
        dir.Append(Function(DirectoryItem(Videos, title=title), pageUrl = link))
    return dir

#####################################
def Videos(sender, pageUrl):
    dir = MediaContainer(title2=sender.itemTitle)
    content = HTML.ElementFromURL(pageUrl, errors='ignore')
    for video in content.xpath('//div[@id="content-middle"]//ul/li/div//div[@class="content-inner"]'):
        image = video.xpath('div/div/div/div[@class="field-item odd"]/a/img')[0].get('src')
        title = video.xpath("div[2]/h2/a")[0].text
        subtitle = video.xpath('div[2]/div[@class="info"]/span')[0].text
        link = ROOT_URL + video.xpath("div[2]/h2/a")[0].get('href')
#        Log(HTTP.Request(link).content)

        inhoud = HTML.ElementFromURL(link)
        summary = inhoud.xpath('//div[@class="text"]/p')[0].text_content()
        rtmpClip = inhoud.xpath('//div[@class="embed"]/script[@type="text/javascript"]')[0].text
        rtmpClip = rtmpClip.split('.flv"')[0]
        rtmpClip = rtmpClip.split('file: "')[1]
#        Log(rtmpClip)

        dir.Append(RTMPVideoItem(url="rtmp://vrt.flash.streampower.be:80/stubru/", clip=rtmpClip, live=False, title=title, subtitle=subtitle, summary=summary, thumb=image))

    return dir
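A side note on the two `split` calls in `Videos`: they also raise an IndexError when the script text lacks the `file: "` marker. One possible alternative is a regular expression that returns None in that case. This is only a sketch in standalone Python 3: the `extract_clip` helper, the pattern and the sample script text are assumptions based on the `file: "….flv"` shape implied by the code above, not on the real page.

```python
import re

# Matches a player setting shaped like:  file: "path/to/clip.flv"
# and captures the path without the .flv suffix.
CLIP_PATTERN = re.compile(r'file:\s*"([^"]+?)\.flv"')


def extract_clip(script_text):
    """Return the clip path without the .flv suffix, or None if not found."""
    if not script_text:
        return None
    match = CLIP_PATTERN.search(script_text)
    return match.group(1) if match else None


# Hypothetical script contents, for illustration only.
sample_script = 'jwplayer("player").setup({ file: "2011/02/demo_clip.flv", width: 640 });'

found = extract_clip(sample_script)
missing = extract_clip("var nothing = true;")
empty = extract_clip(None)
```

A caller can then test the result for None and skip that video instead of letting the whole list fail.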

pierre1313 · February 24, 2011, 4:12pm

It's the loading of the images that certainly takes a long time.

I usually use a separate function, and in the RTMPVideoItem do:

thumb = Function(getThumb, image)

The getThumb is as follows:

def getThumb(url):
    try:
        return DataObject(HTTP.Request(url).content, 'image/jpeg')
    except:
        return R(icon)

jwray · February 24, 2011, 5:14pm

For the first problem, check the length of the xpath return before using it


result = inhoud.xpath('//div[@class="embed"]/script[@type="text/javascript"]')
if len(result) > 0:
    rtmpClip = result[0].text
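Inside a loop, the same check can be combined with `continue` so that one broken page only drops its own entry. The following is a rough standalone sketch in Python 3, with `xml.etree.ElementTree` standing in for the Plex HTML API. The page sources and the `script_text` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical detail pages: the second one has no embedded player script.
PAGES = [
    '<html><body><div class="embed"><script>file: "a.flv"</script></div></body></html>',
    '<html><body><div class="embed"></div></body></html>',
    '<html><body><div class="embed"><script>file: "c.flv"</script></div></body></html>',
]


def script_text(page_source):
    """Return the text of the first embed script, or None if there is none."""
    root = ET.fromstring(page_source)
    result = root.findall(".//div[@class='embed']/script")
    if len(result) == 0:
        return None
    return result[0].text


listed = []
skipped = 0
for source in PAGES:
    text = script_text(source)
    if text is None:
        skipped += 1
        continue  # skip this video, keep the rest of the list
    listed.append(text)
```

With this shape, 14 good pages and one broken page would still produce a list of 14 videos.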

For the second, you need to use a video redirect function. Logically what happens is that instead of extracting the rtmp clip URL in the original page you delegate that to a function that gets called when the video item is selected. The PBS plugin uses this approach

https://github.com/jonnywray/PBS.bundle/blob/master/Contents/Code/__init__.py

See line 138 for how to use it and 151 on for the function that is called (obviously you'll have to replace the logic in this function by your own to extract the rtmp clip URL).
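The idea behind the redirect function can be illustrated without any Plex code: the list only stores the page URL, and the expensive lookup runs when an item is selected. This is a plain Python 3 sketch of that deferral, not the framework API. `fetch_clip`, `build_list`, `play` and the example.com URLs are all hypothetical, and the fetch is simulated with a counter.

```python
# Counts simulated page fetches so the saving is visible.
fetch_count = 0


def fetch_clip(page_url):
    """Stand-in for opening the detail page and extracting the clip name."""
    global fetch_count
    fetch_count += 1
    return page_url.rsplit("/", 1)[-1] + "_clip"


def build_list(page_urls):
    """Build menu entries that only remember the page URL (no fetching)."""
    return [
        {"title": "Video %d" % (i + 1), "page_url": url}
        for i, url in enumerate(page_urls)
    ]


def play(entry):
    """Called when the user selects an entry; resolves the clip only now."""
    return fetch_clip(entry["page_url"])


urls = ["http://example.com/media/%d" % n for n in range(1, 16)]
menu = build_list(urls)
fetches_after_listing = fetch_count   # listing 15 items costs no fetches
clip = play(menu[2])
fetches_after_play = fetch_count      # one fetch, for the selected item only
```

The trade-off is that anything that lives only on the detail page, such as a summary, is no longer available while the list is being built.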

Jonny


flubr · February 24, 2011, 11:19pm

@pierre: Thanks for your suggestion, but I don’t think this is the problem since it’s also slow when I don’t load any images. The problem is that every time the loop is being executed, another URL is being opened.

@Jonny Wray: Thanks, your solution for the first question seemed to solve the issue. I’ll try to implement your second solution when I’ve got time. I’ll let you know if it worked (but I suppose it will).

pierre1313 · February 24, 2011, 11:41pm

Jonny is the man !

Sorry, I just reread my message from earlier and it does not make any sense at all !! I was trying to type it on my iphone with autocorrect on, trying to conceal it from the wife who is starting to hate the fact that I spend so much time on the forum !!!

flubr · February 24, 2011, 11:44pm

Maybe we should write your wife a plugin to keep her busy while you're spending time on the forum helping us :D

pierre1313 · February 24, 2011, 11:52pm

well I’ve done that already … but she doesn’t watch that much TV … Plus I was really being an ass, replying to the forum at the dinner table …

jwray · February 24, 2011, 11:55pm

Well I usually get something along the lines of ‘you talking to your Plex boyfriends again’.

flubr: the solution I posted is exactly to solve that problem of hitting another URL within a loop. It’s a common issue so there’s a general solution. Shouldn’t be too hard, just move your rtmp clip extraction code to another function.

Jonny

pierre1313 · February 25, 2011, 12:00am

This way the code gets executed when you try to play the video rather than when the list is built. Same idea with the thumbnails.

jwray · February 25, 2011, 12:15am

I’ve actually wondered about that but keep forgetting to ask. With thumbnails you do actually need the URL content (the image) in the list, so what does using the Function approach do? Load them in parallel in an asynchronous manner, or does it load when you visit that list item (but then you need multiple images in a list)?

I’ve actually never used this approach for thumbs in my plugins so have never quite understood what it actually does.

sander1 · February 25, 2011, 2:54am

In most cases it’s faster and makes sense to find/build the final url to a video in a “PlayVideo” function that gets executed when a user clicks a certain item. Usually this is done for speed, as the opening of additional pages is only done when it’s needed.

However, looking at the piece of code that was posted above, it doesn’t matter when the final video URL is built, because the HTML page that is used for the summary also contains the video info:


for video in content.xpath('//div[@id="content-middle"]//ul/li/div//div[@class="content-inner"]'):
    image = video.xpath('div/div/div/div[@class="field-item odd"]/a/img')[0].get('src')
    title = video.xpath("div[2]/h2/a")[0].text
    subtitle = video.xpath('div[2]/div[@class="info"]/span')[0].text
    link = ROOT_URL + video.xpath("div[2]/h2/a")[0].get('href')
#    Log(HTTP.Request(link).content)

    inhoud = HTML.ElementFromURL(link) ### <-- Open the webpage containing the info we want,...
    summary = inhoud.xpath('//div[@class="text"]/p')[0].text_content() ### <-- ...extract the summary text...
    rtmpClip = inhoud.xpath('//div[@class="embed"]/script[@type="text/javascript"]')[0].text ### <-- ...and let's grab the video info too while we are here
    rtmpClip = rtmpClip.split('.flv"')[0]
    rtmpClip = rtmpClip.split('file: "')[1]
#    Log(rtmpClip)
