CASTtv plugin

CASTtv has lots of recent and older tv shows online
ok guys, here we go again. i have some how confused the crap out of myself. i have made 2 apps so far, now all the sudden the xpath search is eluding me.

so here we go

I am using the xpath tool in firefox, and the firbug tool in firefox also.

but i cant seem to get the xpath to work in plex. i keep getting either syntax errors, or list is too large


User-Agent: Plex Firefox/2.0.0.11<br />
Host: localhost:32400<br />
Accept: */*<br />
Connection: keep-alive<br />
X-Plex-Language: en<br />
X-Plex-Version: 0.8.5-f4b13b5<br />
<br />
19:01:47.399786: com.plexapp.plugins.vbs                 :   (Framework) Loaded en strings<br />
19:01:47.399911: com.plexapp.plugins.vbs                 :   (Framework) Handling request :  /video/CASTtv/:/function/VideoPage/KGRwMApTJ3RpdGxlVXJsJwpwMQpTJ2h0dHA6Ly93d3cuY2FzdHR2LmNvbS9zaG93cy90b3AtZ2VhcicKcDIKc1Mnc2VuZGVyJwpwMwpjY29weV9yZWcKX3JlY29uc3RydWN0b3IKcDQKKGNQTVMuT2JqZWN0cwpJdGVtSW5mb1JlY29yZApwNQpjX19idWlsdGluX18Kb2JqZWN0CnA2Ck50cDcKUnA4CihkcDkKUydpdGVtVGl0bGUnCnAxMApTJ1RvcCBHZWFyJwpwMTEKc1MndGl0bGUxJwpwMTIKUydTaG93cycKcDEzCnNTJ3RpdGxlMicKcDE0Ck5zUydhcnQnCnAxNQpOc2JzLg__<br />
19:01:47.409181: com.plexapp.plugins.vbs                 :   (Framework) Calling named function 'VideoPage'<br />
19:01:47.506536: com.plexapp.plugins.vbs                 :   (Framework) Received gzipped response from http://www.casttv.com/shows/top-gear<br />
19:01:47.531731: com.plexapp.plugins.vbs                 :   (Framework) An exception happened:<br />
Traceback (most recent call last):<br />
  File "/Users/sethreid70/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/Plugin.py", line 640, in __call<br />
    return function(*args, **kwargs)<br />
  File "/Users/sethreid70/Library/Application Support/Plex Media Server/Plug-ins/CASTtv.bundle/Contents/Code/__init__.py", line 42, in VideoPage<br />
    vidUrl = item2.xpath('.//div[@id="main_column"]/div//div/div//div/a[@class="episode_column02"]')[0].get('href')<br />
IndexError: list index out of range<br />




ok, can we go dumb for a second. like REAL dumb.



alright, thats the code im trying to get the data from. im missing something, i see the nodes. but i cant seem to graps how they are interacting.

so, be VERY CLEAR

// = search ALL of the xml ( in its first instance )????
//div/ = slash after node means i want you to look at the Div nodes DIRECT children?

so if we look at the picture i posted,

//div[@id="main_column"]/ = its now looking for the
with the id="main_column", then its looking at the DIRECT CHILD in that node for the next node

//div[@id="main_column"]/div = so like above, its finding the main column node. then looking for the DIRECT CHILD in that node for all other div's? or just the NEXT one?

//div[@id="main_column"]/div//div = so this one is then searching for a div node that is ANYWHERE in the node before it? 1 or all?

//div[@id="main_column"}/div//div/div/a = find
then a
thats a DIRECT child then a
thats ANYWHERE in the previous node, then a
thats a DIRECT child of the previous, then a node that is a DIRECT child of the previous?


basically im having problems filtering it down. and even when i use the specific tags in the xpath. sometimes the plex code freaks out and wont even let me use it. just says "invalid syntax, or invalid argument"

so, explain to me stupid like. i want to know where each one of the are being targeted and how.

because so far, i cant grasp it.

to be it looks like,

// = searches everywhere in the xml to start
/ = searches directly under the previous node
// = searches for a node anywhere in the previous node ( equal to? or under? like child, or on the same plane? )

this is really frustrating, haha. i see it, i can get the xpath in the firefox plugin but as soon as i try it in plex. it wont work.

soooooo im off to go read some more about the limited help out there. its too generalized. but if you guys out there can just give me some hints on how to "trace" where the data needs to be grabbed from and how the code structure should look, that would be great.

and yes, i looked at numerous other apps, I cant seem to find the common thread. not sure if im retarded or what..... but i dont see it. or if i do... it turns out completely wrong.

i had one xpath work that only returned 1 title and url... but yet the xpath tool shows 186

Simply:

http://www.w3schools…path_syntax.asp



You probably only got one result b/c your “for loop” is iterating on (’//div[@id=“main_column”]’)



Maybe you could use:


for item2 in content.xpath('//a[@class="episode_column02"]'):


since all the data you are parsing seems to come from that specific tag.

It also seems a bit redundant that you are parsing the same image 138 times. I would set the image outside of the loop.

Your issue is with


item2.xpath('.//div[@id="main_column"]/div//div/div//div/a[@class="episode_column02"]')

.



The leading ‘./’ says look under the item2 node in the tree. That itself was defined by ‘//div[@id=“main_column”]’ so you are looking for div tags with id=‘main_column’ as children of other div tags with id=‘main_column’. There aren’t any so the list is empty and you get index out of range.



To address you questions:




Not sure what you mean by in its first instance but yes, look anywhere in the xml tree


Trailing slash is not valid xpath. //div is valid and means look for div tags anywhere in the document tree.



Again, not valid with the slash but without it finds all div tags with the id attribute equal to 'main_column'.


This will return ALL div nodes that are direct children of div nodes with id attribute equal to 'main_column'


Finds ALL div nodes anywhere in the tree under the nodes defined by //div[@id="main_column"]/div



yes, if I understand you correctly.

What you haven't mentioned (and it causing the issue) is the difference between './div' and '//div'. The first is relative to the node the xpath is operating on (ie item2 in the above code), the second operates on the whole document.

All that said, I'll echo Dbl_A and ask why you are looping at all if there's only one entry, the latest episode, which can be obtained directly with no loop. The episode list under it will require a loop.

Jonny

thanks guys i will check that.



and Dbl_A i read that. 5 times. didnt explain it enough for me.



the reaons why im calling things wrong in the loop is because i didnt know that was the loop.



when i first posted about making an app i explained that i know very little about this style of python scripting/ programming. so really, this is new to me.



so if you think i dont know about something, or am using it wrong. i probably am.



I didnt know the ./ and // were different in the front part of the string. i thought they all were that way.



also johnny the trailing slash wasnt to search anything it was to mean, you are putting something else there and that would be X child. it was for example purposes in this thread, but that seems to have not worked out.



now how do you set the image outside the loop when you have to use the loop to find the specific image for a specific show?

so i got that part squared away, with johnny wrays code. thanks.



now, this site has episodes that are not actually available without paying.



but they are designated by images



is there a way, or i should say i know there is a way of filtering the results. will plex respond to that type of code?



can we do a “if” / “or” ?



My bad! I think I thought you were trying to use this image

http://www.casttv.com/misc/webapp/img_shows/7/6753.jpg




You apparently wanted to use the "little play/pay" button.

There are ways to filter. You can use if/else statements, while loops, and/or operators, etc...

-Dbl_A

The plugins are written in Python which is a full programming language. Any constructs available in Python are available for plugins to use. This includes all the usual flow control constructs (loops, if then else, etc) as well as data structures (lists, maps, sets, etc). If you don’t know what I’m talking about here it could be a wise thing to take a step back, find a basic Python tutorial (or book) and learn some basics. You’ll be banging your head against unmovable walls if you don’t know the simple stuff, which your comment about not realizing you were in a loop make be believe.



To be clear. This construct


<br />
for item2 in content.xpath('//a[@class="episode_column02"]'):<br />



implies content.xpath('//a[@class="episode_column02"]') returns a collection/list and the construct is looping over that list using standard python loop constructs. Each element in the list (which are XML DOM elements) are then assigned to the item2 local loop variable, which you can then operate on in the body of the loop to extract out whatever data you need from that DOM element. Before you try and go any further you really need to understand what's happening here - it is the basis of all plugins.

Jonny


alright guys thanks, i will look into it



dbl_a, sadly i want to use both, so you were semi right. but i think i took care of that. but im still stuck on the little image and how to filter it.



johnny



i think i understand some of what the code is doing right there. i just dont understand the structure completely.



like so.



for item2 content.xpath(…



content is a variable right? and the content.xpath statement means? that content should be found in the xpath procedding the . ?



from my understanding the loop is searching the XML for all related data to the variable we put in the ( ) section of the “loop” or xpath like i call it.



so when the “loop” finds the data, it then looks to the append statement to where it will be used.



basically its like CSS kind of.



the css style sheet for a webpage houses a bunch of terms, or acronyms. this makes it easier to give a section settings without having to type them completely out, and can be used over and over easily.



now the way the xpath is working is its finding the nodes, that have the info we want.



then we are designating the “loop” results to correspond with the variable we named it.



i.e.



something = item2.xpath(…)



so we are giving the “something” term the results from the “loop”. for the specific nodes we told it to search in.



this is my understanding so far. the only thing im having issues with it seems is understanding how to call the data in the xml…somehow it got more complicated from the VBS plugin to the mtv and cast plugin… strange.

Honestly, your reply still makes it seem like you are confused with very basic concepts. You keep talking about the loop searching, which it isn’t doing, it is simply looping over a list of objects. That list of objects results from an xpath search, or pattern matching, operation.



I’ll try again by breaking it down into more explicit steps:



<br />
content = XML.ElementFromURL(url, True)<br />




uses a framework object (XML) to construct an python object representing the XML document/tree found at the end of the url (after converting it from html). The object is then assigned to the variable content.


<br />
nodeList = content.xpath('//a[@class="episode_column02"]')<br />




executes the xpath method on the content object. The xpath method returns a list of python objects, each object in that list representing an XML node. The objects returned are those for nodes that match the given xpath predicate. The returned list is assigned to the nodeList variable.


<br />
for node in nodeList:<br />




is a simple python loop accessing all elements in the node list. Since each element in the list is itself an object representing XML tree, xpath methods can be called on these individual elements also.

Jonny

ok thats starting to make more sense, thats what i was looking for as far as the explanation.



not sure if i want to continue on this site though.



upon further research, casttv.com is similar to plex. its “tunneling” streams through their player from other sites.



so im not sure if will even work correctly to “tunnel” it twice.



let alone the Top Gear episodes i found ( on of the reasons i was doing this ) were not very good quality.



hmmm



johnny -



without sounding dumber.



when i say “searching” i mean this



content.xpath('//a[@class="episode_column02"]') 



the section in the ( ) is the location the xpath is "searching" or atleast thats what im calling it.

and i understand that python is scrapping the html source for the page and finding and returning the nodes that are called. im just having an issue finding how to filter it enough to get what i need.

more complicated html source seems to trick my brain, and make it hard for me to see the nodes, and where the children and parents are.

this is why a tutorial on how to do a plex aopp would be awesome. the best thing for me is to see the steps, and the interactions. what happens after you do this, what changes after you do this.

more often than not if i make a change, it breaks the app and i cant tell if anything was right or not.

oh well, i will keep plugging away, thanks for the help.

Yeah, when starting a new plugin seeing if I can access the media is the first thing I do. No point in going forward if that isn’t the case.



As for the scraping: maybe you should attempt a site with a cleaner layout - pure XML maybe via either a web service or an RSS feed. They are generally much easier to work with under xpath. But yes, you can consider the xpath a search - it is really a path language. You can consider XML as a tree of nodes and the xpath syntax describes paths through that tree starting at the root node. It will return all leaves it finds at the end of the tree path.



Jonny


**** firebug you thwarted me!!! *****

ok i think i see my problem.



im using firebug to look at the code, and it indents very strangely so sometimes its hard to see where a node is closed or not. so that can limit the amount of data in that node im calling.



looking at the source now, im seeing that the main_column node closes after only a few lines. thats probably why that doesnt work.



so, im going to continue to try at this. so any help would be appriciated

im editing, i figured it out


"//" -by itself, is a syntax error.

YES!

"//" Best way to describe is from W3Schools... "Selects nodes in the document from the current node that match the selection no matter where they are."

you are "searching" "all" tags/nodes with the "class" attribute = "blah blah" that have a DIRECT
,
then "all" of the
tags/nodes that are children of that. Then the DIRECT children and the DIRECT
Looks something like this:

<a class='blah blah'><br />
   <div><br />
      <div><br />
        <a><br />
       	<img><br />
      <div><br />
        <a><br />
       	<img><br />
      <div><br />
        <a><br />
       	<img><br />




If the tags/nodes are on the same indentation level, then they are siblings, NOT children.

It "searches" for all

i posted while you were writing that haha



but i will read it.



is there anyway you guys could give me an IF else example



i just spent an hour searching the net and finding NOTHING that fits what i want.



if img/@src = /image/blah/blah.gif ( slow down, this isnt what i think will work, just giving an example )



Log(vidUrl)



else ? delete? remove? ignore?



every if statement i see just throws it out there, so its hard to see it in my code.



and i know im being patient, but part of me is frustrated as it feels as though im bothering everyone, the questions i ask are meet with links, or repeat answers that are not answering the question i asked. I understand its most likely my fault for saying the wrong terminology, but i post code just so everyone can see what i “think” im talking about. plopping my code into any plex will show the problems, so its not that im not “providing” enough info.



and really, the xpath documentation online is really rediculous. i remember having these same issues when programming php for my database.



no “detailed” info



syntax is very specific, but yet no one shows it. syntax’s can be combined i understand that. but who knows what can be combined and what cant? and how the structure should look? thats what gets me.



weres the starting point, there is none. everyone just shows code from the middle of the xml, or a snippet so generalized that you cant get any real info from it.



( everyone = internet everyone )



thanks for all the help and understanding.

simple if example:



if image == "pay.jpg":<br />
	continue<br />
else:<br />
	dir.append...



can also be written:

if image != "pay.jpg"<br />
	dir.append...

Did anyone ever complete the CastTV plugin?



:unsure:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.