parsing javascript with xpath

David_Veld · December 31, 2009, 12:23am

Hi,

Is it possible in Plex to use XPath to parse code that was generated with JavaScript?

For example, in the script below I only want to extract the string value of sGlobalFileName='what-i-think-of-tv-news', so that I end up with just 'what-i-think-of-tv-news'. Is this possible using XPath? If so, how?

jwray · December 31, 2009, 12:48am

Short answer: no. XPath is for parsing XML documents or fragments, not general text.

Instead, extract the text of the script tag (that part you can do with XPath), then use either a regular expression or Python string methods to pull out the part you need. Messy, but there is no other way around it.
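A minimal sketch of that two-step approach using only the standard library. ElementTree's findall() supports a limited XPath subset, which is enough for `.//script`. It assumes the page is well-formed XML, which real HTML often is not (an HTML-tolerant parser such as lxml.html would be needed there). The sample page and the function names are hypothetical; the variable values are taken from the script quoted later in this thread.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page; values taken from the script quoted in this thread.
PAGE = """<html><head>
<script type="text/javascript">
sGlobalFileName='what-i-think-of-tv-news';EmbedSEOLinkURL='http://www.break.com/';sGlobalCategoryID='7';
</script>
</head><body></body></html>"""


def script_texts(page):
    """Step 1: XPath-style query for the text of every <script> element."""
    root = ET.fromstring(page)
    return [script.text or "" for script in root.findall(".//script")]


def extract_file_name(page):
    """Step 2: treat the script as plain text and pull out the quoted value."""
    for text in script_texts(page):
        match = re.search(r"sGlobalFileName='(.+?)';", text)
        if match is not None:
            return match.group(1)
    return None  # no script on the page assigns sGlobalFileName


print(extract_file_name(PAGE))  # what-i-think-of-tv-news
```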

David_Veld · December 31, 2009, 12:58am

Hmm, could anyone show me what that would look like? I am also searching the internet for regex, but it is not very clear to me…

jwray · December 31, 2009, 1:23am

If you've never used regular expressions before, now is probably not the time to learn. They are the spawn of the devil (but very powerful).

You need a good Python reference. String objects (like all other objects) have a number of very useful methods. The one you need is find. Something like this (pseudocode):


marker = "sGlobalFileName='"
start = text.find(marker) + len(marker)  # skip past the marker and its opening quote
end = text.find("'", start)  # stop at the closing quote, not the semicolon
substring = text[start:end]
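A runnable version of the find() idea. One detail worth guarding against: str.find() returns -1 when the text is absent, so adding an offset to it without checking silently produces a wrong slice. The helper name extract_between is made up for this example.

```python
def extract_between(text, prefix, terminator):
    """Return the text after `prefix` up to the next `terminator`, or None if either is missing."""
    start = text.find(prefix)
    if start == -1:
        return None  # prefix not present; -1 + len(prefix) would point somewhere wrong
    start += len(prefix)  # skip past the prefix itself instead of hard-coding its length
    end = text.find(terminator, start)
    if end == -1:
        return None
    return text[start:end]


script = "sGlobalFileName='what-i-think-of-tv-news';sGlobalCategoryID='7';"
print(extract_between(script, "sGlobalFileName='", "'"))  # what-i-think-of-tv-news
```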

sansnipple · December 31, 2009, 5:17am

Honestly, that JS example looks like a primo candidate for rolling into a nice, neat JSON object. As for exactly how to do that, someone with more brain cells than me will have to take a look.
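The script is JavaScript, not JSON, so it cannot be loaded as JSON directly. One way to follow the idea, sketched here under the assumption that every value of interest is a single-quoted string literal, is to collect all name='value' assignments into a Python dict, which json.dumps() can then serialize. The pattern and the script_vars() name are illustrative only.

```python
import json
import re

# Trimmed copy of the script text quoted in this thread.
SCRIPT = (
    "sGlobalFileName='what-i-think-of-tv-news';"
    "EmbedSEOLinkURL='http://www.break.com/';"
    "sGlobalContentID='355403';"
    'sGlobalContentTitle=document.getElementById("vid_title").getAttribute("content");'
    "sGlobalCategoryID='7';"
    "sGlobalContentFilePath='2007/8';"
)

# Matches only literal assignments of the form name='value';
# computed values such as the getElementById(...) call are skipped.
ASSIGNMENT = re.compile(r"(\w+)='([^']*)';")


def script_vars(script):
    """Collect every name='value' assignment into a dict."""
    return dict(ASSIGNMENT.findall(script))


variables = script_vars(SCRIPT)
print(variables["sGlobalFileName"])  # what-i-think-of-tv-news
print(json.dumps(variables, indent=2, sort_keys=True))
```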

sander1 · December 31, 2009, 7:27pm

Although regexes are a bit more difficult, I think they are the best way to extract the data you want (in this case).


import re

webpage_content = HTTP.Request('http://www.example.com/pagecontainingthejavascript.html')
title = re.search("sGlobalFileName='(.+?)';", webpage_content).group(1)

A little explanation about the regex:
(...) = a group within a regex. You can have multiple groups within one regex, and you retrieve them with the group() method. The above expression could also be written like this:


result = re.search("sGlobalFileName='(.+?)';", webpage_content)
title = result.group(1)
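A small self-contained run of the same pattern, using a hypothetical one-line string in place of the downloaded page. It shows the difference between group(0) (the whole match) and group(1) (the first parenthesized group), and one case the two-line version does not handle: re.search() returns None when there is no match, so calling .group(1) directly would raise AttributeError.

```python
import re

page = "sGlobalFileName='what-i-think-of-tv-news';sGlobalContentFilePath='2007/8';"

result = re.search(r"sGlobalFileName='(.+?)';", page)
# re.search() returns None when nothing matches, so check before calling .group().
if result is not None:
    print(result.group(0))  # the whole match, including sGlobalFileName=' and ';
    print(result.group(1))  # only the text captured by the first (...): what-i-think-of-tv-news

missing = re.search(r"sNoSuchVariable='(.+?)';", page)
print(missing is None)  # True
```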

. = matches any character except a newline
+ = match 1 or more repetitions of the preceding expression
? = make the preceding repetition non-greedy (= grab as few characters as possible)

Without the "?", the result of this regular expression would be:
what-i-think-of-tv-news';EmbedSEOLinkURL='http://www.break.com/';EmbedSEOLinkKeywords='Funny Videos';sGlobalContentID='355403';sGlobalContentTitle=document.getElementById("vid_title").getAttribute("content");sGlobalCategoryID='7';sGlobalContentFilePath='2007/8';sGlobalContentUrl='http://www.break.com/index/what-i-think-of-tv-news.html';sGlobalContentIDEncoded='boEGsDRmcc5fL4GHC%2bhfyA%3d%3d';sSubmittedBY='rbbtcee';sKeywordTitle='Flashes,man,News,reporter,What I Think of TV News';sKeywordString='Flashes,man,News,reporter
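The greedy/non-greedy difference can be reproduced on a shortened copy of that script. A common alternative, [^']+ (one or more characters that are not a quote), is included for comparison; it is not from the post above.

```python
import re

# Shortened copy of the script text above.
page = (
    "sGlobalFileName='what-i-think-of-tv-news';"
    "EmbedSEOLinkURL='http://www.break.com/';"
    "sGlobalCategoryID='7';"
)

lazy = re.search(r"sGlobalFileName='(.+?)';", page).group(1)     # stops at the first ';
greedy = re.search(r"sGlobalFileName='(.+)';", page).group(1)    # runs to the last ';
no_quotes = re.search(r"sGlobalFileName='([^']+)'", page).group(1)  # can never cross a quote

print(lazy)       # what-i-think-of-tv-news
print(greedy)     # what-i-think-of-tv-news';EmbedSEOLinkURL='http://www.break.com/';sGlobalCategoryID='7
print(no_quotes)  # what-i-think-of-tv-news
```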

David_Veld · January 1, 2010, 12:35pm

Wow, thanks! This really helped me. Thanks for the explanation.

David_Veld · January 1, 2010, 1:26pm

I wrote the following code using regex, but I must have forgotten something, because it does not work.

Does anyone have any suggestions?


def Video(sender, url):

  dir = MediaContainer(title3=sender.itemTitle, art=R(ART), viewGroup="InfoList")

  videos = XML.ElementFromURL(url, isHTML=True, errors='ignore').xpath(XPATH_VIDEOS)
  for content in videos:
    title = content.xpath("./a/span")[0].text
    thumb = content.xpath("./a/img")[0].get('src')
    summary = content.xpath("./a")[0].get('title')
    url = content.xpath("./a")[0].get('href')
    dir.Append(Function(VideoItem(PlayVideo, title=title, summary=summary, thumb=thumb), url=url))

  return dir

####################################################################################################

def PlayVideo(sender, url):

	video_link = HTTP.Request(url)
	file_name_link = re.search("sGlobalFileName='(.+?)';", video_link)
	file_path_link = re.search("sGlobalContentFilePath='(.+?)';", video_link)
	file_path = file_path_link.group(1)
	file_name = file_name_link.group(1)

	total_video_link = 'http://media1.break.com/dnet/media/' + file_path '/' + file_name '.flv'

	dir.Append(VideoItem(total_video_link))

sansnipple · January 1, 2010, 5:23pm

See my response to the other topic you started:

http://forums.plexapp.com/index.php?/topic/11941-play-video-using-regex/page__view__findpost__p__70717

sander1 · January 1, 2010, 5:44pm

Your indentation may be wrong, but you also need to change the PlayVideo function.

This:


dir.Append(VideoItem(total_video_link))

needs to be replaced by this:


return Redirect(total_video_link)

You're also missing two pluses here:
total_video_link = 'http://media1.break.com/dnet/media/' + file_path + '/' + file_name + '.flv'

sander1 · January 1, 2010, 6:20pm

You can also do the regex stuff once and grab everything you need. Here is an “optimized” (shorter) PlayVideo function:


def PlayVideo(sender, url):

  video_link = HTTP.Request(url)
  # group(1) is sGlobalFileName, group(2) is sGlobalContentFilePath; the path comes first in the URL
  link = re.search("sGlobalFileName='(.+?)';.+sGlobalContentFilePath='(.+?)';", video_link, re.DOTALL)
  total_video_link = 'http://media1.break.com/dnet/media/' + link.group(2) + '/' + link.group(1) + '.flv'

  return Redirect(total_video_link)
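A self-contained sketch of the single-regex approach, run against a hypothetical multi-line script so the effect of re.DOTALL is visible: without it, . does not match a newline and the search fails. It swaps in named groups, (?P<name>...), so the file path and file name cannot be mixed up when building the URL. flv_url() is an illustrative name, and the URL layout simply follows the one used in this thread.

```python
import re

# Hypothetical script text, split across lines to show why re.DOTALL matters.
page = """sGlobalFileName='what-i-think-of-tv-news';
EmbedSEOLinkURL='http://www.break.com/';
sGlobalCategoryID='7';
sGlobalContentFilePath='2007/8';"""

# Named groups make it harder to swap the two values by accident.
PATTERN = re.compile(
    r"sGlobalFileName='(?P<name>.+?)';.+?sGlobalContentFilePath='(?P<path>.+?)';",
    re.DOTALL,  # lets . (and therefore .+?) cross the newlines between assignments
)


def flv_url(page):
    """Build the .flv URL from the page text, or return None if either variable is missing."""
    match = PATTERN.search(page)
    if match is None:
        return None
    return "http://media1.break.com/dnet/media/%s/%s.flv" % (match.group("path"), match.group("name"))


print(flv_url(page))
```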

David_Veld · January 1, 2010, 11:16pm

Wow… I feel so stupid. I left the computer alone for a couple of hours to get my concentration back, and now that I've come back and read your messages, it all seems so clear.

Thank you for your support!

