element and go directly to the fourth loop. If you replaced the word “continue” with “pass”, it would still not pick up the link variable for that loop, but it would go on to pick up the title variable.
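The difference is easiest to see side by side. Below is a minimal sketch in plain Python rather than real channel code; the `items` list, its `link` and `title` keys, and the `collect` helper are all made up for illustration, with the third item missing its link.

```python
# Hypothetical stand-in data: the third item has no link.
items = [
    {'link': 'http://example.com/1', 'title': 'One'},
    {'link': 'http://example.com/2', 'title': 'Two'},
    {'title': 'Three'},
    {'link': 'http://example.com/4', 'title': 'Four'},
]

def collect(items, skip_item):
    """Return (links, titles) gathered from items."""
    links, titles = [], []
    for item in items:
        try:
            links.append(item['link'])
        except KeyError:
            if skip_item:
                continue  # abandon this item and start the next loop
            else:
                pass      # do nothing and carry on with the rest of this loop
        titles.append(item['title'])
    return links, titles

with_continue = collect(items, skip_item=True)
with_pass = collect(items, skip_item=False)
```

With `continue`, the third title is never collected; with `pass`, the link is still missing but the title "Three" is picked up.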
Options for Writing Xpath Code:
If you look at examples of Plex channels, you will see two different options for how to word your xpath code.
Example 1: title = item.xpath('./title')[0].text
url = item.xpath('./a')[0].get('href')
Example 2: title = item.xpath('./title/text()')[0]
url = item.xpath('./a/@href')[0]
I tend to use the second method, because it can be entered directly into the xpath checker and give you the proper results, so once you have the right code, you can just copy it from the xpath checker and paste it directly into your channel code. It also gives you the option to use some variations of this code. For example, if an element contains several pieces of text, ending the path with /text() returns only the text that sits directly inside the element (and [0] picks the first piece), whereas //text() also returns the text inside any elements nested within it. See last post in topic for more information on this.
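To make the direct-text versus all-text distinction concrete, here is a rough sketch using Python's standard library. It is only an approximation: the standard `xml.etree.ElementTree` module does not support `text()` or `@href` in its paths, so `.text` and `itertext()` stand in for `/text()` with `[0]` and for `//text()`. The markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a paragraph with a nested <b> element inside it.
item = ET.fromstring(
    '<item><p>Intro <b>bold</b> outro</p><a href="/video/1">Watch</a></item>'
)
p = item.find('./p')

# Roughly what ./p/text() followed by [0] gives: the first directly contained text.
first_text = p.text

# Roughly what ./p//text() gives: every piece of text, nested elements included.
all_text = list(p.itertext())

# Example 1 style attribute lookup, which works the same way here.
url = item.find('./a').get('href')
```

Here `first_text` stops before the nested element, while `all_text` also includes the text inside `<b>` and the text that follows it.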
When a document contains CDATA:
If a document contains CDATA tags around the text that you are trying to pull, the text() portion of the xpath command automatically removes the CDATA tags from the data returned. Note: Xpath itself does not ignore these CDATA tags, but the Plex Framework does, so when you use an xpath checker, the CDATA tags will not be ignored.
Pulling data within data:
Every once in a while you will encounter a document where a field you pull with xpath contains more elements. For example, you may have an XML document that has a description element that contains an image element inside it, like:
<description>Here is the description for the video that also contains an image <image>http://wwwwebsite.com/image.jpg</image></description>
In that case, once you pull the outer data in as a string, you would use HTML.ElementFromString(str) to create elements from that string, and then use xpath to pull the inner element contained in the string.
summary = page.xpath('//description/text()')[0]
data = HTML.ElementFromString(summary)
image = data.xpath('//image/text()')[0]
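The same two-step pull can be tried outside of Plex. The sketch below is an assumption-laden stand-in: it uses the standard library's `ET.fromstring` in place of the Plex Framework's HTML helper, the feed and its URL are invented, and it assumes the inner markup arrives inside a CDATA section.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the description holds markup as text inside a CDATA section.
FEED = (
    '<rss><channel><item>'
    '<description><![CDATA[A video summary '
    '<image>http://example.com/image.jpg</image>]]></description>'
    '</item></channel></rss>'
)

page = ET.fromstring(FEED)

# Step 1: pull the outer field as a plain string (the CDATA markers are gone).
summary = page.find('.//description').text

# Step 2: parse that string on its own. The <wrapper> element is added only
# because an XML parser needs a single root; it is not part of the feed.
data = ET.fromstring('<wrapper>' + summary + '</wrapper>')

# Step 3: pull the inner element from the newly parsed fragment.
image = data.find('.//image').text
```

The wrapper element is a workaround specific to this sketch, since a strict XML parser rejects a string with loose text around a single tag.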
When an XML document contains namespace data:
Some XML documents contain data within the items or entries that requires namespace info. You will recognize these lines because they contain a colon within the entry field name. To pull data that has a namespace, you first have to find the namespace associated with that piece of data. This is usually defined at the top of the XML document. Then include a variable for that namespace in your code, and add a reference to that variable at the end of the xpath command for that pull.
NAMESPACES = {'media': 'http://search.yahoo.com/mrss/'}
media = page.xpath('./media:content//@url', namespaces=NAMESPACES)[0]
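As a quick way to experiment with the prefix mapping, here is a small sketch using the standard library. The feed is made up, and because `xml.etree.ElementTree` supports only a subset of xpath, the attribute is read with `.get('url')` instead of `//@url`; the namespace dictionary is used the same way.

```python
import xml.etree.ElementTree as ET

# The prefix maps to the URI declared at the top of the document.
NAMESPACES = {'media': 'http://search.yahoo.com/mrss/'}

# Hypothetical feed with one namespaced element inside an item.
FEED = (
    '<rss xmlns:media="http://search.yahoo.com/mrss/"><channel><item>'
    '<title>Clip</title>'
    '<media:content url="http://example.com/clip.mp4" type="video/mp4"/>'
    '</item></channel></rss>'
)

page = ET.fromstring(FEED)
item = page.find('.//item')

# Pass the mapping so the "media:" prefix in the path can be resolved.
content = item.find('./media:content', NAMESPACES)
media = content.get('url')
```

The prefix used in your code only has to match the key in your dictionary; it is the URI on the right-hand side that has to match the document.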
Using the contains option in xpath:
Sometimes you will come across HTML documents where you want to specify the class or id but you want more than one variation for this identifier to be included in the matches. For those situations you can use the contains option to make the class or id for your xpath pull a little more vague.
for items in page.xpath('//ol/li[contains(@id,"carousel")]'):
You can also use the contains option to look for multiple strings within the attribute of a tag using and/or.
url = html.xpath('//li[contains(@class,"navigation") and contains(@class,"right")]/a/@href')[0]
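It may help to see what contains() is actually matching: it is a plain substring test on the attribute value. The sketch below emulates that in ordinary Python, since the standard library's ElementTree does not support contains() in its paths. The page markup and the `attr_contains` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with ids and classes that vary slightly between items.
PAGE = (
    '<html><body><ol>'
    '<li id="carousel-1"><a href="/a">A</a></li>'
    '<li id="carousel-2"><a href="/b">B</a></li>'
    '<li id="footer"><a href="/c">C</a></li>'
    '</ol><ul>'
    '<li class="navigation left"><a href="/prev">Prev</a></li>'
    '<li class="navigation right"><a href="/next">Next</a></li>'
    '</ul></body></html>'
)
page = ET.fromstring(PAGE)

def attr_contains(element, name, *needles):
    """True if the attribute holds every needle as a substring (the 'and' case)."""
    value = element.get(name, '')
    return all(needle in value for needle in needles)

# Emulates //ol/li[contains(@id,"carousel")]
carousel = [li for li in page.iterfind('.//ol/li')
            if attr_contains(li, 'id', 'carousel')]
carousel_links = [li.find('./a').get('href') for li in carousel]

# Emulates //li[contains(@class,"navigation") and contains(@class,"right")]/a/@href
next_url = [li.find('./a').get('href') for li in page.iter('li')
            if attr_contains(li, 'class', 'navigation', 'right')][0]
```

Because the test is a substring match, "carousel" would also match an id such as "mycarousel2", so pick a string that is specific enough for the page you are working with.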
For more info on choosing the best format for your xpath code and ensuring you are using the most reliable path to the data, see the following tutorial: http://devblog.plexapp.com/2012/11/14/xpath-for-channels-the-good-the-bad-and-the-fugly/