xpath / CDATA

Trouble with CDATA
Hi all,

I'm trying to set up a simple plugin and I'm having trouble with CDATA.
The information comes from an XML file, but some comments are not available.


    URL = "http://www.europe1.fr/podcasts/revue-de-presque.xml"
    data = HTML.ElementFromURL(URL, encoding="utf-8")
    for item in data.xpath('//item'):
        title = item.xpath('title')[0].text
        pubDate = item.xpath('pubdate')[0].text
        url = item.xpath('enclosure')[0].get('url')
        summary = str(item.xpath('summary')[0].text)
        print summary
    return dir



Sometimes `item.xpath('summary')[0].text` returns "None" instead of the content.
I'm not sure, but I think it's a UTF-8 problem (French characters like éàè).
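One possible reason for the literal "None" can be shown with Python 3's standard `xml.etree.ElementTree` (an analogy only: Plex's `HTML`/`XML` objects are not used, and the sample strings are invented). An element's `.text` holds only the characters before its first child tag, so a summary that begins with a tag has `.text` equal to `None`, and `str()` turns that into the string "None".

```python
import xml.etree.ElementTree as ET

# Hypothetical summaries: one starting with plain text, one starting with a tag.
plain = ET.fromstring("<summary>Revue de presse du jour</summary>")
tagged = ET.fromstring("<summary><p>Revue de presse du jour</p></summary>")

# .text only holds the characters *before* the first child element.
print(repr(plain.text))   # 'Revue de presse du jour'
print(repr(tagged.text))  # None

# str() turns that None into the literal string "None".
print(str(tagged.text))   # None

# Collecting the text of all descendants recovers the content.
print("".join(tagged.itertext()))  # Revue de presse du jour
```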

You can find the full code here: https://github.com/whoo/Cantelou.bundle.git

Do you have any clue how to get the full summary inside the CDATA?
Thanks ;)

It's less likely that the CDATA itself is the problem than the HTML tags inside it. Try using:



    summary = item.xpath('./summary/text()')





It should return a list of the strings that are separated by tags in the XML. You can put them back together with Python's `''.join()`, like so:



    summary = ''.join(item.xpath('./summary/text()'))
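To see what `text()` plus `join()` actually collects, here is a rough standard-library stand-in in Python 3. `text_nodes()` is a hypothetical helper that mimics XPath `text()` on an element: its own leading text plus the tail text after each direct child. It assumes the summary's HTML tags have become real child elements; the sample string is invented.

```python
import xml.etree.ElementTree as ET

def text_nodes(elem):
    """Rough stand-in for XPath 'text()': the element's own text plus the
    tail text after each direct child. Text *inside* children is skipped."""
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]

# Hypothetical summary whose HTML tags were parsed as child elements.
summary = ET.fromstring("<summary>Start <b>bold</b> middle <i>italic</i> end</summary>")

print(text_nodes(summary))           # ['Start ', ' middle ', ' end']
print("".join(text_nodes(summary)))  # Start  middle  end
print("".join(summary.itertext()))   # Start bold middle italic end
```

Note the gap: joining `text()` results drops the words inside `<b>` and `<i>`. If you want those too, `itertext()` here (or the XPath `string()` function with lxml) collects the text of all descendants.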


Thanks for your quick answer.



I've tried both solutions :frowning:

but `item.xpath('./summary/text()')` still returns nothing when the summary contains special characters like “éàè”.

Hi!

You could parse the file as XML instead and use the `itunes` namespace to get to the summary element. It would look something like this:



    url = "http://www.europe1.fr/podcasts/revue-de-presque.xml"
    data = XML.ElementFromURL(url)  # Parse as XML
    for item in data.xpath('//item'):
        title = item.xpath('./title')[0].text.strip()
        pubDate = item.xpath('./pubDate')[0].text
        url = item.xpath('./enclosure')[0].get('url')
        # Use the itunes namespace to grab the summary element
        summary = item.xpath('./itunes:summary',
                             namespaces={'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd'})[0].text
        # Strip out HTML tags
        summary = String.StripTags(summary).strip()
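For experimenting outside Plex, here is a minimal standard-library sketch of the same idea in Python 3, assuming a made-up feed shaped like an iTunes podcast feed (not the real Europe 1 file). Parsed as XML, a CDATA section simply becomes the element's text, accented characters included, and the `itunes` prefix is resolved through a namespace map. `strip_tags()` and `parse_items()` are hypothetical helpers; `strip_tags()` is a naive regex stand-in for `String.StripTags`, not a full HTML sanitizer.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI from the thread; 'itunes' is just a local alias for it.
NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Hypothetical sample feed (not the real one), delivered as UTF-8 bytes.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <item>
      <title> Episode 1 </title>
      <pubDate>Mon, 02 Jan 2012 08:00:00 +0100</pubDate>
      <enclosure url="http://example.com/episode1.mp3" type="audio/mpeg"/>
      <itunes:summary><![CDATA[<p>Résumé de l'<b>épisode</b> à écouter</p>]]></itunes:summary>
    </item>
  </channel>
</rss>""".encode("utf-8")

def strip_tags(text):
    """Naive tag stripper (hypothetical stand-in for String.StripTags)."""
    return re.sub(r"<[^>]+>", "", text)

def parse_items(raw_bytes):
    root = ET.fromstring(raw_bytes)  # XML parser: CDATA content becomes plain text
    items = []
    for item in root.iter("item"):
        summary = item.find("itunes:summary", NS)
        summary_text = summary.text if summary is not None and summary.text else ""
        items.append({
            "title": item.findtext("title", "").strip(),
            "pubDate": item.findtext("pubDate"),
            "url": item.find("enclosure").get("url"),
            "summary": strip_tags(summary_text).strip(),
        })
    return items

if __name__ == "__main__":
    for entry in parse_items(FEED):
        print(entry["title"], entry["url"])  # Episode 1 http://example.com/episode1.mp3
```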

Thanks for your answer.



It’s working fine now.

I’ve changed:


    - data = HTML.ElementFromURL(URL, encoding=None)
    + data = XML.ElementFromURL(URL, encoding=None)



And I used the namespace to reach the special tags:


    ITUNES = {'t': 'http://www.itunes.com/dtds/podcast-1.0.dtd'}
    for item in data.xpath('//item'):
        summary = item.xpath('t:summary', namespaces=ITUNES)[0].text
        keyword = item.xpath('t:keywords', namespaces=ITUNES)[0].text
        pubDate = item.xpath('pubDate')[0].text
        url = item.xpath('enclosure')[0].get('url')
        title = item.xpath('title')[0].text.strip()
        summary = "[%s]\n\n%s \n%s\nkeywords:%s " % (title, summary.strip(), pubDate, keyword)
        dir.Append(TrackItem(url, title, "info", "Rubrique", summary=summary, art=R(ICON)))
    return dir
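One fragility in the loop above: `[0]` raises `IndexError` for any item that lacks an element such as `itunes:keywords`, and `.text.strip()` raises `AttributeError` when an element is empty. A hedged standard-library sketch (Python 3) of a defensive variant follows; `first_text()` and `build_summary()` are hypothetical helpers, the sample item is invented, and it uses `xml.etree.ElementTree` rather than the Plex `XML` object.

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

def first_text(item, path, default=""):
    """Return the stripped text of the first match, or `default` if the
    element is missing or empty (instead of raising IndexError/AttributeError)."""
    elem = item.find(path, NS)
    if elem is None or elem.text is None:
        return default
    return elem.text.strip()

def build_summary(item):
    # Roughly the plug-in's layout: [title], blank line, summary, date, keywords.
    return "[%s]\n\n%s \n%s\nkeywords:%s " % (
        first_text(item, "title"),
        first_text(item, "t:summary"),
        first_text(item, "pubDate"),
        first_text(item, "t:keywords", "none"),
    )

# Hypothetical item with no <itunes:keywords> element.
item = ET.fromstring(
    '<item xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">'
    "<title> Episode 1 </title>"
    "<pubDate>Mon, 02 Jan 2012</pubDate>"
    "<itunes:summary>Short summary</itunes:summary>"
    "</item>"
)
print(build_summary(item))
```

The missing keywords fall back to `"none"` instead of crashing the whole directory listing; pick whatever default suits the channel.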



My code is available on GitHub:
https://github.com/whoo/Cantelou.bundle.git


If you're interested in making your channel available in the Plex Channel Directory, there are instructions [here](http://wiki.plexapp.com/index.php/App_Store_Submission) and you can file a ticket for review on the [lighthouse project.](https://plexapp.lighthouseapp.com/projects/31804-plex-plug-ins/overview)
