XML parsing issue

papalozarou · August 12, 2014, 11:16am

I'm accessing an XML file which contains a channel list.

The list is in the following format:

<?xml version="1.0"?>

<item>
  <title>Channel Name</title>
  <link>http://path/to/video.m3u8</link>
</item>

My code looks like this:

    XML_SOURCE              = HTML.ElementFromURL(XML_URL, headers = CUSTOM_HEADERS)
    CHANNELS                = XML_SOURCE.xpath("//item")

    for CHANNEL in CHANNELS:
        CHANNEL_NAME        = "".join(CHANNEL.xpath("./title/text()"))
        CHANNEL_URL         = "".join(CHANNEL.xpath("./link/text()"))

        Log(CHANNEL_NAME)
        Log(CHANNEL_URL)

    return CHANNEL_LIST

My problem is that I can't get the value in link. I can get title just fine.

I also tried using ./*[local-name()='link']/text() but that gives me nothing as well.
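
One possible explanation, offered as an assumption rather than something confirmed in this thread: in HTML, <link> is a void element, so an HTML parser will usually close it immediately and attach the URL as text that follows the element (its "tail" in lxml/ElementTree terms) instead of as its child text. That would make ./link/text() come back empty while ./title/text() still works. The sketch below uses Python's standard xml.etree.ElementTree and a hand-written sample tree that mimics that shape; the sample data and the helper name link_url are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree, shaped the way an HTML parser might leave it:
# <link> is empty and the URL sits after it as tail text.
SAMPLE = (
    "<item>"
    "<title>Channel Name</title>"
    "<link/>http://path/to/video.m3u8"
    "</item>"
)

def link_url(item):
    """Return the URL from <link>, whether it is child text or tail text."""
    link = item.find("link")
    if link is None:
        return ""
    # Prefer the element's own text; fall back to the text that follows it.
    return (link.text or link.tail or "").strip()

item = ET.fromstring(SAMPLE)
print(item.findtext("title"))
print(repr(item.find("link").text))
print(link_url(item))
```

With lxml XPath, as used by the Plex framework, the equivalent lookup would be something like ./link/following-sibling::text(), though that is untested against this feed.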

When I try XML.ElementFromURL(XML_URL) instead of HTML.ElementFromURL, I get the following error:

2014-08-12 13:04:38,837 (10addd000) :  CRITICAL (runtime:883) - Exception (most recent call last):
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/runtime.py", line 843, in handle_request
    result = f(**d)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/handlers/base.py", line 119, in call
    result = self.func(*args, **kwargs)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/PremierAdLive.bundle/Contents/Code/__init__.py", line 62, in MainMenu
    CHANNEL_LIST                = GetChannelList()
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/PremierAdLive.bundle/Contents/Code/__init__.py", line 115, in GetChannelList
    XML_SOURCE              = XML.ElementFromURL(XML_URL)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/parsekit.py", line 345, in ElementFromURL
    ).content, encoding=encoding, max_size=max_size)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/parsekit.py", line 301, in ElementFromString
    return self._core.data.xml.from_string(string, encoding = encoding)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/data.py", line 162, in from_string
    return etree.fromstring(markup, parser=(xml_parser if remove_blank_text else None))
  File "lxml.etree.pyx", line 2743, in lxml.etree.fromstring (src/lxml/lxml.etree.c:52665)
  File "parser.pxi", line 1573, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:79932)
  File "parser.pxi", line 1452, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:78774)
  File "parser.pxi", line 960, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:75389)
  File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
  File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
  File "parser.pxi", line 585, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71955)
XMLSyntaxError: Extra content at the end of the document, line 6, column 1

mikedm139 · August 12, 2014, 4:15pm

It's obviously barfing on some unexpected formatting or similar.
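
For what it's worth, "Extra content at the end of the document" is the kind of error an XML parser raises when something follows the closing tag of the root element, for example a second top-level element. Whether that is what happens with this feed is an assumption. A minimal sketch with the standard library's xml.etree.ElementTree (which reports the same situation as "junk after document element"); the sample feed, the wrapper element name channels and the helper parse_fragments are all made up for illustration:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed with two top-level <item> elements and no single root.
FEED = """<?xml version="1.0"?>
<item>
  <title>Channel One</title>
  <link>http://path/to/one.m3u8</link>
</item>
<item>
  <title>Channel Two</title>
  <link>http://path/to/two.m3u8</link>
</item>
"""

def parse_fragments(text):
    """Wrap sibling top-level elements in a synthetic root before parsing."""
    # The XML declaration is only legal at the very start, so drop it first.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", text)
    return ET.fromstring("<channels>" + body + "</channels>")

try:
    ET.fromstring(FEED)
    strict_error = None
except ET.ParseError as err:
    strict_error = err  # a second root element is not well-formed XML

root = parse_fragments(FEED)
channels = [(i.findtext("title"), i.findtext("link")) for i in root.iter("item")]
print(strict_error)
print(channels)
```

Wrapping only helps when each element is itself well-formed; it does nothing for an unclosed tag.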

Looks like an RSS feed and should likely be parsed with the feed parser rather than as HTML or XML.

http://dev.plexapp.com/docs/api/parsekit.html?highlight=rss#module-RSS

https://pythonhosted.org/feedparser/

Example in the PlexPodcast channel.

papalozarou · August 13, 2014, 7:56am

I got it to work by converting the source to a string and then regexing for the title and link. I then actually looked at the string and saw that the XML was malformed: a closing tag was missing, meaning the actual URL ended up as a node of another element, i.e.:


  Channel Name
  http://path/to/video.m3u8
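
The regex workaround described above might look roughly like the following. The actual code wasn't posted, so this is only a sketch: the sample string, the pattern and the helper name extract_channels are assumptions, and it guesses that <link> is the tag missing its closing tag.

```python
import re

# Hypothetical malformed source: <link> is never closed.
RAW = """
<item>
  <title>Channel Name</title>
  <link>http://path/to/video.m3u8
</item>
"""

# Capture text up to the next "<" (or whitespace, for the URL),
# so a missing closing tag does not matter.
ITEM_RE = re.compile(r"<title>\s*([^<]*?)\s*<.*?<link>\s*([^<\s]+)", re.DOTALL)

def extract_channels(raw):
    """Return (name, url) pairs found in the raw string."""
    return ITEM_RE.findall(raw)

print(extract_channels(RAW))
```

A pattern like this assumes every title is followed by its own link; an item without a link would get paired with the next item's URL.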

I didn't know about the feed parser though, so will look into that – will that be faster?

mikedm139 · August 13, 2014, 3:46pm

It's not necessarily faster and may not work any better with malformed data, but it does come in handy for parsing RSS feeds.

Having to compensate for bad data is a real PITA. It's a shame that you need to deal with that.

papalozarou · August 13, 2014, 4:03pm

I couldn't get it to work with the feed parser. So I'm sticking with what I've got for now. Though, as per my other thread, I've got slightly bigger problems...

Thanks for your help as always though
