XML parsing issue

I'm accessing an XML file which contains a channel list.

The list is in the following format:

<?xml version="1.0"?>
<item>
  <title>Channel Name</title>
  <link>http://path/to/video.m3u8</link>
</item>

My code looks like this:

    XML_SOURCE              = HTML.ElementFromURL(XML_URL, headers = CUSTOM_HEADERS)
    CHANNELS                = XML_SOURCE.xpath("//item")

    for CHANNEL in CHANNELS:
        CHANNEL_NAME        = "".join(CHANNEL.xpath("./title/text()"))
        CHANNEL_URL         = "".join(CHANNEL.xpath("./link/text()"))

        Log(CHANNEL_NAME)
        Log(CHANNEL_URL)

    return CHANNEL_LIST

My problem is that I can't get the value in link. I can get title just fine.

I also tried using ./*[local-name()='link']/text() but that gives me nothing as well.

When I try XML.ElementFromURL(XML_URL) instead of HTML.ElementFromURL, I get the following error:

2014-08-12 13:04:38,837 (10addd000) :  CRITICAL (runtime:883) - Exception (most recent call last):
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/runtime.py", line 843, in handle_request
    result = f(**d)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/handlers/base.py", line 119, in call
    result = self.func(*args, **kwargs)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/PremierAdLive.bundle/Contents/Code/__init__.py", line 62, in MainMenu
    CHANNEL_LIST                = GetChannelList()
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/PremierAdLive.bundle/Contents/Code/__init__.py", line 115, in GetChannelList
    XML_SOURCE              = XML.ElementFromURL(XML_URL)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/parsekit.py", line 345, in ElementFromURL
    ).content, encoding=encoding, max_size=max_size)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/parsekit.py", line 301, in ElementFromString
    return self._core.data.xml.from_string(string, encoding = encoding)
  File "/Users/loz/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/data.py", line 162, in from_string
    return etree.fromstring(markup, parser=(xml_parser if remove_blank_text else None))
  File "lxml.etree.pyx", line 2743, in lxml.etree.fromstring (src/lxml/lxml.etree.c:52665)
  File "parser.pxi", line 1573, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:79932)
  File "parser.pxi", line 1452, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:78774)
  File "parser.pxi", line 960, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:75389)
  File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
  File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
  File "parser.pxi", line 585, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71955)
XMLSyntaxError: Extra content at the end of the document, line 6, column 1

It's obviously barfing on some unexpected formatting or similar.
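
For what it's worth, parsers typically report "Extra content at the end of the document" when something follows the root element, for example a second top-level element. The sketch below is only a guess at the cause: it assumes the file is a bare sequence of item elements with no wrapping root, which the thread does not confirm. It uses the standard library's xml.etree.ElementTree rather than the Plex XML API, and SAMPLE, parse_rootless and the channels wrapper element are made-up names for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: two top-level <item> elements and no single root.
SAMPLE = """<?xml version="1.0"?>
<item>
  <title>Channel One</title>
  <link>http://path/to/one.m3u8</link>
</item>
<item>
  <title>Channel Two</title>
  <link>http://path/to/two.m3u8</link>
</item>
"""


def parse_rootless(text):
    """Wrap a rootless fragment in a synthetic root so a strict parser accepts it."""
    # Drop the XML declaration, which is only allowed at the very start.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", text)
    return ET.fromstring("<channels>" + body + "</channels>")


# Parsing the sample directly fails once the parser reaches the second <item>.
try:
    ET.fromstring(SAMPLE)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

# With the synthetic root in place, each item can be read normally.
channels = [
    (item.findtext("title"), item.findtext("link"))
    for item in parse_rootless(SAMPLE).findall("item")
]
```

If the real file has this shape, the same wrapping trick might be applied to the downloaded text before handing it to a strict XML parser.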

Looks like an RSS feed and should likely be parsed with the feed parser rather than as HTML or XML.

http://dev.plexapp.com/docs/api/parsekit.html?highlight=rss#module-RSS

https://pythonhosted.org/feedparser/

Example in the PlexPodcast channel.

I got it to work by converting the <item> to a string and then regexing for the title and link. I then actually looked at the string and saw that the XML was malformed, with no closing tag for <link>, meaning the actual URL exists as a node of the <item>, i.e.:


<item>
  <title>Channel Name</title>
  <link>http://path/to/video.m3u8
</item>
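
The regex workaround described above might look roughly like the sketch below. It assumes the element names used in the code earlier in the thread (item, title, link) and a string form in which link is never closed. ITEM_MARKUP, TITLE_RE, LINK_RE and extract_channel are hypothetical names, not the poster's actual code. A likely explanation for the missing closing tag is that HTML parsers treat link as a void element, so the URL ends up as text following the element rather than inside it.

```python
import re

# Hypothetical string form of one <item> with the unclosed <link>.
ITEM_MARKUP = """<item>
  <title>Channel Name</title>
  <link>http://path/to/video.m3u8
</item>"""

# Title is well-formed, so match between the opening and closing tags.
TITLE_RE = re.compile(r"<title>\s*(.*?)\s*</title>", re.S)
# Capture the text right after <link> up to whitespace or the next tag,
# so it works whether or not a closing </link> is present.
LINK_RE = re.compile(r"<link\s*/?>\s*([^\s<]+)")


def extract_channel(markup):
    """Return (title, url) from one item's markup, or empty strings if absent."""
    title = TITLE_RE.search(markup)
    link = LINK_RE.search(markup)
    return (
        title.group(1) if title else "",
        link.group(1) if link else "",
    )


name, url = extract_channel(ITEM_MARKUP)
```

If the void-element explanation is right, the URL should also be reachable without regex as the text that follows the link element, for example with an XPath along the lines of ./link/following-sibling::text(), though that was not tried in this thread.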

I didn't know about the feed parser though, so will look into that – will that be faster?

It's not necessarily faster and may not work any better with malformed data, but it does come in handy for parsing RSS feeds.

Having to compensate for bad data is a real PITA. It's a shame that you need to deal with that.

I couldn't get it to work with the feed parser. So I'm sticking with what I've got for now. Though, as per my other thread, I've got slightly bigger problems...

Thanks for your help as always, though.
