help on XML parsing via XPATH

Hi all,

 

I am really sorry, but even after reading the XPath introduction and browsing some example code I am still not getting it ... I am trying to read data from my VDR system to enrich the plugin I started.

 

The plugin calls a URL like this:

 

http://10.8.0.2:8002/channels.xml?group=Hauptprogramme

 

and retrieves this:



Das Erste HD
1
C-41985-1051-11100
false
Hauptprogramme
418
C-41985-1051-11100.ts
false
true
false
false
false


ZDF HD
2
C-102-1079-11110
false
Hauptprogramme
394
C-102-1079-11110.ts
false
true
false
false
false


arte HD
3
C-9999-401-17113
false
Hauptprogramme
762
C-9999-401-17113.ts
false
true
false
false
false


3sat
4
C-102-1079-28007
false
Hauptprogramme
394
C-102-1079-28007.ts
false
true
false
false
false


RTL Television
5
C-9999-161-12101
false
Hauptprogramme
442
C-9999-161-12101.ts
false
true
false
false
false


RTL2
6
C-9999-161-12105
false
Hauptprogramme
442
C-9999-161-12105.ts
false
true
false
false
false


ProSieben
7
C-9999-161-12103
false
Hauptprogramme
442
C-9999-161-12103.ts
false
true
false
false
false


SAT.1
8
C-9999-161-12102
false
Hauptprogramme
442
C-9999-161-12102.ts
false
true
false
false
false


VOX
9
C-9999-161-12104
false
Hauptprogramme
442
C-9999-161-12104.ts
false
true
false
false
false


kabel eins
10
C-9999-161-12106
false
Hauptprogramme
442
C-9999-161-12106.ts
false
true
false
false
false


DAS VIERTE
11
C-9999-191-11102
false
Hauptprogramme
450
C-9999-191-11102.ts
false
true
false
false
false


SIXX
12
C-9999-181-20116
false
Hauptprogramme
530
C-9999-181-20116.ts
false
true
false
false
false


Tele 5
13
C-9999-191-12111
false
Hauptprogramme
450
C-9999-191-12111.ts
false
true
false
false
false

13
13

Now my issue is how to access the values in the param nodes. I tried a lot with ./param[@name='name'] or /param/text() or a combination of both. My guess is that I simply was not able to work out from the docs how to do this ...

 

Could anyone give me a hint please? That would be really great.

 

Thanks a lot and best regards,

Alex

 

 

You were almost there, I think you just need to combine the methods. You would probably want something like this (assuming you're already looping through each //channel):

./param[@name='name']/text()

That's very "dumbed down" but should give you a starting point.  You will probably want to be looping through all the channels and then reading each param accordingly using the above approach.  The method above will find a specific param and then should show the text() value of it.  This is just off the top of my head, I didn't actually test ... 

Hope this helps.

It is the default namespace (the xmlns="http://www.domain.org/restfulapi/2011/channels-xml" part) you need to take into account. XPath has no notion of a default namespace, so you have to bind it to a prefix yourself, for example like so:

NAMESPACES = {'a': 'http://www.domain.org/restfulapi/2011/channels-xml'}

# Each xpath() call returns a list of matching nodes
for channel in xml.xpath('//a:channel', namespaces=NAMESPACES):
    name = channel.xpath('./a:param[@name="name"]', namespaces=NAMESPACES)
    channel_id = channel.xpath('./a:param[@name="channel_id"]', namespaces=NAMESPACES)
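
For anyone who wants to try the namespace handling outside of a plugin, here is a minimal sketch using only Python's standard library (xml.etree.ElementTree, whose find/findall accept a limited XPath subset). It is not the plugin framework's API. The sample document is a hypothetical, trimmed-down response: the root element name "channels" and the exact layout are assumptions; only the namespace URI, the channel/param nodes and the two param names come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down response; the root element name is an assumption.
SAMPLE = """<channels xmlns="http://www.domain.org/restfulapi/2011/channels-xml">
  <channel>
    <param name="name">Das Erste HD</param>
    <param name="channel_id">C-41985-1051-11100</param>
  </channel>
  <channel>
    <param name="name">ZDF HD</param>
    <param name="channel_id">C-102-1079-11110</param>
  </channel>
</channels>"""

# The prefix 'a' is arbitrary; only the URI has to match the xmlns value.
NAMESPACES = {'a': 'http://www.domain.org/restfulapi/2011/channels-xml'}

root = ET.fromstring(SAMPLE)

# Without a prefix nothing matches, because every element lives in the
# default namespace declared on the root element.
unprefixed = root.findall('.//channel')

channels = []
for channel in root.findall('.//a:channel', NAMESPACES):
    # find() returns the first matching element; .text is its text content.
    name = channel.find("a:param[@name='name']", NAMESPACES).text
    channel_id = channel.find("a:param[@name='channel_id']", NAMESPACES).text
    channels.append((name, channel_id))

print(unprefixed)
print(channels)
```

The empty result for the unprefixed query is most likely why the plain ./param[@name='name'] attempts returned nothing.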

Hello gerk,

Hello Sander1,

thanks a lot - heelemal bedankt ;-)

I guess the namespace stuff was really what I was missing. Now I realize that I have not really understood yet how loops and so on work in Python... To check my understanding of your suggestion, Sander:

for channel in xml.xpath('//a:channel', namespaces=NAMESPACES):

Translates to:  for each "channel"-node returned by the xpath query, right?

I guess I need some more example-reading on how this works ;-))

In any case, your guess was spot on. Let's see how I can get this to work over the next few evenings.

Cheers,

Alex

Hello Alex,

Yes, that is correct. You can give the variable used in the for loop a different name if you like, but I usually stick with the name of the node or something similar.

So, another question ;-)

But first - I am getting really far by now: channel logos etc. all work pretty well.

When I receive EPG data for a current TV event the XML contains it like this:

164826
Monitor
Berichte zur Zeit

Genre: Politik Kategorie: Information Land: D Jahr: 2014 MONITOR will Hintergrund liefern, Diskussionen anstoßen, Themen setzen. Unsere Handschrift: seriöse Information, gepaart mit einer sorgfältigen Analyse. Kritischer, investigativer Journalismus wird in der Redaktion großgeschrieben. "Wir bringen Bewegung in die öffentliche Diskussion und wollen unbequem sein. Wir teilen nach allen Seiten aus", so beschreibt Sonia Seymour Mikich die Aufgabe von MONITOR. Sie leitet die Redaktion seit Januar 2002. Unsere sachlich-nüchterne und kritische Berichterstattung ist seit über 40 Jahren gefragt. Moderator: Georg Restle Altersempfehlung: ab 0 Audio: Stereo Zweikanal Flags: [PrimeTime] [Untertitel] Quelle: DVB/EPGDATA

C-41985-1051-11100
Das Erste HD
1389300300
1800
78
16
0
1389300300

1










false
false


however, the string extracted by xpath seems to be "corrupted" (I assume this is an encoding issue):

[u'Genre: Politik Kategorie: Information Land: D Jahr: 2014

MONITOR will Hintergrund liefern, Diskussionen ansto\xdfen, Themen setzen. Unsere Handschrift: seri\xf6se Information, gepaart mit einer sorgf\xe4ltigen Analyse. Kritischer, investigativer Journalismus wird in der Redaktion gro\xdfgeschrieben. “Wir bringen Bewegung in die \xf6ffentliche Diskussion und wollen unbequem sein. Wir teilen nach allen Seiten aus”, so beschreibt Sonia Seymour Mikich die Aufgabe von MONITOR. Sie leitet die Redaktion seit Januar 2002. Unsere sachlich-n\xfcchterne und kritische Berichterstattung ist seit \xfcber 40 Jahren gefragt.

Moderator: Georg Restle

Altersempfehlung: ab 0

Audio: Stereo Z…

I tried using the encoding=UTF-8 parameter with XML.ElementFromURL, but that did not help. I could of course manually replace all kinds of substrings - but isn't this a common issue with a common solution?

Cheers,

Alex

An xpath query like this always returns a list, even if there is just one item to return, so you always need to add [0] to get the item itself (I see I forgot to put that in my example code posted earlier). The u' prefix means it's a unicode string, \n is a newline, and the \xXX sequences are escaped non-ASCII characters. That is only how the list is printed; it should not be a problem at all, and the text should be displayed correctly in Plex clients.

NAMESPACES = {'a': 'http://www.domain.org/restfulapi/2011/events-xml'}

xml = XML.ElementFromURL('http://…')
description = xml.xpath('//a:event[@name="description"]', namespaces=NAMESPACES)[0]
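
To see the difference between printing the whole result list and using a single item, here is a small sketch with the standard library only (again not the plugin framework's API). The sample document is hypothetical: the root element name "events" and the event/param layout are assumed by analogy with the channels response. Python 3's ascii() is used to imitate the escaped output that Python 2 shows when printing a list of unicode strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened event; element layout is an assumption.
SAMPLE = """<events xmlns="http://www.domain.org/restfulapi/2011/events-xml">
  <event>
    <param name="description">Diskussionen anstoßen, seriöse Information</param>
  </event>
</events>"""

NAMESPACES = {'a': 'http://www.domain.org/restfulapi/2011/events-xml'}

root = ET.fromstring(SAMPLE)

# The query yields a list, even though there is only one description.
matches = [p.text for p in
           root.findall(".//a:event/a:param[@name='description']", NAMESPACES)]

# Escaped representation of the whole list, similar to the "corrupted" output.
escaped = ascii(matches)

# Taking the first item gives the actual string with its umlauts intact.
description = matches[0]

print(escaped)
print(description == "Diskussionen anstoßen, seriöse Information")
```

The escapes only appear in the printed representation of the list; the string taken with [0] still contains the real characters.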

*arg*  You are so right - and I am so sorry  :rolleyes:

That did the trick. Now I need to see if I can add custom objects for pictures and then I am done ;-) great! 

Thanks a lot sander1.
