Unicode and special characters in DirectoryObjects and MediaContainers

I am developing a Plex plugin that lets you watch recordings made on a MythTV server. I am currently working to "prettify" the interface: icons and background (thumb & art), descriptions, etc.
 
But I have noticed that some clients (Roku, web) sometimes revert to showing just the title of the recordings, instead of all the pretty artwork. Other clients (iPad) show the same list just fine.
 
Analyzing the scenarios, the problems seem to appear whenever an ObjectContainer has an entry with unicode (e.g. æøå) or special characters (e.g. ") in it. I've written some pseudo-code to demonstrate what's causing the problem:
...
oc = ObjectContainer(title2="MythTV recordings", art=backgroundUrl)
oc.add(
   DirectoryObject(
      key=
         Callback(
            RecordingDetails,
            recordingTitle='Some recording title with æøå'
         ), 
      title='Some recording title with æøå',
      summary='The new season of "Some recording title with æøå" starts',
      thumb=iconUrl
   )
)

@route('/video/mythrecordings/RecordingDetails')
def RecordingDetails(recordingTitle):
   ...

Looking at the XML that's produced by the code, it looks something like this

...

...

I notice two potential problems here: one is the special characters in the key (which is just a relative URL), the other is the extra double quotes in the summary attribute.

 

(Note: oddly enough, selecting the DirectoryObject above actually still leads you to the proper recording details screen - it's just the presentation that fails)

 
The strings with weird characters come from MythTV (which in turn gets them from my EPG provider), so "just use strings without special characters" is not a solution.
 
Instead I've tried all sorts of encodings - cgi.escape(), string.encode('ascii', 'xmlcharrefreplace') - on these strings, but then I get an on-screen title that reads something like
Some recording title with æø&acircle

So obviously some of the strings should not be encoded.
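A minimal sketch of why pre-encoding can backfire, assuming the XML serializer already escapes attribute values on its own. Python 3's standard xml.etree.ElementTree is used here only as a stand-in for whatever Plex does internally, and the Directory element name is illustrative:

```python
import xml.etree.ElementTree as ET

title = 'Some recording title with æøå'

# Pre-encode by hand, as attempted above: non-ASCII becomes character references.
pre_encoded = title.encode('ascii', 'xmlcharrefreplace').decode('ascii')

# The serializer does not know the string is already escaped,
# so it escapes the '&' of every reference a second time.
elem = ET.Element('Directory', title=pre_encoded)
serialized = ET.tostring(elem, encoding='unicode')

# A client parsing the XML gets the literal reference text back, not æøå.
shown_on_screen = ET.fromstring(serialized).get('title')
print(serialized)
print(shown_on_screen)
```

If the real pipeline behaves the same way, that would explain an on-screen title full of entity text: the string was escaped twice but only unescaped once.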

 
But which strings are causing the problems? And how should they be encoded?
 
I am not done experimenting yet, but I was wondering if anyone out there had ever seen this problem?
 
/thomas
 
 
PS: I'm writing this on my way to work, so all the examples are from memory - I'll update with the correct examples and possibly screenshots when I get home
 

@schaumburg

/video/mythrecordings/RecordingDetails?recordingTitle=Some%20recording%20title%20with%20æøå is actually a valid URI; there's nothing wrong with these characters being here. They are probably even in the ANSI set, otherwise I'd expect them to be encoded.
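For comparison, here is a small Python 3 sketch of what a fully percent-encoded version of that key would look like. It uses only urllib.parse from the standard library; whether Plex's Callback does anything like this internally is not confirmed here:

```python
from urllib.parse import quote, unquote, urlencode

title = 'Some recording title with æøå'

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded_title = quote(title)

# Build the query string the same way; quote_via=quote gives %20 rather than '+'.
key = '/video/mythrecordings/RecordingDetails?' + urlencode(
    {'recordingTitle': title}, quote_via=quote
)

# Decoding restores the original text unchanged.
decoded_title = unquote(encoded_title)
print(key)
```

Either form should reach the callback with the same title, as long as both ends agree on UTF-8.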

Plex uses the lxml library to produce the output XML, and lxml takes care of all the encoding, so you shouldn't encode the values yourself in order to get valid XML. The double quotes in the summary attribute are encoded too; it's just your web browser that decodes them for convenience. To be sure, use the View Source function in the browser: you'll see &quot; entities instead of " symbols. It would also be a terrible bug in lxml if the double quotes actually appeared unencoded.
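A quick way to convince yourself of this outside Plex is to serialize the same summary with an XML library and look at the raw text. The sketch below uses Python 3's standard xml.etree.ElementTree rather than lxml so that it runs without extra packages; the Directory element name is only illustrative:

```python
import xml.etree.ElementTree as ET

summary = 'The new season of "Some recording title with æøå" starts'

elem = ET.Element('Directory')
elem.set('summary', summary)  # pass the raw string, no manual escaping

# Text output: the double quotes are escaped, æøå is left as-is.
as_text = ET.tostring(elem, encoding='unicode')

# ASCII output: non-ASCII characters become numeric character references.
as_ascii = ET.tostring(elem, encoding='us-ascii')

# Parsing either form gives back the original summary.
round_trip = ET.fromstring(as_ascii).get('summary')
print(as_text)
print(as_ascii)
```

This is the raw form that View Source would show; a browser's pretty-printed XML view hides the escaping.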

If you're missing thumbnails or backgrounds, check the XML for the 'thumb' and 'art' attributes on the elements. Maybe they're missing (I don't see them in the example you've posted above) or have incorrect values.
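That check can be scripted. The sketch below is Python 3 with only the standard library, and it lists entries that lack a thumb attribute. The sample XML, its URLs and the helper name missing_attributes are made up for illustration; real output from the plugin may be structured differently:

```python
import xml.etree.ElementTree as ET

# Hypothetical container XML; not copied from a real Plex response.
SAMPLE = """<MediaContainer title2="MythTV recordings" art="http://example.invalid/art.png">
  <Directory key="/video/mythrecordings/a" title="With artwork" thumb="http://example.invalid/icon.png"/>
  <Directory key="/video/mythrecordings/b" title="Without artwork"/>
</MediaContainer>"""

def missing_attributes(xml_text, required=('thumb',)):
    """Return (title, [missing attribute names]) for each incomplete Directory."""
    root = ET.fromstring(xml_text)
    problems = []
    for entry in root.iter('Directory'):
        missing = [name for name in required if not entry.get(name)]
        if missing:
            problems.append((entry.get('title'), missing))
    return problems

container_art = ET.fromstring(SAMPLE).get('art')
print(container_art)
print(missing_attributes(SAMPLE))
```

Saving the XML that the failing client receives and running it through something like this makes it easier to see which entries lose their artwork.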

@czukowski

Argh - I hadn't thought about my browser prettying up the results for me. I'll try "view source" straight away when I get home (it's looking to be a long day).

I'll also update the from-memory examples with the real stuff.

Thanks for your pointers, though!