Unicode support ? ...

Überflieger in Österreich
**** UPDATE ****:

Apparently this works when the container is filled and returned to PLEX

WHAT DOES NOT WORK is passing foreign characters from one ROUTINE to another. In my case, I pre-process the elements for the container in a separate procedure, and the problem starts when I RETURN the RESULTS. Once I am back in the procedure that fills the actual container, everything seems to work.

As a WORKAROUND, I am now doing the encoding / decoding processing just when I fill the container and not in any procedure before that.



Here is how "simple" encoding / decoding should work.

First find out if / what codec is being used by the HTML / XML piece of text you want to convert.

In my case it was Latin-1 (NOTE: This codec has many other 'names' e.g. 'iso-8859-1')
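
To check whether two codec names really refer to the same codec, Python's standard `codecs.lookup` can help. A minimal sketch; the list of spellings is only an illustration, not something taken from the page in question:

```python
import codecs

# Different spellings that Python treats as aliases of the same codec.
names = ["Latin-1", "latin1", "iso-8859-1", "ISO8859-1"]

# codecs.lookup() returns a CodecInfo object; its .name is the canonical name.
canonical = set(codecs.lookup(name).name for name in names)

# True when every spelling resolved to one and the same codec.
same_codec = len(canonical) == 1
```

An unknown name makes `codecs.lookup` raise `LookupError`, which is a quick way to catch a misspelled codec.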

PLEX expects UTF-8 encoded strings, so I needed to write this:

**TITLE.encode('Latin-1').decode('utf-8')**

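to get the TITLE correct: first, encode('Latin-1') turns the unicode-string back into the raw bytes it was built from

then decode('utf-8') reads those bytes as UTF-8 and returns a proper unicode-string. This only helps when the text was really UTF-8 but had been read as Latin-1 on the way in.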
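
The round trip can be tried outside of PLEX. This is only a sketch, assuming the text was really UTF-8 and had been decoded as Latin-1 somewhere along the way; the `garbled` value is produced here on purpose to imitate that situation, and the variable names are made up for the example:

```python
# The title as it should look (written with escapes so the file stays ASCII).
expected = u"\u00dcberflieger in \u00d6sterreich"

# Imitate the problem: UTF-8 bytes wrongly decoded as Latin-1.
garbled = expected.encode("utf-8").decode("latin-1")

# The fix from above: back to the raw bytes, then decode them as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
```

If the text was not garbled in this particular way, the same call can raise `UnicodeEncodeError` or `UnicodeDecodeError`, so it should not be applied blindly to every string.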

NOTE: Be aware that the encode/ decode methods are different for STRING and UNICODESTRING objects.
See [here](http://stackoverflow.com/questions/447107/whats-the-difference-between-encode-decode-python-2-x) for more info.

******


I get TITLES / SUBTITLES / and DESCRIPTIONS from a GERMAN website.

The typical German characters are encoded in various ways; some are "plain".

e.g. **TITLE = "Überflieger in Österreich"**

Now I am LOST on how to best handle them within my plug-in.

If I pass **TITLE = "Überflieger in Österreich".encode('Latin-1')** to the media container, then this will be displayed correctly :-)

However, when I take the input from the web site using

**WEBPAGE = XML.ElementFromURL(...)**

**TITLE = WEBPAGE.xpath('//tr')[1].xpath("//td[2]/a/text()").encode('Latin-1')**

The type of TITLE is "_ElementStringResult", and I get the following error when the media container is RETURNED to PLEX:

[codebox]14:02:19.437170: com.plexapp.plugins.compizmediacenter : (Framework) Response OK
14:02:19.438629: com.plexapp.plugins.compizmediacenter : (Framework) An exception happened:
Traceback (most recent call last):
File "/Users/plex/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/Plugin.py", line 341, in __run
resultStr = result.Content()
File "/Users/plex/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/Objects.py", line 109, in Content
return XML.StringFromElement(self.ToElement())
File "/Users/plex/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/Objects.py", line 179, in ToElement
root.append(item.ToElement())
File "/Users/plex/Library/Application Support/Plex Media Server/Plug-ins/Framework.bundle/Contents/Resources/Versions/1/Python/PMS/Objects.py", line 105, in ToElement
el.set(key, unicode(self.__dict__[key]))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)[/codebox]
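
The last line of the traceback is what Python 2 reports when `unicode()` is called on a byte string that contains non-ASCII bytes, because the implicit conversion uses the ASCII codec. The same error can be reproduced with an explicit decode. A small sketch; the title below is hypothetical and was only chosen so that the byte 0xc3 lands at position 6:

```python
# Hypothetical title, encoded to UTF-8 bytes ("\u00dc" becomes 0xc3 0x9c).
title_bytes = u"Titel \u00dcber".encode("utf-8")

try:
    # Same kind of conversion the framework attempts implicitly.
    title_bytes.decode("ascii")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)

# Decoding with the codec the bytes are really in works fine.
title_text = title_bytes.decode("utf-8")
```

So handing over a unicode object (or pure ASCII bytes) avoids the implicit ASCII decode altogether.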

Unicode issues are a bit tricky, especially with all the encodings out there and the number of pages which either lie about their encodings or have strings in multiple encodings on them :slight_smile:



In any event, I’m glad you found a work around for your problem, and I appreciate you posting the solution you found.

Elan,



one quick Q. … any pointer on how to transform from HTML-“encoded” characters to unicode (Latin-1)?



e.g. from `&Ouml;sterreich` ==> Österreich (Austria).



I understand now how to transform from one codec to another … but still struggle with the “codec” of HTML-based character-entities.


I had the same problem. Easy solution here: [http://effbot.org/zone/re-sub.htm#unescape-html](http://effbot.org/zone/re-sub.htm#unescape-html)
Oh and thanks for your plugin guide, it helped me get started on making a plugin :)
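
For reference, a regex-based unescape along those lines could look roughly like this. It is a sketch written for this thread, not a copy of the linked recipe; `unescape_entities` is a made-up name, and the try/except imports are only there so it runs on both Python 2 and Python 3:

```python
import re

try:  # Python 2
    from htmlentitydefs import name2codepoint
except ImportError:  # Python 3
    from html.entities import name2codepoint

try:
    unichr  # exists on Python 2
except NameError:
    unichr = chr  # Python 3


def unescape_entities(text):
    """Replace named and numeric HTML entities in text with their characters."""
    def fixup(match):
        entity = match.group(1)
        try:
            if entity[:2] in ("#x", "#X"):
                return unichr(int(entity[2:], 16))  # hex form, e.g. &#xDC;
            if entity.startswith("#"):
                return unichr(int(entity[1:]))      # decimal form, e.g. &#220;
            return unichr(name2codepoint[entity])   # named form, e.g. &Ouml;
        except (KeyError, ValueError, OverflowError):
            return match.group(0)  # leave unknown entities untouched
    return re.sub(r"&(#?\w+);", fixup, text)


title = unescape_entities(u"&Ouml;sterreich &amp; &#220;berflieger")
```

The result is a unicode-string, so it still has to be encoded at the point where the container is filled, as described in the update at the top.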
