
Unicode (UTF-8)

One thing I cannot get working in my plug-in is the more exotic characters such as åäö. As both I and the plug-in's content are from Finland, this is vital. Some of the videos don't even play because of the errors this causes.

File "/Users/xxx/Library/Application Support/Plex/Plex Media Server.app/Contents/Resources/Python/Resources/PMS/MediaXML.py", line 59, in SetAttr
	self.root.set(name, value)
  File "lxml.etree.pyx", line 641, in lxml.etree._Element.set (src/lxml/lxml.etree.c:9596)
  File "apihelpers.pxi", line 416, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:31761)
  File "apihelpers.pxi", line 1136, in lxml.etree._utf8 (src/lxml/lxml.etree.c:37215)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes


It seems that the MediaItem is receiving some unacceptable data. This happened when a video title contained some of the characters mentioned above. For example, 'ä' is displayed like this in the UI: 'Ã¤', even though I have selected Unicode in the display settings. Plug-ins like YouTube, for instance, do display the characters correctly. I just can't seem to grasp why.

The site that I'm playing with has declared:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi" lang="fi">
...
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />


Maybe someone who knows more about these character encoding issues can tell me what I should do, and in which part of the code.
YLE Areena plug-in Plex wiki | download

Comments

  • atrus Members, Plex Pass, Plex Ninja Posts: 11,100
    edited February 2009
    Finx wrote on Feb 27 2009, 01:05 PM:
    One thing I cannot get working in my plug-in is the more exotic characters such as åäö. As both I and the plug-in's content are from Finland, this is vital. Some of the videos don't even play because of the errors this causes.


    Sorry, I can't help, but I'm having similar or exactly the same problems (depending on whether I read you correctly). Swedish characters are problematic to show in the infoList. If I add, for example:
    dir.AppendItem(DirectoryItem("noje", "Nöje", ""))
    ... the app doesn't even show in the list. If I change the ö to an o, the app shows in the list...
  • marcoppc Members Posts: 42
    Finx wrote on Feb 27 2009, 01:05 PM:
    ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes
    


    Try replacing myvarwithumlaut with myvarwithumlaut.encode("utf-8").
  • marcoppc Members Posts: 42
    atrus wrote on Feb 27 2009, 01:57 PM:
    dir.AppendItem(DirectoryItem("noje", "Nöje", ""))


    dir.AppendItem(DirectoryItem("noje", u'Nöje', ""))
    
  • Finx Members Posts: 38
    marcoppc wrote on Feb 27 2009, 06:44 PM:
    dir.AppendItem(DirectoryItem("noje", u'Nöje', ""))
    


    I tried these:

    dir.AppendItem(DirectoryItem('TEST$Test', u'åäö', ''))
    dir.AppendItem(DirectoryItem('TEST$Test', unicode('åäö', 'utf-8'), ''))
    


    They both result in:
    ...(Framework) Couldn't find en strings
    ...(Framework) Couldn't find en-us strings
    
  • marcoppc Members Posts: 42
    Hm, which editor do you use?
    Perhaps it saves the code in Latin-1. If not, try adding the following line to the top of the file:

    # -*- coding: utf-8 -*-
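
For what it's worth, here is a rough sketch of what that line does: it tells the interpreter which codec to use when turning the bytes of the source file into characters. The snippet uses `tokenize.detect_encoding` from the Python 3 standard library to read the declaration; `declared_encoding` is a made-up helper name, and the two byte strings are stand-ins for files saved by an editor. Note that Python 3 falls back to UTF-8 when no declaration is present, whereas Python 2 (which the `unicode(...)` calls in this thread imply) fell back to ASCII.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to read this source text."""
    # detect_encoding wants a readline callable that yields bytes lines.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Hypothetical file contents: one with a coding line, one without.
with_cookie = b"# -*- coding: latin-1 -*-\nname = 'x'\n"
without_cookie = b"name = 'x'\n"

print(declared_encoding(with_cookie))
print(declared_encoding(without_cookie))
```

The declaration only helps if it matches what the editor actually wrote to disk, so it is worth checking both.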
    
  • Finx Members Posts: 38
    marcoppc wrote on Feb 27 2009, 08:02 PM:
    Hm, which editor do you use?
    Perhaps it saves the code in latin-1. If not, try adding the following line to the top

    # -*- coding: utf-8 -*-
    


    I'm using TextMate with UTF-8...

    That line did it! Thank you!
  • Finx Members Posts: 38
    Now I keep getting this when I try the decoding on deeper UI levels:
    TypeError: decoding Unicode is not supported
    
  • marcoppc Members Posts: 42
    Finx wrote on Feb 27 2009, 10:17 PM:
    Now I keep getting this when trying the decoding on further ui levels:
    TypeError: decoding Unicode is not supported
    


    Are you perhaps trying to convert a variable that already is unicode?
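
If that is the cause, one way around it is to decode only values that are still byte strings. A minimal sketch in Python 3 syntax, where `to_text` is a made-up helper name and not part of the plug-in framework. Under Python 2 the check would be against `str` rather than `bytes`, and the already-decoded type would be `unicode`.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding only when the value is still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: decoding it a second time is what raises the TypeError.
    return value


raw = "Nöje".encode("utf-8")   # bytes, as read from a file or socket
already_text = "Nöje"          # text that was decoded somewhere earlier

print(to_text(raw) == to_text(already_text))
```

Calling the helper everywhere a title is read means it no longer matters which code path already did the decoding.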
  • jam Plex Dev Team Members, Plex Employee, Plex Pass, Plex Ninja Posts: 4,303
    edited February 2009
    Finx wrote on Feb 27 2009, 07:37 PM:
    They both result in:
    ...(Framework) Couldn't find en strings
    ...(Framework) Couldn't find en-us strings
    

    Just so you know, those log lines aren't UTF8 errors. They're framework messages related to localization. You can include strings files for multiple languages in your plug-in. This will allow the plug-in to be automatically localized based on language information provided by Plex (not yet implemented, but will be used within the next few versions). Strings files are JSON dictionaries using key-value pairs, and should be saved in the Strings folder, e.g. /Contents/Strings/en.json. You can then easily localize your plug-in using the _L method defined in PMS.Shorthand, e.g.
    from PMS.Shorthand import _L
    
    localizedString = _L("KeyName")
    


    The framework is logging those lines as it sets the default language to en-us, and your plug-in doesn't include strings for that language.
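
To make the idea concrete, here is a rough illustration of how a strings-file lookup could behave. This is not the framework's actual code: `load_strings` and `localize` are hypothetical names, the fallback order (exact locale, then base language, then the key itself) is an assumption, and the sketch is written in Python 3. It builds a throwaway Strings folder so it can run on its own.

```python
import json
import os
import tempfile


def load_strings(strings_dir, locale):
    """Load <locale>.json from strings_dir, or return {} if it is missing."""
    path = os.path.join(strings_dir, locale + ".json")
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def localize(strings_dir, key, locale="en-us"):
    """Try the exact locale, then the base language, else return the key."""
    for candidate in (locale, locale.split("-")[0]):
        strings = load_strings(strings_dir, candidate)
        if key in strings:
            return strings[key]
    return key


# Throwaway Strings folder holding only an en.json file.
strings_dir = tempfile.mkdtemp()
with open(os.path.join(strings_dir, "en.json"), "w", encoding="utf-8") as handle:
    json.dump({"KeyName": "Entertainment"}, handle)

print(localize(strings_dir, "KeyName"))
print(localize(strings_dir, "MissingKey"))
```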
  • Finx Members Posts: 38
    marcoppc wrote on Feb 28 2009, 12:02 AM:
    Are you perhaps trying to convert a variable that already is unicode?


    I have been trying to figure out what the problem is here. I found this while googling: http://www.red-mercury.com/blog/eclectic-t...ery-of-the-day/. It could explain the behavior that I'm experiencing. As the page author states:
    So, you tell Python to encode the string as utf-8. If the string is already represented as unicode, the above exception fires. This situation can easily happen if whatever processed the string in the first place applied the wrong encoding (for example, if iso-8559-1 was incorrectly applied to a utf-8 stream).


    So maybe the original site that I'm requesting data from is being read as ISO-8859-1 and not UTF-8. What made me suspicious is the fact that the site doesn't include an XML declaration, like <?xml version="1.0" encoding="UTF-8"?>. Could this be the reason?
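
That kind of mix-up is easy to reproduce outside the plug-in. A minimal sketch in Python 3 syntax, where `page_bytes` is only a stand-in for whatever the site really sends (an assumption, since the actual response isn't shown here):

```python
# UTF-8 bytes for 'ä', as a UTF-8 page would send them.
page_bytes = "ä".encode("utf-8")

# Decoding them with the wrong codec turns each byte into its own character.
misread = page_bytes.decode("iso-8859-1")

# Decoding with the codec the page declares gives the single intended character.
correct = page_bytes.decode("utf-8")

print(len(page_bytes), len(misread), len(correct))
```

The misread value is still a perfectly valid text string, which is why nothing complains until it is displayed.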
  • Finx Members Posts: 38
    This is how I got it working. I'm pretty sure it's not the proper way to do it. But at least I get the ui to display some proper characters:

    def utf8decode(s):
      s = s.encode("iso-8859-1")
      return s.decode("utf-8")
    
    for item in XML.ElementFromURL(PLUGIN_ROOT, True).xpath("//div[@class='foo']"):
      item_name = utf8decode(item.xpath("a")[0].text)
    ...
      dir.AppendItem(DirectoryItem(item_id+"$"+item_name, item_name, Utils.EncodeStringToUrlPath(thumb), summary))
    


    So basically, I first encode the text to ISO-8859-1 and then decode the result as UTF-8 in order to be able to append the item. Weird? Well, I think so, too. This is what made me try it out: http://groups.google.com/group/comp.lang.p...0bca2728df55bc6

    I also tried to play with BeautifulSoup to no avail.

    Someone likely has a more elegant solution to this.
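
One possible refinement, sketched in Python 3 syntax and not tested against the plug-in framework: the same round trip, but falling back to the original text when the round trip fails, so strings that were decoded correctly in the first place are left alone. `repair_mojibake` is a made-up name.

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up; return text unchanged if it does not apply."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1, or the bytes
        # are not valid UTF-8, so it was probably not mis-decoded this way.
        return text


# Simulate the mix-up: UTF-8 bytes decoded with the wrong codec.
broken = "Nöje".encode("utf-8").decode("iso-8859-1")

print(repair_mojibake(broken) == "Nöje")
print(repair_mojibake("Nöje") == "Nöje")
```

This is still a workaround. A string that legitimately contains a sequence such as 'Ã¤' would be altered too, so decoding the response with the right codec at the source remains the cleaner fix.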