RFE: Plex and Unicode

dane22 · October 28, 2013, 12:05am

Struggling here with a small plug-in, where I might encounter characters that do not conform to regular Unicode decoding in Python 2.7.

Like the following:

A filename like: Doppelgänger is reported like the following:

Plex Database: Doppelg%C3%A4nger

FileSystem: Doppelga%CC%88nger

So, my research shows that even using the normalize method doesn't work for all characters :(

But since Plex itself can handle this, I was wondering if the function they use could be exposed in the framework, so that devs don't all have to maintain their own table?

Best Regards

Tommy

grayfm · October 30, 2013, 3:22am

Hey dane22,

Ah, the fun of Unicode. The issue you're having is because of Unicode decomposition. I won't launch into a large explanation of it, but there is a good Wikipedia article on the subject here.

The Plex scanners always use the NFKC normalization algorithm, which stores the filenames in their composed form. If you are comparing the database with the filesystem, you will need to do the same with strings from the filesystem. Here is a quick demonstration:

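import urllib, unicodedata
# Percent-decode the filesystem name, then decode the UTF-8 bytes to unicode (Python 2.7)
filename = urllib.unquote('Doppelga%CC%88nger').decode('utf8')
# Compose 'a' + combining diaeresis (U+0308) into the single character U+00E4
composed_filename = unicodedata.normalize('NFKC', filename)
urllib.quote(composed_filename.encode('utf8')) # => Doppelg%C3%A4nger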

Or you could decompose the ones from the Plex database using NFKD.
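A minimal sketch of that opposite direction, assuming Python 3 (where the Python 2.7 calls above become `urllib.parse.unquote` and `urllib.parse.quote`, which handle the UTF-8 step themselves). The helper name `decompose_quoted` is made up for illustration and is not part of the Plex framework:

```python
import unicodedata
from urllib.parse import quote, unquote


def decompose_quoted(quoted):
    """Percent-decode a name, apply NFKD, and percent-encode it again.

    Hypothetical helper for illustration only.
    """
    text = unquote(quoted)  # percent-decoding assumes UTF-8 by default
    decomposed = unicodedata.normalize('NFKD', text)
    return quote(decomposed)  # encodes as UTF-8 by default


# The composed database form becomes the decomposed filesystem form.
print(decompose_quoted('Doppelg%C3%A4nger'))  # Doppelga%CC%88nger
```

Either direction works for comparison, as long as both strings go through the same normalization form.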

There should be no reason why it won't work for all characters. Where were you finding that it didn't?

Cheers,

Gray

dane22 · November 2, 2013, 6:22pm

First of all, sorry about the delay in responding here, but somehow I wasn't notified about your reply as I normally am :(

Your code is a lot more advanced than the one I used, so thanks for that. However, I'm still faced with the same problem.

The code to grab the file names is:

####################################################################################################
# This function will scan the filesystem for files
####################################################################################################
import os, urllib, unicodedata

@route(PREFIX + '/listTree')
def listTree(top, files=list()):
	Log.Debug("******* Starting ListTree with a path of %s***********" %(top))
	# Work on a copy so the caller's list (and the default) is never mutated
	r = files[:]
	# Make sure pathname is defined if an error is raised before the loop assigns it
	pathname = top
	try:
		if not os.path.exists(top):
			Log.Debug("The file share [%s] is not mounted" %(top))
			return r
		for f in os.listdir(top):
			pathname = os.path.join(top, f)
			Log.Debug("Found a file named : %s" %(pathname))
			if os.path.isdir(pathname):
				r = listTree(pathname, r)
			elif os.path.isfile(pathname):
				filename = urllib.unquote(pathname).decode('utf8')
				Log.Debug("Tommy 1 %s" %(filename))
				# Normalize to the composed form before quoting
				composed_filename = unicodedata.normalize('NFKC', filename)
				filename = urllib.quote(composed_filename.encode('utf8'))
				Log.Debug("Tommy 2 %s" %(filename))
				r.append(filename)
			else:
				Log.Debug("Skipping %s" %(pathname))
		return r
	except UnicodeDecodeError:
		Log.Critical("Detected an invalid character in the file/directory following this : %s" %(pathname))
		# Return what was collected so far, so a recursive caller never receives None
		return r

When given the following parameter:

/share/MD0_DATA/FindMovies/Les Misérables/Les Misérables (1080p HD).m4v

it returns back the following filename:

/share/MD0_DATA/FindMovies/Les%20Mis%C3%A9rables/Les%20Mis%C3%A9rables%20%281080p%20HD%29.m4v

However, in the database, the following is registered:

/share/MD0_DATA/FindMovies/Les%20Mise%CC%81rables/Les%20Mise%CC%81rables%20%281080p%20HD%29.m4v

So I needed some way of matching %C3%A9 with the following e%CC%81

And the solution was simply to run your code against the database output as well, and they both ended up equal.
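For anyone landing here later, a minimal sketch of that "normalize both sides, then compare" approach, assuming Python 3 (`urllib.parse` instead of the Python 2.7 `urllib` calls used in this thread). The names `canonical` and `same_path` are hypothetical helpers, not framework functions:

```python
import unicodedata
from urllib.parse import unquote


def canonical(quoted_path, form='NFKC'):
    """Percent-decode a path and bring it to one normalization form."""
    return unicodedata.normalize(form, unquote(quoted_path))


def same_path(a, b):
    """Compare two percent-encoded paths regardless of composed/decomposed form."""
    return canonical(a) == canonical(b)


# Value returned by the filesystem scan (composed: %C3%A9)
fs_path = '/share/MD0_DATA/FindMovies/Les%20Mis%C3%A9rables/Les%20Mis%C3%A9rables%20%281080p%20HD%29.m4v'
# Value registered in the database (decomposed: e%CC%81)
db_path = '/share/MD0_DATA/FindMovies/Les%20Mise%CC%81rables/Les%20Mise%CC%81rables%20%281080p%20HD%29.m4v'

print(fs_path == db_path)          # False: the raw strings differ
print(same_path(fs_path, db_path)) # True: equal once both are normalized
```

Since NFKC also folds compatibility characters (for example the ligature "ﬁ" becomes "fi"), the normalized string is probably best used only as a comparison key, while the original path is kept for opening the file.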

Sir....I salute you for the feedback and insight, as well as sharing code

Best Regards

Tommy

system · December 21, 2019, 12:20am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
