Correct encoding for non-English subtitles

Hi,

I've been a happy Plex user for some time now, but there is still one thing that pisses me off. There are several languages for which ICU-based encoding detectors fail. Czech is one of them. All Czech subtitles are in windows-1250, but libicu (charlock_holmes, enca and Plex, I guess) detects them as iso-8859-2, which makes all the language-specific characters disappear and renders the subtitles completely useless. I know that it works when we transcode them to UTF-8, and I know about the SRT2UTF-8 bundle. I'm using subliminal for downloading and transcoding subtitles, but none of that is a solution; it's just a workaround, and not even a complete and decent one. It won't work with subtitles embedded in mkv or mp4 files, and we shouldn't need a third-party tool to make this work. So I'm here, begging you, developers...
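
To illustrate the mismatch, here is a minimal Python sketch (not Plex's or libicu's code). It encodes a Czech test phrase as windows-1250 and decodes it both ways, using only Python's built-in codecs. The phrase and variable names are chosen purely for illustration.

```python
# Czech test phrase (illustrative, not taken from any subtitle file).
text = "Příliš žluťoučký kůň"

raw = text.encode("cp1250")        # bytes as stored in a windows-1250 file

right = raw.decode("cp1250")       # decoded with the correct codepage
wrong = raw.decode("iso8859_2")    # decoded the way a misdetecting player would

# Characters whose byte value differs between the two codepages.
differing = sorted(
    ch for ch in set(text)
    if ch.encode("cp1250") != ch.encode("iso8859_2")
)

print(right == text)   # the round trip through cp1250 is lossless
print(differing)       # the letters that break under ISO-8859-2
print(repr(wrong))     # those letters turn into C1 control codes
```

In this phrase only š, ť and ž differ. Their windows-1250 bytes (0x9A, 0x9D, 0x9E) fall in the control-character range of ISO-8859-2, so they typically render as nothing at all.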

 

TL;DR

Please let us specify a preferred subtitle encoding per language in a simple editable configuration file, like { cs : "windows-1250", ... }, and consult it when you decode subtitles that have a language suffix. Just check whether the subtitles look like UTF-8; if not, use the preferred encoding if one is specified, and fall back to autodetection otherwise.
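
A minimal Python sketch of the proposed lookup, under stated assumptions: the names `PREFERRED_ENCODINGS`, `looks_like_utf8` and `decode_subtitle` are hypothetical, and the `autodetect` callback merely stands in for whatever detector the server already uses. This is not Plex code.

```python
# Hypothetical per-language table; only the "cs" entry comes from the request.
PREFERRED_ENCODINGS = {"cs": "cp1250"}   # cp1250 is Python's name for windows-1250


def looks_like_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_subtitle(raw, lang, autodetect=None):
    """Decode subtitle bytes: UTF-8 first, then the table, then autodetection."""
    if looks_like_utf8(raw):
        return raw.decode("utf-8")
    encoding = PREFERRED_ENCODINGS.get(lang)
    if encoding is None and autodetect is not None:
        encoding = autodetect(raw)       # assumed to return a codec name or None
    # Last resort (an assumption of this sketch): latin-1 never fails to decode.
    return raw.decode(encoding or "latin-1", errors="replace")


czech = "žluťoučký".encode("cp1250")
print(decode_subtitle(czech, "cs"))
```

Checking for UTF-8 first is cheap and safe in practice, because text in a legacy 8-bit codepage with accented letters almost never happens to be valid UTF-8.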

 

I'm pretty sure we'll put together a nice shippable list of encodings really fast if you let us.

Most third-party clients only work with UTF-8 encoded subtitles, which is why we need to convert them.

And speaking of which... regardless of whether srt2utf-8 works for you, I would love some feedback, both good and bad, in my thread, since I, as a Dane, have no way of confirming this.

/Tommy

I've read in a different post that a server bug is causing this.

Would be nice to get this sorted; same issue with the Polish language :(


It's a very hard thing to fix 100%.

In fact, nobody, and I'm including big companies like Google here, has made a 100% reliable detection/fix for all codepages.

But do take my SRT2UTF-8 plugin for a spin; maybe that'll work for you?

If doing so, I would love some feedback in the SRT2UTF-8 thread regardless of the outcome.

Best Regards

Tommy

sorry for the late reply. added it and will begin testing! ;) thanks

By the way, how does it work? When will the subs work? I added the plugin and configured it, and I'm still getting weird characters ;(

First of all, remember that it only works for srt files located next to the movie...

And you'll have to do a forced refresh of the section.

yep, got that :)

movie / episode name: Breaking Bad - 1x01 - Pilot.avi

movie / episode subs name: Breaking Bad - 1x01 - Pilot.PL.srt

Forced a refresh of the section; still no copies, and the originals are showing wrong characters :(

appreciate the help mate :)

After a forced refresh, zip and upload the logfile

logs below, hope these are the correct logs

Link 

From log:

2014-07-03 13:54:07,500 (170c) : DEBUG (logkit:13) - File trigger is “D: v series\Breaking Bad\Breaking Bad - 5x11 - Confessions.avi”
2014-07-03 13:54:07,551 (170c) : DEBUG (logkit:13) - Found a valid subtitle file named “Breaking Bad - 5x11 - Confessions.PL.srt”
2014-07-03 13:54:07,611 (170c) : DEBUG (logkit:13) - The subtitle file named : D: v series\Breaking Bad/Breaking Bad - 5x11 - Confessions.PL.srt is already encoded in utf-8, so skipping

This says that the file is already encoded in UTF-8. If so, it's a bad subtitle, or it has been converted incorrectly by some editor/util!

(Try and get a new bunch of srt's)

Also found in the logs that you have a lot of srt's without a language code!

Found a valid subtitle file named "Game of Thrones - 1x08 - The Pointy End.srt"

Did you set a preferred language in the preferences?

/T

Only Game of Thrones is without codes; those were downloaded before I took the logs :wink:

But about the subs, how do I know they are correct, though... So do I need non-UTF subs in Polish, so that when the plugin converts them to UTF it will work?

The preferred language for subs, in both Plex and your plugin, is set to Polish.


You need to download them, rename them to match your movies, and put them next to your movies.....

Then scan with Plex, and the plug-in should hopefully take care of it.

Do NOT use any editor before that

/T

I'm using the napi project software for Polish subs

and downloading them already in UTF-8.

Do I need to change that and download in a different encoding, so that your plugin changes it to UTF-8?

I get all the naming stuff ;)

This might sound stupid, but how do I know I have the correct subs?

My server is on Windows 7 English, so there's no Polish language added; even if the file is right I won't see it, I guess, right?


Wrong... It has nothing to do with the language of Windows...

When you name your srt files like mymovie.pl.srt, I know that this is Polish, so I only have to check the file for the UTF-8, ISO-8859-2 and windows-1250 codepages...

If I detect utf-8, I skip the file, and if not, I try to determine what codepage the file is in, so I can successfully convert it to utf-8
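
A rough sketch of that flow in Python, under stated assumptions: this is not the Srt2Utf-8 plugin's actual code, and the helper names are hypothetical. The rule used to tell windows-1250 from ISO-8859-2 is a swapped-in heuristic: bytes 0x80-0x9F are control codes in ISO-8859-2, so finding them in subtitle text points to windows-1250.

```python
import os


def language_from_filename(path):
    """Return the language code from names like 'mymovie.pl.srt', else None."""
    parts = os.path.basename(path).split(".")
    if len(parts) >= 3 and parts[-1].lower() == "srt":
        code = parts[-2].lower()
        if code.isalpha() and len(code) in (2, 3):
            return code
    return None


def guess_codepage(raw):
    """Guess among the UTF-8, windows-1250 and ISO-8859-2 codepages."""
    try:
        raw.decode("utf-8")
        return "utf-8"                 # already UTF-8: nothing to convert
    except UnicodeDecodeError:
        pass
    if any(0x80 <= byte <= 0x9F for byte in raw):
        return "cp1250"                # printable in windows-1250 only
    return "iso8859_2"


print(language_from_filename("Breaking Bad - 1x01 - Pilot.PL.srt"))
print(guess_codepage("Cześć".encode("cp1250")))
```

Text that uses only letters shared by both codepages is reported as ISO-8859-2 by this sketch, which is harmless because those bytes decode identically either way.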

But if you open the file with an editor that doesn't convert the contents but saves it as UTF-8, then my plugin can't fix it.
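
That failure mode can be reproduced in a few lines of Python. This sketch assumes the editor misread the windows-1250 bytes as windows-1252 (a common Western "ANSI" default) before saving as UTF-8; other editors may guess differently, and the variable names are illustrative.

```python
original = "Cześć"                       # Polish sample word

raw = original.encode("cp1250")          # the file as downloaded

# The editor misreads the bytes as windows-1252, then saves as UTF-8.
misread = raw.decode("cp1252")
saved = misread.encode("utf-8")

# The saved file is perfectly valid UTF-8, so a detector skips it...
still_utf8 = saved.decode("utf-8")
print(still_utf8)                        # ...but the text is already wrong

# Repair is only possible if the wrong codepage is known and no byte was lost.
repaired = saved.decode("utf-8").encode("cp1252").decode("cp1250")
print(repaired == original)
```

Since the wrong guess usually isn't known and some bytes have no windows-1252 mapping, getting fresh subtitle files is generally the safer route.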

Regarding the so-called napi project, I don't know it, but I bet it's broken then.

/T

And I just noticed that you posted in the devs forum.

Please continue the conversation here:

https://forums.plex.tv/topic/94864-rel-str2utf-8/?p=553659

in the support thread

/T

It's OK, got it!! You're a star :)

Set the napi project software to download Polish SRTs in ANSI

Rename to match the title, with .PL at the end

Go to Plex, to the whole show or a single episode ;) and refresh (not force update)

It will download the metadata and start your plugin ;) After that I can see everything changed to Polish with the correct characters. I also have your plugin's extra file for each subtitle (extension .srt.Srt2Utf-8)

Not sure whether to keep it or not; it seems that after deleting it everything still works, so I might delete it ;) Should I??? ;)


I make those to avoid angry statements from people claiming my plug-in has corrupted all their subtitles  :rolleyes:

They are a backup of the srt files from before I convert them...

If it works for you now, you can go into the settings (little gear icon) on the Srt2Utf-8 agent and disable backup.

That way, they won't be created anymore.