Automate subtitle offsets with voice to text
We have all had subtitles that are off by a few seconds, or that slowly drift because of a frame-rate mismatch or pauses in the media file. Subtitles that need adjusting several times during a single viewing are especially annoying.
My idea is to use a publicly available library that can generate text from audio. I am not proposing to generate subtitles on the fly, as that would need Siri/Cortana-grade speech recognition, and then there is the matter of subtitle presentation and timing aesthetics.
What I do propose is to generate rough transcriptions from the speech that only need to be somewhat accurate. These results could be matched against the subtitles present in Plex to find common patterns and ultimately determine an offset.
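To make the matching step a bit more concrete, here is a minimal sketch in Python. It assumes the speech recognizer returns a list of (time, word) pairs and that the subtitles are available as (start time, text) cues. The function name `estimate_offset`, the input shapes, and the 0.5 s bin size are my own assumptions for illustration, not something Plex or any particular speech library provides. Each recognized word that also appears in a cue votes for the time difference between the two, and the most popular difference wins, so common words and misrecognitions mostly cancel out as noise.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a subtitle line and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def estimate_offset(recognized, cues, bin_size=0.5, max_offset=30.0):
    """Estimate how many seconds to add to the subtitle times.

    recognized: list of (time_sec, word) from a speech recognizer (assumed shape)
    cues:       list of (start_sec, text) subtitle cues (assumed shape)
    Returns the offset in seconds, rounded to bin_size, or None if nothing matched.
    """
    # Index every distinct word by the start times of the cues containing it.
    cue_starts_by_word = {}
    for start, text in cues:
        for word in set(tokenize(text)):
            cue_starts_by_word.setdefault(word, []).append(start)

    # Every (recognized word, cue with the same word) pair votes for a
    # time difference, quantized into bins of bin_size seconds.
    votes = Counter()
    for time_sec, word in recognized:
        for start in cue_starts_by_word.get(word.lower(), []):
            diff = time_sec - start
            if abs(diff) <= max_offset:
                votes[round(diff / bin_size)] += 1

    if not votes:
        return None
    best_bin, _count = votes.most_common(1)[0]
    return best_bin * bin_size
```

A word is usually spoken a little after its cue starts, so the result is biased slightly late. A real implementation would probably need finer alignment within each cue, but a coarse vote like this should be enough to catch an offset of a few seconds.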
I have rough ideas about how to implement the above, and I can share those at a later point if anyone is interested.
What I am unclear about is how exactly Plex handles subtitles. I know transcoding will burn them in, so that would be the easiest case, as everything happens server-side. But what about direct streams? Are the subtitles transmitted in full, or are they streamed in chunks? In the former case the server would need to update the client with new offsets, while in the latter I could simply adjust future chunks.
In any case, my idea is to have the server determine the offsets rather than push this logic to the client. Ideally this should be an implement-once, use-everywhere kind of solution.
But are we even able to build something like this into Plex? I'm not familiar with the API or other exposed parts; I only know that the server itself is closed source. I don't think the above could be managed with metadata plugins.
So what would be the options here? The key point (just to summarize) is to determine offsets continuously for media during playback. I'm not interested in client-side-only solutions, as they are limited in audience and likely not future-proof.
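For the slow-drift case, a single offset is not enough. One way to handle it would be to collect matched (subtitle time, audio time) pairs as playback goes on and fit a straight line through them. Below is a rough sketch using a plain least-squares fit; the name `fit_drift` and the pair format are hypothetical, and it assumes the drift is linear, which is what a frame-rate mismatch produces (for example, subtitles timed for 25 fps played against a 23.976 fps file are stretched by a factor of 25/23.976).

```python
def fit_drift(pairs):
    """Fit audio_time ~ scale * subtitle_time + shift by least squares.

    pairs: list of (subtitle_time_sec, audio_time_sec) matches (assumed shape)
    Returns (scale, shift).
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched pairs")

    mean_sub = sum(sub for sub, _ in pairs) / n
    mean_audio = sum(audio for _, audio in pairs) / n

    # Spread of the subtitle times; zero means all pairs share one timestamp.
    var_sub = sum((sub - mean_sub) ** 2 for sub, _ in pairs)
    if var_sub == 0:
        raise ValueError("subtitle times must not all be identical")

    cov = sum((sub - mean_sub) * (audio - mean_audio) for sub, audio in pairs)
    scale = cov / var_sub
    shift = mean_audio - scale * mean_sub
    return scale, shift
```

A corrected cue time would then be `scale * cue_start + shift`. Refitting over a sliding window of recent matches is one possible way to also cope with pauses or cuts, where the offset jumps instead of drifting.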
Would love to hear your thoughts on this!