I did some digging in their code and provided a pull request, to which they haven’t responded yet.
Basically, their frontend is hardcoded to check the entered API key against openai.com’s servers, even though the backend already accepts an env variable that points it at a custom server. I updated their code so the frontend uses that same env variable.
So in the end I did get it working in Docker Compose with a local Whisper server. I’ll dig up the compose file if someone’s interested. My fixed code can still be found on their pull requests tab.
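For anyone curious, the idea behind the fix is roughly the following. This is a Python sketch of the concept, not the actual code from the pull request: the variable name `WHISPER_API_BASE` is made up for illustration, and it assumes the local server exposes an OpenAI-style `/models` route that can be used to test a key.

```python
import os
import urllib.request

DEFAULT_API_BASE = "https://api.openai.com/v1"


def resolve_api_base(env=os.environ):
    """Return the API base URL, preferring the override variable if set."""
    # WHISPER_API_BASE is a hypothetical name, not the project's real variable.
    base = env.get("WHISPER_API_BASE", "").strip()
    return (base or DEFAULT_API_BASE).rstrip("/")


def build_key_check_request(api_key, env=os.environ):
    """Build a GET request against the models listing to test an API key."""
    return urllib.request.Request(
        f"{resolve_api_base(env)}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

The point is only that the key check and the transcription calls read the base URL from the same place, so a custom server works for both.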
That being said, this still feels more like a “proof of concept” than a finished project. For example, it has no Plex integration at all, and it doesn’t use something like ffprobe to see which subtitles are already present.
It simply checks your disk for a .srt generated by the tool itself (so not something like title.eng.srt).
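A more thorough check could look something like this. It's a sketch of what I'd expect, not anything the project does: the function names are my own, the sidecar matching is a simple prefix match on the video's file name, and `probe_subtitles` needs ffprobe on your PATH.

```python
import json
import subprocess
from pathlib import Path


def sidecar_subtitles(video):
    """Return .srt files next to the video, including language-tagged ones."""
    video = Path(video)
    # Matches "title.srt" as well as "title.eng.srt" for a video "title.mkv".
    return [
        candidate
        for candidate in sorted(video.parent.glob("*.srt"))
        if candidate.name.startswith(video.stem + ".")
    ]


def embedded_subtitle_languages(ffprobe_json):
    """Extract the language tag of each subtitle stream from ffprobe JSON."""
    streams = json.loads(ffprobe_json).get("streams", [])
    # "und" (undetermined) is used here when a stream carries no language tag.
    return [s.get("tags", {}).get("language", "und") for s in streams]


def probe_subtitles(video):
    """Run ffprobe on the file and list its embedded subtitle languages."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "s",
         "-show_entries", "stream=index:stream_tags=language",
         "-of", "json", str(video)],
        capture_output=True, text=True, check=True,
    )
    return embedded_subtitle_languages(result.stdout)
```

A file would then only need transcribing if both `sidecar_subtitles(path)` and `probe_subtitles(path)` come back empty for the language you care about.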
The algorithm that splits the audio into chunks before transcribing works fairly well. It isn’t perfect, but it’s still better than running Whisper directly on the whole file, which has issues with long-form audio.
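I haven't traced exactly how their splitting works, so the following is not their algorithm, just a minimal sketch of the general idea: cut at pauses so no chunk runs too long. It assumes you already have a list of silence intervals in seconds (for instance from ffmpeg's silencedetect filter), and the 30-second default is chosen because Whisper works on 30-second windows.

```python
def plan_chunks(duration, silences, max_len=30.0):
    """Split [0, duration] into chunks of at most max_len seconds.

    Cuts are placed at the midpoint of the latest silence that still fits
    in the current window, so words are less likely to be cut in half.
    """
    cut_candidates = sorted((start + end) / 2 for start, end in silences)
    chunks = []
    chunk_start = 0.0
    while duration - chunk_start > max_len:
        limit = chunk_start + max_len
        fitting = [c for c in cut_candidates if chunk_start < c <= limit]
        # Fall back to a hard cut when no silence falls inside the window.
        cut = fitting[-1] if fitting else limit
        chunks.append((chunk_start, cut))
        chunk_start = cut
    chunks.append((chunk_start, duration))
    return chunks
```

For a 70-second file with pauses around 11, 28 and 56 seconds, this gives three chunks cut at 28 and 56 seconds. The hard-cut fallback is where this kind of approach stops being perfect, since continuous speech without pauses still gets split mid-sentence.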