This isn’t a bug; it’s a limitation that we are trying to remove but haven’t yet.
Basically, when the audio needs to be transcoded and subtitles are enabled, the only way we can keep them in sync (and deliver them to the clients) is to burn the subtitles into the video. When the audio doesn’t need to be transcoded, everything direct plays rather than direct streams, so the subtitles can be played directly by the client.
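A rough sketch of that rule, to make the two cases concrete. This is only an illustration, not the actual server code; `PlaybackRequest`, its fields, and the returned labels are made-up names.

```python
from dataclasses import dataclass


@dataclass
class PlaybackRequest:
    # Hypothetical fields, invented for this illustration.
    audio_needs_transcode: bool
    subtitles_enabled: bool


def choose_subtitle_delivery(req: PlaybackRequest) -> str:
    """Return how subtitles end up being delivered under the rule described above."""
    if not req.subtitles_enabled:
        return "none"
    if req.audio_needs_transcode:
        # The audio is being transcoded, so the only way to keep subtitles
        # in sync and get them to the client is to render them into the video.
        return "burn_in"
    # Nothing needs transcoding: the file direct plays and the client
    # renders the subtitle track itself.
    return "client_rendered"
```

So a request with transcoded audio and subtitles switched on gives `"burn_in"`, while the same file with compatible audio gives `"client_rendered"`.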
The limitation itself comes from the HLS protocol, which only supports subtitles as WebVTT, a format we don’t currently support.
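For anyone curious what WebVTT looks like next to the more common SRT format, here is a minimal sketch of a conversion for simple files. It assumes plain SRT input with no styling tags, and `srt_to_webvtt` is a hypothetical helper written for this post, not something the server provides. The two visible differences it handles are the `WEBVTT` header line and the use of `.` instead of `,` before the milliseconds.

```python
import re

# Matches an SRT timestamp such as "00:01:02,500" and captures both halves.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT (header plus '.' millisecond separator)."""
    # SRT cue numbers can stay: WebVTT accepts them as cue identifiers.
    body = _SRT_TIME.sub(r"\1.\2", srt_text.strip())
    return "WEBVTT\n\n" + body + "\n"
```

Delivering subtitles over HLS would also involve segmenting and timing them against the stream, which this sketch does not attempt.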