As of at least 1.20.1.3252, Plex server uses aresample with ocl=stereo to downmix 5.1 audio when playing on Chromecast. The resulting audio has speech that is much quieter than the sound effects. ffmpeg has an -ac flag specifically for downmixing, and it works differently from the aresample filter.
To further illustrate the difference, below is a screenshot of waveforms of 3 tracks downmixed from the same source:
Plex’s downmix being too quiet is a common complaint. And there are other feature requests for louder audio. But I think you’ve hit the nail on the head with this one.
A really good argument for -ac is that it obeys the ATSC recommendations for downmixing.
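For anyone who wants to see why the centre channel matters so much here, below is a minimal sketch of a stereo fold-down. It assumes the commonly quoted -3 dB (1/sqrt(2)) coefficients for centre and surrounds and drops the LFE channel. The `Frame51` type and `downmixStereo` function are made up for illustration; this is not a claim about the exact matrix that -ac or Plex ends up using.

```go
package main

import (
	"fmt"
	"math"
)

// Frame51 holds one sample per channel of a 5.1 stream.
// The layout is an illustrative assumption, not an ffmpeg type.
type Frame51 struct {
	FL, FR, FC, LFE, SL, SR float64
}

// downmixStereo folds a 5.1 frame to stereo with centre and surrounds
// attenuated by 3 dB (factor 1/sqrt(2)). The LFE channel is dropped.
func downmixStereo(f Frame51) (left, right float64) {
	k := 1 / math.Sqrt2
	left = f.FL + k*f.FC + k*f.SL
	right = f.FR + k*f.FC + k*f.SR
	return left, right
}

func main() {
	// Dialogue normally sits in the centre channel only.
	l, r := downmixStereo(Frame51{FC: 1.0})
	fmt.Printf("dialogue only: L=%.3f R=%.3f\n", l, r)
}
```

Any extra attenuation applied on top of this matrix hits the dialogue as well, which is why the choice of downmix method is so audible.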
I’ve had this bookmarked for years, and it has analysis similar to yours.
I really like the “Nightmode Dialogue” answer for listening in noisy places, Planes, Trains and Automobiles.
I wonder if Plex’s method predates -ac working well, or if it provides other functionality/generality. (Or if it plays well with EasyAudioEncoder.)
As much as I agree that there is much to be desired with the loudness of downmixed audio, I somehow doubt that the loudnorm filter can be applied while streaming the file. From what I gathered working with ffmpeg, this filter requires a separate “analysis” run, prior to the actual conversion.
Yeah, loudnorm works better with two passes, but it can also work in one pass.
However, that’s beside the point: using -ac would already be a huge improvement. I think it might even be as simple as removing ocl=2 and adding -ac 2 in the transcoder command line.
Now I’m wondering why rematrix_maxval is set. The default is 1, which would already avoid clipping - possibly at the expense of loudness. If it’s being set to 0, I’m confused that things aren’t louder.
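The trade-off between clipping and loudness can be sketched like this: if the mix coefficients for an output channel add up to more than some maximum, everything gets scaled down by a common factor. This is only an illustration of the general idea, not libswresample's code. The `normalizeGain` name is made up, and treating a non-positive maximum as "no scaling" is an assumption of this sketch that says nothing about what ffmpeg does with 0.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeGain returns the common factor that keeps every output channel's
// coefficient sum at or below maxval. A non-positive maxval is simply treated
// as "no scaling" here (an assumption of this sketch).
func normalizeGain(matrix [][]float64, maxval float64) float64 {
	maxSum := 0.0
	for _, row := range matrix {
		sum := 0.0
		for _, c := range row {
			sum += math.Abs(c)
		}
		maxSum = math.Max(maxSum, sum)
	}
	if maxval <= 0 || maxSum <= maxval {
		return 1
	}
	return maxval / maxSum
}

func main() {
	k := 1 / math.Sqrt2
	// Rows: left and right output. Columns: FL, FR, FC, SL, SR.
	m := [][]float64{
		{1, 0, k, k, 0},
		{0, 1, k, 0, k},
	}
	g := normalizeGain(m, 1.0)
	fmt.Printf("gain %.3f (%.2f dB)\n", g, 20*math.Log10(g))
}
```

With a maximum of 1 and -3 dB coefficients, the whole mix drops by a bit under 8 dB in this sketch, which is the kind of loudness cost mentioned above.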
Every time I think I understand some parts of Plex I learn that I’m missing an entire dimension.
I made a simple wrapper around Plex Transcoder to do exactly that: remove ocl and insert -ac 2 right after that. It seems to work so far, but I’ll keep an eye out for breakages.
wrapper
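package main

import (
	"fmt"
	"os"
	"regexp"
	"syscall"
)

// oclRe matches the ocl option when it sits between two other options,
// e.g. ":ocl=stereo:". An ocl at the very end of the filter is not matched.
var oclRe = regexp.MustCompile(":ocl=(2|stereo|'stereo'):")

// rewriteArgs drops ocl from the -filter_complex argument and inserts
// "-ac 2" right after it. All other arguments pass through unchanged.
func rewriteArgs(args []string) []string {
	r := []string{args[0]}
	for i := 1; i < len(args); i++ {
		r = append(r, args[i])
		// Skip unless this is -filter_complex and a value actually follows it.
		if args[i] != "-filter_complex" || i+1 >= len(args) {
			continue
		}
		i++
		filter := args[i]
		if m := oclRe.FindStringIndex(filter); m != nil {
			// Cut ":ocl=...", keeping the colon that follows it.
			r = append(r, filter[:m[0]]+filter[m[1]-1:], "-ac", "2")
		} else {
			r = append(r, filter)
		}
	}
	return r
}

func main() {
	args := rewriteArgs(os.Args)
	// The original binary is expected next to this wrapper as "<name>_org".
	if err := syscall.Exec(args[0]+"_org", args, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec failed:", err)
		os.Exit(1)
	}
}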
(number in the first column is max level in each output)
It seems that it’s rematrix_maxval that is messing up the volume. loudnorm can bring it back to normal, but only if placed after aresample (and then another aresample instance is needed to get to the final desired sample rate from 192kHz).
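The ordering described above could be written down roughly like this. It is a sketch only: `buildChain` is a made-up helper, and the 48000 final rate is just an example value, not something taken from Plex's command line.

```go
package main

import (
	"fmt"
	"strings"
)

// buildChain returns a filter chain that downmixes first, then normalizes,
// then resamples from loudnorm's output rate to the desired final rate.
func buildChain(finalRate int) string {
	return strings.Join([]string{
		"aresample=ocl=stereo",                 // downmix, as in the existing filter
		"loudnorm",                             // must come after the downmix
		fmt.Sprintf("aresample=%d", finalRate), // back to the target sample rate
	}, ",")
}

func main() {
	fmt.Println(buildChain(48000))
}
```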
I think I’ll just filter out rematrix_maxval from ffmpeg invocation and see if anything breaks.
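Something along these lines should do for the filtering; it is a minimal sketch rather than tested wrapper code. The `stripRematrixMaxval` name and the sample filter string are made up, and the patterns assume the option value contains no colon. A filter whose only option is rematrix_maxval is left untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The option in the middle or at the end of the option list.
	afterColon = regexp.MustCompile(`:rematrix_maxval=[^:,;\[\]]*`)
	// The option as the first one, directly after "filtername=".
	firstOpt = regexp.MustCompile(`=rematrix_maxval=[^:,;\[\]]*:`)
)

// stripRematrixMaxval removes the rematrix_maxval option from a filter string
// and leaves everything else as it was.
func stripRematrixMaxval(filter string) string {
	filter = afterColon.ReplaceAllString(filter, "")
	return firstOpt.ReplaceAllString(filter, "=")
}

func main() {
	// Hypothetical filter argument, only loosely modelled on a transcoder call.
	in := "[0:1] aresample=async=1:ocl='stereo':rematrix_maxval=0.000000dB:osr=48000[0]"
	fmt.Println(stripRematrixMaxval(in))
}
```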