Weirdly fluctuating volume/normalization when transcoding TrueHD to EAC3

When playing a movie with TrueHD audio on my TV, Plex transcodes the audio from TrueHD to EAC3 because my TV can’t decode TrueHD.

I’ve noticed that the volume sometimes fluctuates weirdly when this happens. It’s subtle in some places but very obvious and annoying in others. It’s like atrocious volume normalization.

I have narrowed the issue down to the transcoder and ruled out my TV: I grabbed a transcode segment from Plex’s tmp directory, and playback on my PC shows the exact same phenomenon; the volume clearly fluctuates the same way. (I changed the video quality to 240p to force Plex to create the temp files. Before that, it would only create the EAC files, and grabbing those is rather tricky because it deletes them constantly.)

I could attach the sample clip here, but I’m not sure how that works copyright-wise, even though it’s only 3 seconds long.

Here are the settings that Plex uses for transcoding:

Mar 27, 2021 15:06:33.712 [0x7f6fc2ffd700] DEBUG - [Transcode/JobRunner] Job running: EAE_ROOT='/tmp/pms-7654c25b-9147-43a3-8b95-cb3f9eb380aa/EasyAudioEncoder' FFMPEG_EXTERNAL_LIBS='/config/Library/Application\ Support/Plex\ Media\ Server/Codecs/73e06c8-3759-linux-x86_64/' X_PLEX_TOKEN=xxxxxxxxxxxxxxxxxxxxj' '/usr/lib/plexmediaserver/Plex Transcoder' 
'-codec:0' 'hevc' 
'-codec:1' 'truehd_eae' 
'-eae_prefix:1' 'qldeahgxhijtfk0kfkd2kq6s_' 
'-noaccurate_seek' 
'-analyzeduration' '20000000' 
'-probesize' '20000000' 
'-i' '/media/Movies/2160p HDR/Star Trek Into Darkness (2013)/Star Trek Into Darkness UHD Remux.mkv' 
'-map' '0:0' 
'-metadata:s:0' 'language=eng' 
'-codec:0' 'copy' 
'-filter_complex' '[0:1] aresample=async=1:ocl='\''7.1'\'':rematrix_maxval=60.000000dB:osr=48000[0]' 
'-map' '[0]' 
'-metadata:s:1' 'language=eng' 
'-codec:1' 'eac3_eae' 
'-eae_prefix:1' 'qldeahgxhijtfk0kfkd2kq6s_' 
'-b:1' '1000k' 
'-segment_format' 'mpegts' 
'-f' 'ssegment' 
'-individual_header_trailer' '0' 
'-segment_time' '10' 
'-segment_start_number' '0' 
'-segment_copyts' '1' 
'-segment_time_delta' '0.0625' 
'-segment_list' 'http://127.0.0.1:32400/video/:/transcode/session/qldeahgxhijtfk0kfkd2kq6s/2ecec0bc-cd05-4174-b5f8-d3380f09c211/seglist?X-Plex-Http-Pipeline=infinite' 
'-segment_list_type' 'csv' 
'-segment_list_size' '5' 
'-segment_list_separate_stream_times' '1' 
'-segment_list_unfinished' '1' 
'-max_delay' '5000000' 
'-avoid_negative_ts' 'disabled' 
'-map_metadata' 
'-1' 
'-map_chapters' 
'-1' 'media-%05d.ts' 
'-start_at_zero' 
'-copyts' 
'-vsync' 'cfr' 
'-y' 
'-nostats' 
'-loglevel' 'quiet' 
'-loglevel_plex' 'error' 
'-progressurl' 'http://127.0.0.1:32400/video/:/transcode/session/qldeahgxhijtfk0kfkd2kq6s/2ecec0bc-cd05-4174-b5f8-d3380f09c211/progress'

I’m running Version 1.22.1.4228 of PMS on a Synology via Docker.

I’m guessing this might be the interesting part: '-filter_complex' '[0:1] aresample=async=1:ocl='\''7.1'\'':rematrix_maxval=60.000000dB:osr=48000[0]'

rematrix_maxval seems like the obvious one that might affect the volume somehow, since that appears to be some kind of normalization.

However, based on my understanding, rematrixing shouldn’t dynamically adjust based on content, but affect the audio equally over time, no?

Is this a known phenomenon?

I’ve also tried messing around with EasyAudioEncoder to figure out whether ffmpeg or EAE is to blame, but I couldn’t get any results yet.

It can only react dynamically, because it has to perform its magic in real-time, without knowing the audio levels of the whole movie in advance.

If you wanted it to produce no volume fluctuations, you’d either have to:

  • analyse the level of each audio channel for the whole movie in advance before playback
  • change the mixdown parameters to a much lower level, to prevent any clipping. But the resulting volume will be way lower.
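
To illustrate the first option, here is a minimal Python sketch of a two-pass approach: scan the whole downmix once for its peak, then apply a single fixed gain everywhere. The function names and sample values are made up for illustration, and this is not how Plex or ffmpeg actually do it.

```python
def downmix(frame, matrix):
    """Mix one multi-channel sample frame down using a static coefficient matrix."""
    return [sum(c * s for c, s in zip(row, frame)) for row in matrix]


def static_gain_from_prescan(frames, matrix, ceiling=1.0):
    """First pass: find the downmix peak over the whole program and derive ONE gain."""
    peak = max((abs(v) for frame in frames for v in downmix(frame, matrix)), default=0.0)
    return 1.0 if peak <= ceiling else ceiling / peak


# Hypothetical stereo frames (left, right) and a naive stereo-to-mono matrix.
frames = [[0.2, 0.1], [0.9, 0.8], [0.3, -0.4]]
matrix = [[1.0, 1.0]]

gain = static_gain_from_prescan(frames, matrix)
# Second pass: the same gain is applied to every frame, so the level never "pumps".
normalized = [[gain * v for v in downmix(f, matrix)] for f in frames]
print(gain, normalized)
```

Because the gain is computed once from the global peak, quiet and loud passages keep their relative levels. The cost is that the whole track has to be read before playback can start.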

(My personal preference would be the first option, because it would yield optimal results and enable Plex to normalize/raise the loudness of all movies to a common level at user request. Users with small speakers could rejoice.)

@OttoKerner Interesting, thanks! I couldn’t find any real documentation on it unfortunately. I had assumed it was option 2, since the volume lowering to prevent clipping seems to be a common complaint online.

The question then becomes: why does this issue go away when it’s transcoding to AAC instead? (I disabled EAC3 support in my TV’s Plex app.) According to the logs, it’s still invoked with the same rematrix_maxval setting.

And more importantly: How can Plex fix this? It makes TrueHD transcoding essentially useless.

I have uploaded a 3 second example clip here: faithful-conversion.mp3

I transcoded the extracted EAC3 stream to MP3 using ffmpeg; it’s a faithful conversion when I compare it to the “original”.

For reference, how it should sound: Star Trek Into Darkness - Opening Scene (HD) - YouTube

AAC filter parameters:

'-filter_complex' '[0:1] aresample=async=1:ocl='\''5.1'\'':rematrix_maxval=60.000000dB:osr=48000[0]' 

I found this FFmpeg Resampler Documentation
Not sure why a value of 1.0 isn’t used instead. Probably because it’d yield even more artefacts.

@OttoKerner I’ve done some digging…

tl;dr: rematrix_maxval=60dB has absolutely no effect on the audio

longer tl;dr:

  • 60dB corresponds to 1000 which is the maximum input for rematrix_maxval (using dB here as a unit is nonsensical and misleading)
  • rematrix_maxval affects the static channel mixing matrix only, there is no dynamic behavior
  • rematrix_maxval=1 ensures the sum of coefficients per output channel has a maximum of 1, that way the resulting output never goes above the maximum, no clipping (all other channel weights are adjusted relatively of course)
  • 1000 means one channel can receive up to 1000 times “the volume of a single input channel” → we could get clipping
  • Thus, 1000 is equivalent to disabling rematrix_maxval for all intents and purposes, no sane mixing matrix would use such channel weights

Example: When we mix a stereo signal to mono and have a mixing matrix of Left: 1, Right: 1 and both inputs played at max volume, we’d be at twice the maximum volume in the output signal. Clipping occurs.

Setting rematrix_maxval=2 would not affect this mixing matrix, setting it to 1 would recompute our matrix such that we have Left: 0.5, Right: 0.5.
Setting it to 1.5 would recompute our matrix such that we had Left: 0.75, Right: 0.75 (clipping can still occur).
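
The stereo-to-mono example can be written out as a small Python sketch. This is my reading of the behavior described above, not ffmpeg’s actual code, and the function name is made up.

```python
def apply_rematrix_maxval(matrix, maxval):
    """Scale ALL coefficients by one factor so that no output channel's
    coefficient sum exceeds maxval. The matrix is static; nothing here
    depends on the audio content."""
    max_sum = max(sum(abs(c) for c in row) for row in matrix)
    if max_sum <= maxval:
        return [list(row) for row in matrix]  # already within the limit
    factor = maxval / max_sum
    return [[c * factor for c in row] for row in matrix]


stereo_to_mono = [[1.0, 1.0]]  # one output channel: Left: 1, Right: 1
for maxval in (2.0, 1.5, 1.0):
    print(maxval, apply_rematrix_maxval(stereo_to_mono, maxval))
```

With maxval=2 the matrix is left alone, with 1.5 both weights become 0.75, and with 1 they become 0.5.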

The FFmpeg resampler documentation describes rematrix_maxval like this: “Set maximum output value for rematrixing. This can be used to prevent clipping vs. preventing volume reduction. A value of 1.0 prevents clipping.”

We can now see why 1.0 prevents clipping: if all input channels played at full volume, the output channel would also play at exactly full volume (Left: 0.5, Right: 0.5 → Mono).

Obviously 60dB/1000 has no effect on our matrix either.
The same applies to any channel setup.


Experiments to back up my claims:

On the command line, 60dB corresponds to a value of 1000 (dB parsing, 10^(60/20) = 1000). This can also be seen by invoking it with 60.1dB, which gives this error:

 Value 1011.579454 for parameter 'rematrix_maxval' out of range [0 - 1000]
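
The dB arithmetic is easy to check in Python. The helper below is a hypothetical stand-in for the option parsing; only the 10^(dB/20) conversion and the [0 - 1000] range come from the observations above.

```python
def db_to_linear(db):
    """Amplitude ratio for a dB value: 10^(dB/20)."""
    return 10 ** (db / 20)


def parse_maxval(text, lo=0.0, hi=1000.0):
    """Hypothetical helper: accept '60dB' or a plain number and range-check it."""
    if text.lower().endswith("db"):
        value = db_to_linear(float(text[:-2]))
    else:
        value = float(text)
    if not lo <= value <= hi:
        raise ValueError(f"Value {value:.6f} out of range [{lo:g} - {hi:g}]")
    return value


for text in ("60dB", "40dB", "0dB", "1.5"):
    print(text, "->", parse_maxval(text))

try:
    parse_maxval("60.1dB")
except ValueError as err:
    print(err)
```

60dB lands exactly on the upper bound of 1000, and 60.1dB is already past it, which matches the error message.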

I adapted the ffmpeg command that converts the EAC3 7.1 file into my stereo MP3 so I could experiment with rematrix_maxval. The code that adjusts the matrix to account for rematrix_maxval runs only once, which shows that the matrix is not adjusted dynamically over time; otherwise the log messages for the matrix would appear more than once per transcode. (Invoke ffmpeg with “-v debug” to see them.)

I also adjusted the mix levels so that the coefficients for each output channel sum to 100, which corresponds to 40dB on the command line (10^(40/20) = 10^2 = 100).

ffmpeg.exe -i media-00091.ts -vn -acodec libmp3lame -filter_complex "aresample=async=1:rematrix_maxval=60dB:clev=30:slev=30:lfe_mix_level=12.727922" -v debug output-audio-test-rematrix-60db-custom.mp3

[Parsed_aresample_0 @ 000001e64da56ec0] [SWR @ 000001e64f9ef040] FL: FL:1.000000 FR:0.000000 FC:30.000000 LFE:9.000000 BL:30.000000 BR:0.000000 SL:30.000000 SR:0.000000
[Parsed_aresample_0 @ 000001e64da56ec0] [SWR @ 000001e64f9ef040] FR: FL:0.000000 FR:1.000000 FC:30.000000 LFE:9.000000 BL:0.000000 BR:30.000000 SL:0.000000 SR:30.000000

Running the command with different rematrix_maxval settings gives the following outputs for the channel front left:

rematrix_maxval=100:     FL: FL:1.000000 FR:0.000000 FC:30.000000 LFE:9.000000 BL:30.000000 BR:0.000000 SL:30.000000 SR:0.000000
rematrix_maxval=40dB:    FL: FL:1.000000 FR:0.000000 FC:30.000000 LFE:9.000000 BL:30.000000 BR:0.000000 SL:30.000000 SR:0.000000
rematrix_maxval=99.9:    FL: FL:0.999000 FR:0.000000 FC:29.970000 LFE:8.991000 BL:29.970000 BR:0.000000 SL:29.970000 SR:0.000000
rematrix_maxval=39.9dB:  FL: FL:0.988553 FR:0.000000 FC:29.656592 LFE:8.896978 BL:29.656592 BR:0.000000 SL:29.656592 SR:0.000000

As you can see, the mixing matrix is only affected once we drop below 100/40dB. Of course, you don’t have such high coefficients normally anyway.
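
As a cross-check, the logged rows can be reproduced in Python from the unscaled front-left row and a single scale factor of maxval divided by the coefficient sum. The rows are copied from the debug output above; the scaling rule is my reading of the behavior, not ffmpeg’s code.

```python
# Unscaled front-left row as logged: FL, FR, FC, LFE, BL, BR, SL, SR (sum = 100).
BASE_FL = [1.0, 0.0, 30.0, 9.0, 30.0, 0.0, 30.0, 0.0]

# Rows copied from the debug log for each rematrix_maxval setting.
LOGGED = {
    "100": [1.0, 0.0, 30.0, 9.0, 30.0, 0.0, 30.0, 0.0],
    "40dB": [1.0, 0.0, 30.0, 9.0, 30.0, 0.0, 30.0, 0.0],
    "99.9": [0.999, 0.0, 29.97, 8.991, 29.97, 0.0, 29.97, 0.0],
    "39.9dB": [0.988553, 0.0, 29.656592, 8.896978, 29.656592, 0.0, 29.656592, 0.0],
}


def to_linear(text):
    """Convert '39.9dB' via 10^(dB/20); plain numbers pass through."""
    return 10 ** (float(text[:-2]) / 20) if text.endswith("dB") else float(text)


def expected_row(row, maxval):
    """Scale the row only if its coefficient sum exceeds maxval."""
    total = sum(abs(c) for c in row)
    factor = min(1.0, maxval / total)
    return [c * factor for c in row]


for setting, logged in LOGGED.items():
    predicted = expected_row(BASE_FL, to_linear(setting))
    worst = max(abs(p - l) for p, l in zip(predicted, logged))
    print(f"{setting:>7}: max deviation from log = {worst:.2e}")
```

The predicted rows agree with the log to within the precision of the printed values.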


How rematrix_maxval works in the code:

Here you can see how the correction factor is computed from the maximum sum of coefficients across the output channels and finally applied equally to all coefficients, as I’ve described in my tl;dr.

On line 352, the matrix is logged to stderr if you invoke ffmpeg with “-v debug”. Since this output never repeats when converting a file, we can conclude that rematrix_maxval affects the matrix exactly once for the entire duration of the conversion and is not adjusted dynamically.
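
If you’d rather not eyeball the log, a small Python sketch can count how often each output channel’s matrix row shows up in a saved “-v debug” log. The regular expression assumes the line format of the two SWR lines quoted earlier; other ffmpeg builds may log differently.

```python
import re
from collections import Counter

# Assumed format: "... [SWR @ <addr>] FL: FL:1.000000 FR:0.000000 ..."
MATRIX_LINE = re.compile(
    r"\[SWR @ [0-9a-fA-F]+\]\s+([A-Z]+):\s+(?:[A-Z]+:-?\d+\.\d+\s*)+$"
)


def count_matrix_rows(log_text):
    """Count how many times each output channel's matrix row is logged."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MATRIX_LINE.search(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample_log = """\
[Parsed_aresample_0 @ 000001e64da56ec0] [SWR @ 000001e64f9ef040] FL: FL:1.000000 FR:0.000000 FC:30.000000 LFE:9.000000 BL:30.000000 BR:0.000000 SL:30.000000 SR:0.000000
[Parsed_aresample_0 @ 000001e64da56ec0] [SWR @ 000001e64f9ef040] FR: FL:0.000000 FR:1.000000 FC:30.000000 LFE:9.000000 BL:0.000000 BR:30.000000 SL:0.000000 SR:30.000000
"""

counts = count_matrix_rows(sample_log)
print(dict(counts))
print("matrix logged once per channel:", all(n == 1 for n in counts.values()))
```

A count above 1 for any channel would suggest the matrix was rebuilt during the conversion.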

Sidenote: maxval is directly sourced from rematrix_maxval if the sample format is floating-point based.

Seems like rematrix_maxval is set to 0dB if the Plex app has “Normalize Multi-Channel Audio” enabled and to 60dB if it’s disabled. I had changed the feature from its default (enabled) to disabled because I was trying to narrow down what caused the issue.

Per the above explanation, 0dB means 10^(0/20) = 10^0 = 1, i.e. maximum volume without clipping while keeping the relative channel distribution. I’ve also verified this on the command line: 0dB behaves the same as 1.

That matches the description of the feature “Reduce volume to reduce clipping when converting from multi-channel audio formats”.

So we now know that Plex does actually do that by default.

Sidenote: As a dev myself, I’m slightly amused that Plex uses such a roundabout way to set the value to 1 or 1000, obfuscation at its finest :upside_down_face:

So we’re back to square one, what causes the audio volume to fluctuate when transcoding to EAC3 but not to AAC?

Is it possible to instruct EasyAudioEncoder to keep all temp files so I can give them a listen?
