Strange (or is it?) transcoding behavior

Hey guys, I have a few transcoding questions again. This is not a “please help my stuttering movie” post; it is more of a “how does transcoding work at a high level” type of post.

I’m doing my tests with Jumanji (info attached) cause it really punishes my Skull Canyon NUC. It has a passmark of 9.7k but I would like to say that it is completely irrelevant for my question. My question would still apply if I was on a 2k passmark system trying to transcode 4k to 4k.

PMS is running on Ubuntu 17.10. I’m playing it over a LAN through the Plex web client. In the player I’ve selected “convert to maximum”, 4k 53.2 Mbps, which pretty much results in an HEVC to H264 conversion. I’m using the fast transcoder profile on the server (I’ve been using auto but switched it for testing) and the throttle is set to 60 secs. Subs are not on. I’m getting stutters at the same point in the file whenever I start it with the same transcoder settings.

It is not all that strange that this punishes my NUC (pretty much 92%+ CPU constantly), since it requires transcoding such a high bit rate file, especially going from HEVC to H264. I think that should be expected. What I do not understand, however, is why the transcoder behaves the way it does.

To me it feels like whenever the transcoder starts hitting the CPU limit for a period of time, the playback stutters or frames are dropped, no matter whether I’ve paused to let it fill the buffer. Now… it might sound strange that I find this strange, but it really doesn’t make sense to me, and this is what I would like to understand a bit better.

What I would expect (but I am obviously wrong somehow) is that when the transcoder hits the CPU limit, it would not have time to fill the buffer or to stream the data to the client. This means that (for example) if it takes 10 secs to produce 5 secs of transcoded video, one will obviously have buffering issues. Not strange at all. At that point I would also expect that, since it can’t keep up, I should be able to press pause, let it work at the pace it “can deal with” while the buffer fills, then play a stutter-free movie for as long as it is still buffered. I would never expect the transcoder to produce a result that stutters, drops frames and buffers.
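To put numbers on that expectation, here is a minimal sketch of the simple model described above. It is only the poster's mental model, not how Plex is known to work: it assumes a constant transcode speed and no cap on the buffer, and the function name is made up for illustration.

```python
def smooth_playback_seconds(transcode_speed, pause_seconds):
    """Seconds of stutter-free playback available after a pause.

    transcode_speed: seconds of video produced per wall-clock second
                     (0.5 means 10 s of work yields 5 s of video).
    pause_seconds:   how long playback stays paused while the buffer fills.
    Assumption: nothing caps the buffer (no throttle limit).
    """
    if transcode_speed >= 1.0:
        return float("inf")  # transcoder keeps up, so the buffer never drains
    buffered = transcode_speed * pause_seconds  # video ready when play resumes
    drain_rate = 1.0 - transcode_speed          # net buffer loss per second played
    return buffered / drain_rate


# The "10 secs to produce 5 secs" case, after pausing for one minute.
print(smooth_playback_seconds(0.5, 60))
```

Under this model a one-minute pause at half real-time speed should buy about a minute of clean playback, which is exactly the behavior that is not being observed.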

What I am seeing, and not understanding, is that when the CPU hits 100% it seems to actually start doing a bad job and producing bad transcoded results. In other words, I get the feeling that if your CPU is peaking you will always see stutter and buffering/waiting issues, no matter how long you let the server work before playing the data. It seems to me that you can never usefully buffer or throttle a movie that requires transcoding if your CPU can’t transcode it in real time, even when you don’t actually need the results in real time (read: the movie is paused). Throttling or buffering seems to have lost its purpose here. If a movie has been paused, there is no reason to produce bad transcoding results for data we don’t yet need; that data can be transcoded at whatever rate it takes until we press play. Then we can play our perfect little data until we run out of transcoded buffer.

So yeah, I have no clue why I am always seeing stuttering and buffering/waiting at that point in the movie, even if I set a HUGE throttling value (for testing) and let the transcoder work a good while before playing the portion of the movie containing the problem. With a little bit of black box thinking, it seems like the transcoder is always running some kind of real-time algorithm, resulting in dropped frames, stutters and buffer waits no matter how much time you give the transcoder before you play the movie.

I would like to say that I have a decent computer background and that I’ve been working in software development for a few years; that is mostly the reason for my question and why I can’t make sense of this. I am by no means super good, but from what I know about software development, I’m having a hard time understanding the transcoding code/logic/algorithms. I have a few theories about why this would be happening, but they would point to very strange transcoder algorithms, so chances are I’m just not understanding things correctly and am lacking fundamental knowledge of Plex and of working with media and streams.

Any ideas, guys? This seems so counterintuitive to me. Let me know if I can do some more tests. The movie plays fine without transcoding, or if you select a profile the computer can keep up with (i.e. 4k --> 1080p 20Mbps).

Best regards
Stefan

PS:
Hope someone else besides me might think this is fun to try to understand and think about. Like I said earlier, this is not actually about fixing my issue. DLNA, XPlay (for subs) and Chromecast Ultra have me covered for that.

This is a little bump with some new info I can’t really understand. It is Jumanji again, transcoded to 1080p 20Mbps. CPU usage was measured with top.

HW-acceleration enabled:
PGS subs: ~ 30% CPU (WTF…?)
No subs: ~70-75% CPU
External SRT: ~70-75% CPU

Software only:
PGS subs: ~40-45% CPU, but with occasional spikes.
No subs: ~88-92% CPU
External SRT: ~88-92% CPU

According to this, if the movie has to be transcoded, I should opt to transcode it with PGS subs to save CPU…? I did not know this.

Other things that might or might not be worth noting:

  • Using XPlay and DLNA without transcoding uses essentially no CPU at all.

  • Hardware acceleration does not work at all when transcoding 4k to 4k, regardless of subs (CPU is at ~30% for PGS, 50-60% for no subs/external). It just hangs and waits.

  • On 4k to 4k (SW) transcoding, using PGS subs reports about 30% less CPU usage than external SRT, but seems to suffer from about the same amount of frame drops and stuttering.

  • With 4k to 1080p 20Mbps transcoding, CPU is at 25% with PGS and 50-70% with no subs. It does, however, stutter with PGS and not with no/external subs (this was a different movie).

  • XPlay with PGS enabled will stutter in the same places the web player would when transcoding (XPlay can play HEVC though; the web player had to encode to H264). Now here is where I think it gets interesting. On the web player, if I paused right before it started stuttering, it would show that it was buffering. When I resumed, it always stuttered in the same place. In XPlay, however, if I pause right before the stuttering, it will actually give the CPU a chance to burn in the subs, and when I play the movie again it will not stutter in the same location.

Doesn’t seem like a lot of people think this is interesting, so chances are this will be the last post, considering that XPlay is working quite well. For other people who want to stream 4k content from me, I’ll just recommend a compatible player for direct playing and SRT subs. If I am in a situation (for example, out of the country) where I need lower bit rates, I can always just optimize my media.

Best regards
S

Glad you are interested in learning more about Plex. Hope this helps.

When Plex transcodes a video, there are 2 steps, decode and encode. Each step can be done in either hardware or software, with limitations. So a reduction in CPU load could mean 1 of these steps was offloaded to hardware. It could also mean that the full power of your CPU is not being utilized and there is a bottleneck.

Another thing is that not all decoders and encoders are equal. This applies to both hardware and software decoders. Some decoders/encoders are faster than others, e.g. decoding/encoding H264 is faster per cycle than HEVC. Software decoding of H264 is also able to utilize all cores in your CPU, while software decoding of the VC1 codec is restricted to using only 1 CPU core, a bottleneck. Burning in vobsub subtitles is another bottleneck, as that is also restricted to a single core. So in these cases, yes, your CPU load will be lower, but that doesn’t mean it’s working any better.
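A toy model may make the "lower CPU load but not better" point concrete. This is only a sketch: the stage names, thread limits and per-thread frame rates below are invented for illustration and are not measurements of Plex or ffmpeg, and it assumes each stage scales perfectly across the threads it is allowed to use.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    max_threads: int       # logical cores this stage can keep busy at once
    fps_per_thread: float  # frames per second a single thread manages


def pipeline_report(stages, logical_cores):
    """Return (frames_per_second, limiting_factor, cpu_percent) for a toy pipeline."""
    # Ceiling each stage imposes on its own, given its thread limit.
    stage_caps = {
        s.name: min(s.max_threads, logical_cores) * s.fps_per_thread for s in stages
    }
    slowest = min(stage_caps, key=stage_caps.get)

    # Ceiling imposed by the whole CPU: thread-seconds of work needed per frame.
    work_per_frame = sum(1.0 / s.fps_per_thread for s in stages)
    cpu_cap = logical_cores / work_per_frame

    if stage_caps[slowest] < cpu_cap:
        fps, limit = stage_caps[slowest], slowest
    else:
        fps, limit = cpu_cap, "all cores"
    cpu_percent = 100.0 * fps * work_per_frame / logical_cores
    return fps, limit, cpu_percent


# Hypothetical numbers for an 8-thread CPU.
decode = Stage("decode", max_threads=8, fps_per_thread=60.0)
burn_in = Stage("subtitle burn-in", max_threads=1, fps_per_thread=20.0)
encode = Stage("encode", max_threads=8, fps_per_thread=5.0)

print(pipeline_report([decode, burn_in, encode], logical_cores=8))
print(pipeline_report([decode, encode], logical_cores=8))
```

With these invented numbers, the burn-in case reports about 67% CPU yet manages only 20 fps, while the no-subs case saturates the CPU at roughly 37 fps. For a 24 fps movie the first case would fall behind despite showing the lower CPU load.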

When PMS is transcoding, it saves the transcoded data to your server’s temporary cache area. PMS will try to create a buffer based on your PMS setting versus the current play time. As the media is played, PMS will kick on as needed to generate additional data. If you are looking at CPU load, you will see a steady load while this buffer is being created, then spikes as it turns on and off to maintain that buffer.
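The on/off behavior described above can be sketched as a tiny simulation. This is a hypothetical model and not Plex's actual algorithm: it assumes the transcoder runs whenever its lead over the play position is below the throttle value, works at a constant speed, and advances in one-second ticks. All names are made up for the example.

```python
def simulate_throttle(duration, speed, throttle, max_ticks=1_000_000):
    """Simulate server-side buffering in one-second ticks.

    duration: video length in seconds
    speed:    seconds of video transcoded per wall-clock second
    throttle: maximum lead (seconds) the transcoder builds over the play position
    Returns a dict with stall count, busy transcoder ticks and total wall time.
    """
    transcoded = played = 0.0
    stalls = busy_ticks = ticks = 0
    while played < duration and ticks < max_ticks:
        ticks += 1
        # Transcoder: work only while the lead is below the throttle.
        if transcoded < duration and transcoded - played < throttle:
            transcoded = min(duration, transcoded + speed)
            busy_ticks += 1
        # Player: advance one second if that second is already transcoded.
        step = min(1.0, duration - played)
        if transcoded - played >= step:
            played += step
        else:
            stalls += 1
    return {"stalls": stalls, "busy_ticks": busy_ticks, "wall_seconds": ticks}


print(simulate_throttle(duration=600, speed=2.0, throttle=60))  # fast CPU
print(simulate_throttle(duration=600, speed=0.5, throttle=60))  # slow CPU
```

In this model a CPU that transcodes faster than real time fills the buffer once and then toggles on and off, which would show up as the spikes mentioned above. A CPU slower than real time never builds a lead, so it stays busy the whole time and playback stalls repeatedly.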

The client, on the other hand, can only receive 1 transcoder chunk at a time, so it is not possible to buffer this ahead of time on the client side. Well, maybe 1 chunk ahead. So hitting pause will allow PMS to build its buffer if your PMS is slow, but it will not help on the client side since the client cannot receive more data. So if you are seeing buffering issues, it could be PMS not transcoding in real time, the data not being transferred to the client fast enough, or the client having issues with the data it received.

Question for you if I may (please). Any future plans to implement multi-threaded VC1/HEVC? Or is that in the hands of FFMPEG or the like?

That’s up to ffmpeg, or more specifically the vc1 decoder used by ffmpeg.

Thank you: I found a feature request filed over at ffmpeg support from 5 years ago now that still looks to be open: #1885 (Multithreaded decoding for vc1) – FFmpeg

I’m surprised the tickets have so few comments; or at least people are unable to correctly attribute the problem back to ffmpeg. Is hardware decoding the recommended solution for this if I am unable to sufficiently increase single-threaded performance or direct stream on clients?

Hardware decoding would help against the single thread limit. Another option, which I would personally recommend, is to re-encode the video to H264.
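For anyone who wants to try the re-encode route, here is a minimal sketch that builds an ffmpeg command line for it. The file names, the CRF value and the preset are placeholders I picked for illustration and are not recommendations from this thread; it assumes an ffmpeg build with libx264, and the audio is copied untouched.

```python
import shlex


def h264_reencode_command(src, dst, crf=20, preset="medium"):
    """Build an ffmpeg argument list that re-encodes video to H264.

    crf and preset are illustrative defaults; tune them for your own
    quality and speed needs.
    """
    return [
        "ffmpeg",
        "-i", src,          # input file, e.g. a VC1 rip
        "-c:v", "libx264",  # re-encode the video stream with x264
        "-crf", str(crf),   # constant-quality target (lower means better quality)
        "-preset", preset,  # speed versus compression trade-off
        "-c:a", "copy",     # keep the audio stream as-is
        dst,
    ]


# Hypothetical file names, printed rather than executed.
cmd = h264_reencode_command("movie_vc1.mkv", "movie_h264.mkv")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the command looks right for your files.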

Your problem has an easy solution. Go into the Server>Settings>Transcoder and set ‘Transcoder default throttle buffer’ to 7200. The default is like 60 seconds. So, by default a low horsepower CPU will cause lots of waiting on transcoding (not client buffering). If you set the value to 7200 (2 hours) and start a video then pause it to give the transcoder a bit of lead time, you will find that your CPU will run 100% and finish the whole video transcode in one shot.

Most people don’t realize that there are actually TWO buffers. The transcoder buffer and the stream buffer. The stream buffer is from the finished transcode over the network to the playback device. On a high speed connection this is rarely the problem unless you have wifi interference. Most people have the problem of setting the ‘Transcoder default throttle buffer’ too low. If you do not have a CPU that is capable of transcoding FASTER than playback (real time) then you need to set the ‘Transcoder default throttle buffer’ much higher so that a slower CPU can get enough of a head start to then be able to provide the finished transcode without having to pause.
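As a rough way to size that head start, here is a small sketch. It assumes the transcoder runs at a constant fraction of real time and that the throttle buffer is allowed to grow as large as needed; the function name is made up for the example. If the CPU produces `s` seconds of video per second (with `s` below 1), then playback of a video of length `D` only finishes without a stall if the transcoder is already `D * (1 - s)` seconds ahead when you press play, which takes `D * (1 - s) / s` seconds of pausing to build.

```python
def required_head_start(duration_s, transcode_speed):
    """Return (pause_seconds, buffer_seconds) needed for stall-free playback.

    duration_s:      length of the video in seconds
    transcode_speed: seconds of video produced per wall-clock second
    Assumes a constant transcode speed for the whole file.
    """
    if transcode_speed >= 1.0:
        return 0.0, 0.0  # the transcoder keeps up on its own
    buffer_seconds = duration_s * (1.0 - transcode_speed)
    pause_seconds = buffer_seconds / transcode_speed
    return pause_seconds, buffer_seconds


# A 2-hour movie on a CPU that manages 75% of real time.
print(required_head_start(7200, 0.75))
```

In that example the transcoder needs a 1800-second lead, so a 60-second throttle buffer could never hold enough, while a throttle value of 7200 leaves room for it.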