I’ve been testing hardware-accelerated transcoding on an i5-13500H recently. In general things work great: the CPU rarely breaks a sweat and the GPU produces really impressive results. However, I found that turning on PGS subtitles in the web client forces the server to transcode the video, which makes sense given that PGS subtitles are an image-based format. What I didn’t understand at first was why that transcode was destroying the CPU. Looking more closely at the filter arguments being used, it was clear that the transcoder was no longer using hardware-accelerated filters.
The filter arguments when transcoding from 4K → 1080p without PGS subs looked like:
-filter_complex '
[0:0]hwupload[0];
[0]scale_vaapi=w=1920:h=1080:format=p010[1];
[1]hwmap=derive_device=opencl[2];
[2]tonemap_opencl=tonemap=mobius:format=nv12:m=bt709:p=bt709:r=tv[3];
[3]hwmap=derive_device=vaapi:reverse=1[4];
[4]hwupload[5]
'
All of these are hardware-accelerated filters. With PGS subtitles turned on, however, the filter arguments were:
-filter_complex '
[0:5]scale=3840:2160[0];
[0:0][0]overlay[1];
[1]scale=w=1920:h=1080:force_divisible_by=4[2];
[2]format=p010,tonemap=mobius[3];
[3]format=pix_fmts=nv12[4];
[4]hwupload[5]
'
So scaling, overlaying and tone mapping were all being done by the CPU.
Looking at the list of filters available in ffmpeg, I saw that overlay_vaapi was available, so I took a stab at recreating the PGS subtitle filter graph with hardware-accelerated filters and came up with this:
-filter_complex '
[0:0]
hwupload,
scale_vaapi=w=1920:h=1080:format=p010,
hwmap=derive_device=opencl,
tonemap_opencl=tonemap=mobius:format=nv12:m=bt709:p=bt709:r=tv,
hwmap=derive_device=vaapi:reverse=1
[v];
[0:5]
hwupload,
scale_vaapi=w=1920:h=1080
[s];
[v][s]overlay_vaapi[c]'
This dropped CPU usage to less than one core while the GPU chugged away, and the resulting video (with burnt-in subtitles) looked great. So, is there some reason that Plex doesn’t use overlay_vaapi for this?
Testing was done on Linux using PMS version 1.32.7.7621.
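In case anyone wants to try the same graph at other resolutions or with different stream indexes, here is a rough Python sketch that assembles the string for -filter_complex. The function name build_filter_graph and its parameters are my own invention for illustration, not anything Plex or ffmpeg provides. It assumes the video is stream 0:0 and the PGS track is 0:5, as in my test file.

```python
def build_filter_graph(video="0:0", subs="0:5", width=1920, height=1080):
    """Build a -filter_complex string for the VAAPI/OpenCL graph above.

    Hypothetical helper: the stream specifiers and target size are
    assumptions taken from my test file, so adjust them for yours.
    """
    # Video chain: upload to the GPU, scale, tone map via OpenCL,
    # then map the frames back to VAAPI.
    video_chain = ",".join([
        "hwupload",
        f"scale_vaapi=w={width}:h={height}:format=p010",
        "hwmap=derive_device=opencl",
        "tonemap_opencl=tonemap=mobius:format=nv12:m=bt709:p=bt709:r=tv",
        "hwmap=derive_device=vaapi:reverse=1",
    ])
    # Subtitle chain: upload the PGS bitmaps and scale them to match the video.
    subs_chain = ",".join([
        "hwupload",
        f"scale_vaapi=w={width}:h={height}",
    ])
    # Composite the subtitles over the video on the GPU.
    return ";".join([
        f"[{video}]{video_chain}[v]",
        f"[{subs}]{subs_chain}[s]",
        "[v][s]overlay_vaapi[c]",
    ])


if __name__ == "__main__":
    print(build_filter_graph())
```

With the defaults this prints the same graph as my hand-written one, just on a single line. Passing width=1280, height=720 gives a 720p variant, which I have not tested.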