Burn-in subtitles using `overlay_vaapi`?

I’ve been testing hardware-accelerated transcoding on an i5-13500H recently. In general things work great: the CPU rarely breaks a sweat and the GPU produces really impressive results. However, I found that turning on PGS subtitles in the web client forces the server to transcode the video, which makes sense given that PGS subtitles are an image-based format. What I didn’t understand at first was why that transcode was destroying the CPU. Looking more closely at the filter arguments being used, it was clear that the transcoder was no longer using hardware-accelerated filters.

The filter arguments when transcoding from 4k → 1080p without PGS subs looked like:

-filter_complex '
  [0:0]hwupload[0];
  [0]scale_vaapi=w=1920:h=1080:format=p010[1];
  [1]hwmap=derive_device=opencl[2];
  [2]tonemap_opencl=tonemap=mobius:format=nv12:m=bt709:p=bt709:r=tv[3];
  [3]hwmap=derive_device=vaapi:reverse=1[4];
  [4]hwupload[5]
'

All of these are hardware-accelerated filters. With PGS subtitles turned on, however, the filter arguments were:

-filter_complex '
  [0:5]scale=3840:2160[0];
  [0:0][0]overlay[1];
  [1]scale=w=1920:h=1080:force_divisible_by=4[2];
  [2]format=p010,tonemap=mobius[3];
  [3]format=pix_fmts=nv12[4];
  [4]hwupload[5]
'

So scaling, overlaying and tone mapping were all being done by the CPU.
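
The CPU/GPU split in a graph like this can be read off mechanically. Below is a rough sketch that tags each filter by name alone: hwupload, hwmap and anything ending in _vaapi or _opencl is counted as GPU-side, everything else as CPU-side. The function name classify_filters and the naming rule are assumptions made for illustration, not something ffmpeg or Plex provides.

```shell
#!/bin/sh
# classify_filters (hypothetical helper): reads the body of a filter graph
# on stdin and prints one line per filter, tagged "gpu" or "cpu".
# The tag is a guess based on the filter name only.
classify_filters() {
    # Put each filter on its own line (chains use ",", graphs use ";").
    tr ';,' '\n\n' |
    # Drop the [label] pads, then everything from the first "=" onwards,
    # which leaves just the filter name.
    sed -e 's/\[[^]]*\]//g' -e 's/=.*$//' |
    awk '{ name = $1 }
         name == "" { next }
         name ~ /^(hwupload|hwmap)$/ || name ~ /_(vaapi|opencl)$/ {
             print "gpu", name; next
         }
         { print "cpu", name }'
}
```

Fed the body of the PGS graph above (without the -filter_complex wrapper), it reports scale, overlay, scale, format, tonemap and format as cpu, and only the final hwupload as gpu.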

Looking at the list of filters available in ffmpeg, I saw that overlay_vaapi was available, so I took a stab at recreating the filter graph with PGS subtitles using hardware-accelerated filters and came up with this:

-filter_complex '
  [0:0]
    hwupload,
    scale_vaapi=w=1920:h=1080:format=p010,
    hwmap=derive_device=opencl,
    tonemap_opencl=tonemap=mobius:format=nv12:m=bt709:p=bt709:r=tv,
    hwmap=derive_device=vaapi:reverse=1
  [v];
  [0:5]
    hwupload,
    scale_vaapi=w=1920:h=1080
  [s];
  [v][s]overlay_vaapi[c]'

This dropped CPU usage down to less than one core, the GPU was chugging away and the resulting video (and burnt-in subtitles) looked great. So, is there some reason that Plex doesn’t use overlay_vaapi for this?

Testing was done on Linux using PMS version 1.32.7.7621.
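
For anyone who wants to try the hardware filter graph from this post outside Plex, it could be wrapped in a standalone ffmpeg command along the following lines. This is an untested sketch, not what Plex runs: the render node path, the file names, the h264_vaapi encoder and the audio mapping are all assumptions, and only the filter graph and the stream indexes 0:0 and 0:5 come from the post. By default the script only prints the command.

```shell
#!/bin/sh
# Untested sketch: wraps the hardware filter graph from the post in a
# standalone ffmpeg command. Assumptions that are NOT from the post:
#   - /dev/dri/renderD128 is the GPU's render node
#   - input.mkv / output.mkv are placeholder file names
#   - h264_vaapi as the encoder, first audio stream copied unchanged
# Stream indexes 0:0 (video) and 0:5 (PGS track) are the ones in the post.
INPUT="${INPUT:-input.mkv}"
OUTPUT="${OUTPUT:-output.mkv}"
DEVICE="${DEVICE:-/dev/dri/renderD128}"

# Same graph as in the post, joined into a single string.
FILTER="[0:0]hwupload,scale_vaapi=w=1920:h=1080:format=p010,\
hwmap=derive_device=opencl,\
tonemap_opencl=tonemap=mobius:format=nv12:m=bt709:p=bt709:r=tv,\
hwmap=derive_device=vaapi:reverse=1[v];\
[0:5]hwupload,scale_vaapi=w=1920:h=1080[s];\
[v][s]overlay_vaapi[c]"

# run: executes the command only when DRY_RUN=0; otherwise prints it
# (without shell quoting, so the output is for inspection only).
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

run ffmpeg \
    -init_hw_device "vaapi=va:$DEVICE" -filter_hw_device va \
    -i "$INPUT" \
    -filter_complex "$FILTER" \
    -map '[c]' -map 0:a:0 \
    -c:v h264_vaapi -c:a copy \
    "$OUTPUT"
```

Setting DRY_RUN=0 actually runs it. Whether a given ffmpeg build includes overlay_vaapi at all can be checked first with `ffmpeg -hide_banner -filters | grep vaapi`.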

Hardware-based subtitle burn-in is (or at least was) being looked at, but you’re right that it’s currently done in software. See post #2 by ChuckPa in the thread “Single thread subtitle burn in, is there any fix to this? Possibility of GPU usage?”

This is great input.

Thank you.

We’re working on this (new development in the new team) when we’re not fixing bugs dangling from ages-past.

This is a big boost to our efforts!

Is this part of the ‘at some point in the not too distant future’ progress you guys are making on burning in subs using the GPU?

I’m very interested in this, much like others I guess!

This is a big issue for my friends/family using clients that don’t have great sub format support. I’d love it if I didn’t have to tell them to use a different device (or walk less technically-inclined family members through downloading SRT subs).

Would be very very happy if this just worked using HW acceleration.

One question: the original post here mentioned a GPU, but would this also be accelerated on Intel Quick Sync?

Yes. Quick Sync is Intel’s branding for the video encode/decode hardware built into its GPUs.