I'm also going to test using GPU support for those who might want this functionality, but I'd generally recommend against it, since the output won't be as good as CPU encoding.
I've added nearly full support for Nvidia decoding/encoding to sickbeard's mp4 automator here: https://github.com/Collisionc/sickbeard_mp4_automator
It's the same as the original, but with better Nvidia support. My changes are as follows (copied in part from the README.md).
Nightly builds from https://ffmpeg.zeranoe.com/builds/ will work with all added options except scale_npp.
scale_npp support requires the following:
- CUDA Toolkit 8.0 - https://developer.nvidia.com/cuda-downloads - installed on the PC running ffmpeg. Unfortunately, there doesn't seem to be a way to statically compile scale_npp support on Windows, and this is the only way to get the shared libraries so ffmpeg doesn't complain about missing .dlls when loaded.
- My fork of ffmpeg-windows-build-helpers, with the build script edited to add scale_npp to ffmpeg while compiling - https://github.com/Collisionc/ffmpeg-windows-build-helpers - Make sure to enable non-free libraries for scale_npp support; the configure flags involved are sketched below.
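For reference, the configure flags involved look roughly like this (Linux-style paths shown for illustration; the Windows build script passes the equivalents during the cross-compile):

```
# Rough sketch of the non-free configure flags needed for scale_npp (CUDA paths illustrative):
./configure --enable-nonfree --enable-nvenc --enable-cuda --enable-cuvid \
    --enable-libnpp \
    --extra-cflags=-I/usr/local/cuda/include \
    --extra-ldflags=-L/usr/local/cuda/lib64
```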
Using the GPU for decoding doesn't change the overall speed of taking a file and encoding it via GPU to another format, but it does free up the CPU for other things.
scale_npp may speed up downscaling from 1080p to 720p, but I haven't benchmarked it yet.
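To give an idea of what the full GPU pipeline looks like, here is a sketch of a cuvid decode, NPP downscale, and NVENC encode in a single ffmpeg command (filenames and bitrate are placeholders):

```
# GPU decode -> GPU downscale -> GPU encode; frames stay in video memory between steps.
ffmpeg -hwaccel cuvid -c:v h264_cuvid -i input.mkv \
    -vf scale_npp=w=1280:h=720:interp_algo=super \
    -c:v h264_nvenc -b:v 4000k output.mp4
```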
Brief explanation of added settings:
- "qmin" = minimum video quantizer scale (VBR) (from -1 to 69) (default 2). Must be set when nvenc_rate_control is vbr_2pass or vbr_minqp.
- "qmax" = maximum video quantizer scale (VBR) (from -1 to 1024) (default 31)
- "global_quality" = Must be set when nvenc_rate_control is constqp
- "maxrate" = maximum bitrate (in kb/s). Used for VBV together with bufsize. (from 0 to INT_MAX) (default 0)
- "minrate" = minimum bitrate (in kb/s). Most useful in setting up a CBR encode; of little use otherwise. (from INT_MIN to INT_MAX) (default 0)
- "bufsize" = set rate-control buffer size (in kb/s) (from INT_MIN to INT_MAX) (default 0). I usually set this to 5x the average bitrate, but mileage will vary.
- "nvenc_encoder_gpu" = Selects which NVENC-capable GPU to use for encoding. First GPU is 0, second is 1, and so on. Default is any.
- "nvenc_profile" = h264 options include: baseline, main, high, high444p. Default is main.
- "nvenc_preset" = Options include: slow, medium, fast, hp, hq, bd, ll, llhq, llhp, lossless, losslesshp. Default is medium.
- "nvenc_rate_control" = Options include: constqp, vbr, cbr, vbr_minqp, ll_2pass_quality, ll_2pass_size, vbr_2pass. Default is constqp.
- "nvenc_temporal_aq" = (true/false) Improves output quality slightly, adds 2-5% extra processing time. Default is false.
- "nvenc_rc_lookahead" = Number of frames to look ahead for rate control (from -1 to INT_MAX). Default is -1.
- "enable_nvenc_decoder" = (true/false) Enable GPU decoding. Default is false.
- "enable_nvenc_hevc_decoder" = (true/false) Enable GPU decoding of HEVC/VP9. Only supported by the GeForce GTX 950/960/1050/1060/1070/1080 and Pascal Quadros. Default is false.
- "nvenc_decoder_gpu" = Selects which NVENC-capable GPU to use for decoding. First GPU is 0, second is 1, and so on. Default is any.
- "nvenc_hevc_decoder_gpu" = Selects which NVENC-capable GPU to use for HEVC decoding. First GPU is 0, second is 1, and so on. Default is any.
- "scale_npp_enabled" = (true/false) Enables usage of NVIDIA Performance Primitives (NPP) to resize the video output resolution. Requires building ffmpeg yourself, as NPP is currently under a non-free license. Default is false.
- "scale_npp_interp_algo" = Which algorithm to use with scale_npp. Options include: nn, linear, cubic, cubic2p_bspline, cubic2p_catmullrom, cubic2p_b05c03, super, lanczos. Default is super.
If you have multiple Nvidia cards you can decode on one and encode on the other, but it doesn't seem to speed up the process at all.
Decoding by itself does not count towards the NVENC two-stream limit.
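If you want to try the split anyway, it can be expressed directly in ffmpeg along these lines; the encoder's -gpu option picks the encoding card, and the script's nvenc_decoder_gpu/nvenc_encoder_gpu settings correspond to this kind of selection (treat this as a sketch - decoded frames travel through system memory on their way to the second card):

```
# Decode with cuvid, encode with NVENC on the second card (index 1):
ffmpeg -c:v h264_cuvid -i input.mkv -c:v h264_nvenc -gpu 1 output.mp4
```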
With a lot of tweaking I've gotten GPU encoding to look nearly as good as CPU encoding. Like most people, I run into upstream bandwidth limits, so I've re-encoded nearly everything in my library with the following settings in the autoprocess.ini file:
video-codec = nvenc_h264
video-max-width = 1280
video-bitrate = 4000
qmin = 17
maxrate = 6000
bufsize = 18000
nvenc_profile = high
nvenc_preset = slow
nvenc_rate_control = vbr_2pass
nvenc_temporal_aq = true
nvenc_rc_lookahead = 64
enable_nvenc_decoder = true
enable_nvenc_hevc_decoder = true
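For anyone who wants to experiment outside the script, my rough mapping of those settings onto ffmpeg's nvenc flags looks like this - the script assembles the real command, so treat this as an approximation (filenames are placeholders, and the CPU scale filter is used since scale_npp_enabled isn't set above):

```
# Approximate hand-written equivalent of the autoprocess.ini settings above:
ffmpeg -c:v h264_cuvid -i input.mkv \
    -vf scale=1280:-2 \
    -c:v h264_nvenc -profile:v high -preset slow \
    -rc vbr_2pass -temporal-aq 1 -rc-lookahead 64 \
    -qmin 17 -b:v 4000k -maxrate 6000k -bufsize 18000k \
    output.mp4
```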
Completely ballparking it, the 4 Mbps output from nvenc appears to be roughly equal in quality to 3.5 Mbps output from the CPU h264 encoder. Comparing dark scenes with a lot of black/grey, 4 Mbps from nvenc is probably closer to 3.25 Mbps on the CPU.
However, the speed difference is massive. I have a GTX 1080 and an i7-6900K, and when converting a ~15 Mbps 1080p h264 file to 4 Mbps 720p with those settings, I get around 280 fps on the GTX 1080 versus 40 fps on the CPU.