Hardware Accelerated Decode (Nvidia) for Linux

tijmenvn

Can I ask why you didn’t go for a GTX 1660 or GTX 1660 Ti?

I am very much interested in how you make out here, as I am thinking the combo of a GTX 1660 and the hacked Nvidia driver to enable multiple encodes could be the best available option (at a reasonable cost) for add-on Plex decoding.

Please post on how you make out with the 2060!

1 Like

I have a pretty cheesy answer for not choosing the 1660 Ti: I still wanted to get Metro Exodus, which costs around 40-50 on its own, and the 2060 came with it for free. The 2060 was only 60 euros more, so I opted for that card and got the game as well. Both should have the same new encoder and all 6 GB of VRAM, so they should do similarly in transcoding.

I’ve tinkered with some settings this morning. I haven’t had time to try some new stuff out since some friends were watching and I didn’t want to keep interrupting. I’ll try some more settings first thing tomorrow. Will keep you posted.

EDIT: Having a lot of trouble. I finally read that the fact that my host is running Debian and my container is running Ubuntu might be the reason it’s not working. nvidia-smi works, and I can see all the nodes in /dev/dri; Plex just doesn’t wanna use it. Might try some manual ffmpeg to see if I can get it working.

I played around with all of this extensively a couple of weeks ago while using a GTX1060. With that card, the latest Nvidia drivers and the latest version of Plex available at the time (the first one that had the upgraded ffmpeg baked-in), I was seeing transcoding being passed to the GPU with the “use hardware acceleration if available” box checked.

Since then, I’ve upgraded to a 1660ti and returned the 1060. (I have a 4690k that’s overclocked to 4.2 GHz, 16 GB 2133MHz RAM and a 32 TB RAID 5 as my home “server.”) I hadn’t been using the wrapper since getting the new card, but I decided to test today with the new Turing card.

For some reason, with the 1660ti, I have to disable hardware acceleration in Plex for the transcoding to be performed on the 1660ti GPU, and even then it’s only the encode. If I leave the box checked now, Plex will try to use my 4690k for hardware transcoding. At first I thought that I had messed up the wrapper, but when I uncheck the “use hardware acceleration if available” box, the GPU takes over transcoding. I remember reading some of you say that you had to uncheck the box, which I found odd at the time, but now I am in that boat.

I haven’t tried toggling the onboard GPU in the BIOS yet. The only reason I really care about the checkbox is that I wonder what trickle-down impact changing that option might have when I am recording content via Plex’s OTA DVR. I haven’t fully internalized how that decision should be made… need to think through this a bit more…
edit
Update: So I looked at what was happening when I was leaving the “enable hardware acceleration” button unticked, and I’ll explain here for posterity and because I think it may help some of you who were having issues with the checkbox before me.

When I was ticking the box with my iGPU enabled, Plex was feeding ffmpeg the vaapi flag, which was causing the 1660ti to only be used for one half of the transcode (encode, not decode, if I’m not getting turned around there). When I disable onboard graphics in the BIOS, the 1660ti is called for both pieces of the transcode via nvenc (vs vaapi, which tried to use the dedicated QuickSync hardware on my CPU), and my CPU stays around 25% utilization with all of the other services it’s running, whereas it was hovering just under 100% on all cores and running pretty warm before.

I can’t explain why I was able to leave the onboard graphics enabled while using the GTX 1060, but I can confirm that things are now working as expected with the iGPU disabled and “use hardware acceleration” checked. I haven’t yet tested DVR/OTA TV functionality, which should call my GPU, but it’s fairly safe to assume that it’s also working as expected with the hardware acceleration.

Update: I tried watching some OTA TV, which gets transcoded, and recorded the local news, and everything looks really great using the hw encoding. I’m also seeing the hw notification in the Dashboard/Tautulli. I’ve noticed that both the Pascal and Turing cards really make my OTA TV streams look much better than what my CPU and software transcoding were doing, I’m guessing thanks to the updated NVENC engine versus the QuickSync built into the 4690k. (I am applying the latest Intel microcode early in the boot process, which also makes a surprisingly big difference with QSV etc.)

Now we just need the Plex devs to finish writing the interface to pass the flags the wrapper is passing (and smooth over whatever weird business issues may exist around this)!

As for people asking about Plex features, I obviously don’t know, and they won’t tell any of us consumers. We can only speculate with educated guesses based on functionality in ffmpeg. I know that the Plex folks claim to push things back into ffmpeg, but I doubt they are going to actually improve its efficiency or speed. I’m just taking half-educated stabs. yajrendrag, I would check that you have all of your settings in Plex set to the equivalent of “Make my CPU hurt,” as that’s probably why you’re not seeing it use more of your GPU when transcoding; I’ve seen my utilization when making thumbnails go higher than 10%. YMMV. Use at your own risk, etc.

Just in case anyone comes here later, I have a more complete version of this guide here: Plex HW acceleration in LXC container - anyone with success?

You can actually get it to work with Ubuntu in the container and Debian (Proxmox) on the host. It just requires a few tweaks to a lot of the guides.

You’ll need to make sure that on the Proxmox side you have your kernel headers installed: apt-get install pve-headers-########-pve

Where ######## is whatever the result of uname -r is, e.g.: apt-get install pve-headers-4.4.35-2-pve
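If you don’t want to copy the version by hand, a one-liner like this should work on most Proxmox installs, since it just substitutes the running kernel version into the package name:

apt-get install pve-headers-$(uname -r)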

First step is to install whichever Nvidia driver version you want on the Proxmox side. I usually look up the latest version here: https://github.com/keylase/nvidia-patch
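For reference, the install itself is just a matter of grabbing the .run installer and executing it on the host. A rough sketch - the driver version below is only an example, use whichever release the nvidia-patch repo currently supports, and you may need build tools for the kernel module:

apt-get install build-essential
wget https://download.nvidia.com/XFree86/Linux-x86_64/430.50/NVIDIA-Linux-x86_64-430.50.run
chmod +x NVIDIA-Linux-x86_64-430.50.run
./NVIDIA-Linux-x86_64-430.50.run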

If it asks you to blacklist the nouveau drivers, you will need to do that. Once that’s complete and you reboot, give nvidia-smi a check and make sure it gives you the Nvidia info.

Next, you’ll want to run nvidia-persistenced (https://github.com/NVIDIA/nvidia-persistenced/blob/master/init/systemd/nvidia-persistenced.service.template) on the Proxmox side. This keeps the driver loaded, which makes sure all the /dev/nvidia* nodes are created on boot and stay there. If you don’t run persistenced, those nodes will go away and the container won’t be able to use them.

Once that’s all working, you’ll do ls /dev/nvidia* -l to see what the cgroup device majors are; you should end up with 5 items (this is in most of the guides). If you’re missing one, you’re not running persistenced:
crw-rw-rw- 1 root root 195, 0 Feb 19 11:09 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 19 11:09 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Feb 19 11:09 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511, 0 Feb 19 11:09 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511, 1 Feb 19 11:09 /dev/nvidia-uvm-tools

You’ll add the two device majors (195 and 511 in the above case) to the .conf file for that container, similar to this:
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 511:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
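(On Proxmox, that .conf file normally lives under /etc/pve/lxc/ on the host and is named after the container ID, so for a container numbered 101 - just an example ID - you’d edit:)

nano /etc/pve/lxc/101.conf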

At this point, boot up your container. I’ll typically do an ls /dev/nvidia* -l in the container to see if I see the same 5 entries as on Proxmox. If I do, that’s a good sign that I’ve got the container set up right.

Then run the same Nvidia installer you used on the Proxmox side, but add --no-kernel-module (I think that’s the flag, but check --advanced-options; I can’t find my notes on that parameter). This will install all the Nvidia tools in the container. It might ask you if you want to overwrite some libraries; I typically say no and haven’t had issues.
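As a sketch, on the container side that looks something like the following, using the same installer file/version as the host. The exact flag name is worth double-checking against --advanced-options:

# --no-kernel-module skips building the kernel driver, which the container gets from the host anyway
./NVIDIA-Linux-x86_64-430.50.run --no-kernel-module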

At this point, you should be able to run nvidia-smi here and get a similar screen to what’s on the Proxmox side.

Some quirks to be aware of:
If you install more things on the Proxmox side, your device majors can change. If they do, you’ll need to update your .conf file. I typically check this if I’ve been spinning up lots of new containers/VMs, just in case.

If you’re checking nvidia-smi to see if decodes/encodes are happening, you’ll need to do it on the Proxmox side, NOT the container side. I’ve noticed they won’t show up in the container-side nvidia-smi.

I may have missed something here, but that’s my process in a nutshell.

2 Likes

My use case is different than most of the ones I’m reading in here - no virtualization environment and no 4K content. I simply want the GPU transcode function for speed. It took me a while to get to an ffmpeg command line that produces what I want, but I have it now, so I can transcode an hour-long show from MPEG-2 to H.264 at 1280x720 in 2-3 minutes with what I feel is good enough quality - without the GPU it was taking ~25 minutes. nvidia-smi is showing upwards of 90% on the encode and 100% on the decode.
This is all from the shell, separate from Plex.
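For anyone curious, the general shape of the command line is something like this. A sketch only, not my exact command: filenames and bitrate are illustrative, it assumes an ffmpeg build with CUDA/NVENC enabled, and older builds may need scale_npp instead of scale_cuda:

# decode on the GPU, scale on the GPU, encode with NVENC; audio re-encoded to AAC
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.ts \
       -vf scale_cuda=1280:720 \
       -c:v h264_nvenc -preset slow -b:v 4M \
       -c:a aac -b:a 192k \
       output.mkv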

With Plex and the wrapper script, however, I’m only getting about 10% encode and decode, and it still takes about 30 minutes to transcode to 720p. I use this function and the 2M/720p quality selection in Plex to sync content to my iPad, and I have been hoping the GPU-based transcode would reduce the time.

Am I correct in thinking that once Plex finishes their dev work, we’re likely to see a more optimized set of Plex Transcoder options for a Linux environment (i.e., GPU-based h/w accel) that will speed up transcoding?

Well, I would hope that we would all expect Plex to optimize it as best they can; otherwise what’s the point of even having it?

All that said, who can ever say what plex will or won’t do ?

Thanks for the detailed and well written reply! I’ll be able to test extensively this weekend, as then I’ll be on site to be able to do multiple needed reboots.

A few problems that I already know I’ll have, from reading your guide, are the following:
Disabling nouveau:
I’ve already created an entry at /etc/modprobe.d/nvidia-installer-disable-nouveau.conf with

blacklist nouveau
options nouveau modeset=0
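One thing I’m not sure about is whether the blacklist actually takes effect at early boot. From what I’ve read, on Debian you may also need to regenerate the initramfs after adding that file (I haven’t confirmed this on my box yet):

update-initramfs -u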

After I restart my server, I get a BIOS message that one of my PCI cards failed and has to be reseated. After I simply reset following this message, the server boots normally. However, I have no clue whether the nouveau drivers are properly blacklisted at this point, since the lspci -v output is the following (on the host side):

0a:00.0 VGA compatible controller: NVIDIA Corporation Device 1f08 (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 868a
Flags: bus master, fast devsel, latency 0, IRQ 152, NUMA node 0
Memory at d8000000 (32-bit, non-prefetchable) [size=16M]
Memory at 3fff0000000 (64-bit, prefetchable) [size=256M]
Memory at 3ffee000000 (64-bit, prefetchable) [size=32M]
I/O ports at fc80 [size=128]
[virtual] Expansion ROM at d9000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Capabilities: [bb0] #15
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

I still see the nouveau driver under “Kernel modules:”, although not in use. Is this the desired output, or should nouveau not even be listed here?

Next problem is with persistenced mode. I’ve enabled this setting, but after enabling it, the only entry that appears is nvidia-modeset; still no uvm/uvm-tools. This might be fixed after a reboot - will test this weekend. Should this option also be enabled for the driver on the LXC side?

Also, in the Intel-based guide, they mention creating the mount hook script /var/lib/lxc/101/mount_hook.sh. Should I only put the card and renderD128 devices in there, or should I put the Nvidia nodes in there as well? They also forward the framebuffer device fb0 to the container. Should this also be done with Nvidia, or is it only needed for the Intel iGPU?

What about the driver hack for unlimited streams? I know I should first get transcoding working without it, but if that works, do I apply that hack on both host and LXC, or only on one of them?

Thanks again for all the help. Didn’t expect this much hassle when I initially read articles about this :X

Hi, I’ve just set this up on my Plex 1.15.1.791 server running Ubuntu 16.04 on a Core i5-4570R with a Quadro P400 GPU.

Everything seems to be working as expected except for HEVC. Before I enabled NVDEC, I was able to transcode 1 stream smoothly with close to 100% CPU utilization, but now, even though my CPU utilization has dropped to 30%, I can’t seem to transcode any faster than 0.5x.

H.264 streams transcode at close to 4x for high-bitrate 4K content. I should add that I’m running the GPU through an EXP GDC Beast connected via a Mini PCIe slot, so the GPU is only running at PCIe 2.0 x2.

Am I missing something?

thisisnotdave,

I just updated to the same PMS version 1.15.1.791. This is the first time I ran the NVDEC hack, and it appears to launch just fine, but I am also seeing an issue where transcoding speed of a 4K HEVC file is only running at 0.8x. Lower-resolution HEVC and H.264 content works fine. I did note that in my testing, the Plex Transcoder2 process is consuming 100% of one CPU core only and not branching out into multiple threads.

So maybe there’s some other bottleneck in play?

So after doing some poking around, I think the audio transcode may be the bottleneck. This is always done on the CPU, and it looks like it may be single-threaded. I’m working on confirming this by re-encoding the audio into a format that doesn’t require transcoding; my source file was using TrueHD, converting to AAC. Going to take several hours; will report back.

Most source files (if you ripped directly from disc) have multiple audio streams, and usually one of them is either AC3/DTS 5.1 or stereo.

You would simply need to choose that audio stream while playing through your Plex client to avoid audio transcoding.

@revr3nd encoding audio should not take several hours. Download eac3to, then extract the lossless TrueHD audio source to 640 kbps AC3 (assuming 5.1). Then download MKVToolNix and mux it back into the MKV.
Examples:

PS Y:\Little.Italy.2018> eac3to.exe .\Little.Italy.2018.1080p.BluRay.Remux.H.264.mkv
MKV, 1 video track, 1 audio track, 1 subtitle track, 1:41:29, 24p /1.001
1: h264/AVC, English, 1080p24 /1.001 (16:9)
2: DTS Master Audio, English, 5.1 channels, 24 bits, 48kHz
   (core: DTS, 5.1 channels, 1509kbps, 48kHz)
3: Subtitle (PGS), English
PS Y:\Little.Italy.2018> eac3to.exe .\Little.Italy.2018.1080p.BluRay.Remux.H.264.mkv 2:audio640.ac3 -640
MKV, 1 video track, 1 audio track, 1 subtitle track, 1:41:29, 24p /1.001
1: h264/AVC, English, 1080p24 /1.001 (16:9)
2: DTS Master Audio, English, 5.1 channels, 24 bits, 48kHz
   (core: DTS, 5.1 channels, 1509kbps, 48kHz)
3: Subtitle (PGS), English
a02 Extracting audio track number 2...
a02 Decoding with libDcaDec DTS Decoder...
a02 Remapping channels...
a02 Encoding AC3 <640kbps> with libAften...
a02 Creating file "audio640.ac3"...
a02 The original audio track has a constant bit depth of 24 bits.
Video track 1 contains 145993 frames.
eac3to processing took 2 minutes, 31 seconds.
Done.


easy peasy
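For the Linux-only folks, roughly the same thing can be done in a single ffmpeg pass. A sketch only - the stream specifier and bitrate are illustrative:

# copy everything as-is, re-encode only the first audio stream to 640 kbps AC3
ffmpeg -i input.mkv -map 0 -c copy -c:a:0 ac3 -b:a:0 640k output.mkv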

Just in case anyone comes here later, I posted my complete reinstall guide here: Plex HW acceleration in LXC container - anyone with success?

My server’s down at the moment, so I can’t check your lspci -v against mine, but I don’t think there’s an issue with nouveau being listed as an available module, as long as it’s not the driver in use.

If nvidia-smi works, I’m fairly certain you have the nouveau driver blacklisted.

Not sure how you installed persistenced, but I’d recommend you run the install.sh here: https://github.com/NVIDIA/nvidia-persistenced/tree/master/init

After that, you should be able to do:
sudo systemctl start nvidia-persistenced
sudo systemctl status nvidia-persistenced

That should let you know if it starts and runs. If it does, then go ahead and do:
sudo systemctl enable nvidia-persistenced

You’ll want to do it this way because otherwise, every reboot, you’ll have to go in and manually turn persistence mode back on.

At that point, check your /dev to see if the uvm/uvm-tools nodes come up. When I didn’t see those, it was usually because persistenced wasn’t running. You only need this on the host side. For a discussion of why this is needed, see here: https://us.download.nvidia.com/XFree86/Linux-x86/375.26/README/nvidia-persistenced.html

For unlimited streams, you only install the patch on the host side. The container leverages the host’s kernel module; that’s why we do the install on the container side with --no-kernel-module.

The only thing you need on the container side is the install of the Nvidia tools that match the version on the host. And really, you may not even need that; I just haven’t tried without them, because I like to be able to run nvidia-smi on the container side too, for sanity that it’s working there.

It took me some trial and error and looking through a lot of guides to get it working right; no one seemed to have the ‘right’ answer for me. But it’s slick when it does work, and it lets me spin up multiple containers that all have GPU access. Was worth the effort for me.

1 Like

I’m a little unclear from the above, since there seem to be some tangents - does the latest release now officially support decoding, or is a hack still necessary?

1 Like

Andy, a simple hack is still required, but the current release of Plex can now work with NVDEC without any major modification.

You will need to place a wrapper script in front of the Plex Transcoder executable:

  1. cp /usr/lib/plexmediaserver/Plex\ Transcoder /usr/lib/plexmediaserver/Plex\ Transcoder2
  2. Create a new file in place of /usr/lib/plexmediaserver/Plex\ Transcoder and insert the following script code:
#!/bin/bash
# Pull the source codec name out of the arguments Plex passes (characters 10-14 of the argument string)
marap=$(cut -c 10-14 <<<"$@")
# Only add the nvdec hwaccel flag for codecs other than mpeg4
if [ "$marap" != "mpeg4" ]; then
     exec /usr/lib/plexmediaserver/Plex\ Transcoder2 -hwaccel nvdec "$@"
else
     exec /usr/lib/plexmediaserver/Plex\ Transcoder2 "$@"
fi
  3. chmod +x "/usr/lib/plexmediaserver/Plex Transcoder"

Now Plex will start to use nvdec during transcodes

4 Likes

Hi, I just tested this with a 10-bit HEVC 4K file using 2-channel AAC audio, and I’m still having the same issue with stutter.

CPU utilization hasn’t dropped; it’s still ~94% on a single core, and according to PlexPy my encode speed is still 0.5x.

Someone on the Plex forums mentioned that this is only an issue with 10-bit HEVC?


notdave, you are probably hitting the wall on the PCIe x2 bandwidth.

See below; per the NVDEC support matrix for the P400: https://developer.nvidia.com/video-encode-decode-gpu-support-matrix

What is the source media? The P400 should support NVDEC if the source video is HEVC 4:2:0, but if it is 4:4:4 then it will not work.

Can you run: nvidia-smi dmon -s um
You should see the Dec column go above 0 when in use, and memory utilization should jump up by about 1 GB.

EDIT: The note from TeknoJunky is a good call. PCIe 2.0 x2 bandwidth can very much be a bottleneck here.

1 Like

The video is 4:2:0 and the decode is definitely working; here is my nvidia-smi output.

I’m surprised that it would be a bottleneck, considering there’s still 1 GB/s of available bandwidth. Any way I can test that?

[screenshot: nvidia-smi output]
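(I suppose something like nvidia-smi dmon with the throughput columns would show whether the PCIe link is anywhere near saturated while a transcode is running - just a guess on my part:)

# -s t should add PCIe Rx/Tx throughput columns (in MB/s) to the per-second output
nvidia-smi dmon -s t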

Uncompressed 4K HDR is way more than 1 GB/s.

A lot of folks really don’t easily comprehend the magnitude of bandwidth and processing power it takes to deal with 4K content.

720/1080 is like a grain of sand on the beach of 4k video.

This magnitude of difference shows up at every level of the playback chain: storage, compression/decompression, disk/network/PCIe/memory/CPU I/O and bandwidth, etc.

In everything it takes to move the bits from one place to another, 4K takes much more than many probably think.