I just don’t see the point of feeding it codecs it doesn’t support. VC-1 is one that comes to mind, but I’d rather not have to enumerate every unsupported codec. Since we all know which codecs are supported, it should be much easier, faster, and more robust to only feed it the relevant ones, especially since thousands of different users run it.
Oh, and by inverting I mean whitelisting codecs instead of blacklisting.
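As a sketch of that whitelist idea (the function name and codec list are illustrative, not taken from the actual wrapper; the right list depends on your GPU generation per Nvidia's support matrix):

```shell
#!/bin/sh
# Sketch only: whitelist the codecs NVDEC is known to handle and let
# everything else fall back to software decoding.
hwaccel_args() {
  case "$1" in
    h264|hevc|mpeg2video|vp9) echo "-hwaccel nvdec" ;;
    *)                        echo "" ;;   # e.g. vc1 stays on the CPU
  esac
}
```

The wrapper would then splice `$(hwaccel_args "$codec")` into the ffmpeg command line instead of maintaining an ever-growing list of codecs to exclude.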
While the wrapper script seems to be working and I can see activity in nvidia-smi, I still can’t get smooth playback of 4K content encoded in HEVC. I’m running a Xeon E5-1650 v3, 16 GB of memory, a Quadro P2000, CentOS 7.6, and Nvidia driver 418.43. GPU usage during encode is 49%. However, on the CPU side, I can see 1 thread of 12 hitting 100%. I assume this is the audio encode?
Just wanted to share my experience. Looks good! I put up a rig with a Core i5-4690, 16 GB RAM, and an Nvidia GTX 1070. I keep the movies on a separate box and access them from this rig via an NFS share. As you can see below, the rig can easily handle simultaneous full hardware decoding and encoding of 5 videos, each a 4K video transcoded to 1080p (the CPU is only doing audio decode/encode). And there is plenty of room for more simultaneous transcodes.
The recurring thing that bothers me is the memory usage.
If a transcode process takes ~1 GB of video RAM, small-memory cards like the P400, GTX 960, or anything else with 2 GB or less may not be able to handle multiple 4K streams.
On Windows, the transcodes use much less video RAM.
True, in order to transcode multiple 4K streams on a GPU you need a very recent one, at least a 10xx-series card. The 960 is actually an exception in the 9xx series and still supports NVDEC, but it’s definitely too weak and, as you said, it does not have enough VRAM. See the GPU Support Matrix (click the “GeForce/TITAN” button on the page to get the complete list).
Regarding the VRAM usage: I believe VRAM usage during transcoding is pretty much the same on Windows as on Linux, since it is Nvidia driver / CUDA dependent. Have you run the same Windows vs. Linux tests on the same configuration and seen different numbers? (Please note that transcoding with NVDEC requires significantly more VRAM.)
Did you get these numbers directly from the nvidia-smi tool, or from Windows Task Manager / Performance? The Windows Task Manager sometimes reports GPU memory usage with paged memory allocations included and sometimes without them. Also make sure you are testing exactly the same video transcoded into exactly the same output, because VRAM usage depends on all of these factors. In any case, for several simultaneous DEC+ENC 4K transcodes the 960 will be too weak. But yes, it should definitely be a very good choice for a few simultaneous DEC+ENC 4K transcodes, especially given its very good price on the second-hand market (cryptocurrency miners are dumping them)!
I am in no way saying that the 960 can handle many 4K transcodes; even setting aside the memory usage, I would expect maybe 3 or 4 max before it runs out of steam.
As far as VRAM goes, I am going by what is reported by Windows Task Manager; to my knowledge there isn’t an equivalent to nvidia-smi on Windows.
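For what it’s worth, nvidia-smi does ship with the Windows driver as well (as nvidia-smi.exe in the driver’s install directory), so the same driver-level query can be run on both systems for an apples-to-apples comparison:

```shell
# Driver-reported VRAM usage, bypassing Task Manager's accounting
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```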
You might be right that Task Manager is not reporting the memory usage the same way.
Also, Windows does not actually use NVDEC here; it uses native Windows decoding (DXVA2), so that could also help explain the large difference in apparent memory usage.
Edit: let me clarify the above: Plex on Windows uses DXVA2 decoding.
Other applications may be able to use NVDEC on Windows; I have no idea about those.
Are you able to transcode 4K HEVC/H.265 without periodic buffering during playback? I’ve been playing around with the settings but I can’t seem to get it to play smoothly.
I am accessing the media files via NFS from a NAS. As a test, I copied the file locally onto the Plex server, and it’s able to transcode and buffer smoothly. However, if the media is accessed over NFS, Plex doesn’t seem to be able to keep up. Network traffic is nowhere near maxing out the NIC. I am not quite sure what the problem is.
Whether it’s something client-, network-, or server-related, it sounds like the NFS mount is not responding fast enough for the transcoder.
You might see if there are any NFS tuning optimizations you can do on either the client (mount options) or the server (disk/stripe caching adjustments, etc.).
I just dove into that, but no matter what tuning I tried, it did not alleviate the buffering. As a sanity check, I changed the mounts to CIFS and now it’s behaving properly. 4K playback starts very quickly and I can see it buffering ahead correctly. Pretty strange that something about NFS was preventing Plex from transcoding smoothly.
Also keep in mind that Plex transcodes ahead, so if you start 3 simultaneous transcodes, they will all run at more than 1x for a while and then pause for a while. The GPU RAM utilization you’re seeing may be due to this transcode-ahead, and thus higher than what’s expected for a steady 1x transcode.
Looks like I spoke too soon. The same issue came back even with CIFS. The NAS serves NFS/CIFS over a 10Gb link, and the Plex server is at most reading a few MB/s from the .mkv file. I don’t think network connectivity/latency is at play here. Could Plex be failing to respect the throttle settings?
Based on your description, the issue is likely related to actual transcoding performance and not to network performance. You mentioned that 1 CPU thread is hitting 100% (you assume this is the audio encode). Try remuxing one of the 4K sources with MP3 stereo audio (use ffmpeg for this), which will eliminate the audio transcode (any player can direct-play MP3 stereo), to make sure there are no bottlenecks in the actual transcoding process.
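Something like this would do it (filenames are placeholders): the video stream is copied untouched, and only the first audio track is downmixed to stereo MP3.

```shell
# Copy the HEVC video as-is, transcode only the audio to 2-channel MP3.
# Subtitle tracks are not mapped, which also rules out burn-in overhead.
ffmpeg -i input.mkv -map 0:v:0 -map 0:a:0 \
       -c:v copy -c:a libmp3lame -b:a 192k -ac 2 \
       output.mkv
```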
Also, when you try to do a direct copy of the file from the NAS server in the OS level, is the copy process smooth, no interruptions?
No issue with performance on a straight copy. The source audio is TrueHD 7.1 / AC3. Would AAC 7.1 work for audio direct play?
According to the logs, Plex seems to grab chunks of the source file and encode them for playback, rinse and repeat. When the file is located on local disk, it’s just able to keep up. However, when it’s accessed across NFS/CIFS, it falls behind even though NIC utilization is very low. I managed to install fscache and enable it for the respective NFS mounts, and it seems to be streaming reliably now.
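For reference, enabling fscache for an NFS mount on CentOS 7 looks roughly like this (server name and mount point are placeholders):

```shell
# Install and start the local cache daemon
yum install -y cachefilesd
systemctl enable cachefilesd
systemctl start cachefilesd

# Remount the share with the 'fsc' option so NFS reads go through fscache
umount /mnt/media
mount -t nfs -o ro,fsc nas:/media /mnt/media
```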
Direct play depends on clients’ (players’) capabilities: if clients’ hardware & software is able to decode and play TrueHD 7.1, then Plex will not transcode it, otherwise it will force transcoding.
Hmm, normally the OS should already be doing page-cache caching of NFS reads, but if fscache works for you, then that’s also fine. How are you mounting the NFS shares? Try something like this in fstab:
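For example (hostname, export path, and mount point are placeholders):

```
# /etc/fstab: read-only NFS mount with a large read size and fscache enabled
nas:/export/media  /mnt/media  nfs  ro,vers=3,rsize=1048576,hard,noatime,fsc  0  0
```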
Try a different but similar video file. I’ve seen inconsistent behavior when trying to transcode 4K videos. I’m not 100% sure, but it seems that video files with “forced” PGS subtitles contribute to the problem. Since many clients (e.g., Apple TV) don’t support this native Blu-ray format, they need the subtitles ‘burned’ into the video… so when this happens, we are happily using hardware-assisted encoding, but the subtitles still need to be burned in, and that appears to be a single-threaded operation (bound to a single CPU core). At least that’s what I saw when I poked at this a few weeks ago (but I gave up and moved on to other stuff). My test case was a 4K Blu-ray rip of Black Panther, which does have forced PGS subtitles… but a different 4K rip (without forced PGS) seemed to play fine…
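One way to check a file for forced PGS subtitle tracks before queueing it (filename is a placeholder):

```shell
# List subtitle streams with their codec and 'forced' disposition flag.
# PGS tracks show up with codec_name=hdmv_pgs_subtitle.
ffprobe -v error -select_streams s \
        -show_entries stream=index,codec_name:stream_disposition=forced \
        -of default=noprint_wrappers=1 movie.mkv
```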
I’ve bought myself an RTX 2060 to use the improved NVENC/NVDEC on these cards. I could have opted for a P2000, but figured I’d see if I could get the hack for extra transcodes working and have a better-quality stream. However, I’m running into some problems figuring out what settings to use for my LXC container.
crw-rw---- 1 root video 226, 0 Mar 17 14:10 card0
crw-rw---- 1 root video 226, 1 Mar 17 14:10 card1
crw-rw---- 1 root video 226, 128 Mar 17 14:10 renderD128
I’m not sure what settings to put in my LXC .conf file. My server has one VGA output driven by a Matrox card; isn’t that one of the cards I’m seeing above? I wouldn’t want to pass that one through to my LXC container. Also, the lines you use:
My server looks the same (Gigabyte pesh2 board), but I am running Plex on bare-metal Proxmox and it just works.
I’d simply suggest starting with card0; if that doesn’t work, switch to card1.
lxc.cgroup.devices.allow = c 226:0 rwm <<<< card0, change to 226:1 for card1
lxc.cgroup.devices.allow = c 226:128 rwm <<<<< render128
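For NVENC/NVDEC inside the container you’ll typically also need the /dev/nvidia* nodes passed through, not just /dev/dri. A rough example fragment; the major numbers below are typical but vary per host, so check `ls -l /dev/nvidia* /dev/dri` before copying:

```
# Example LXC config fragment (device numbers vary per host)
lxc.cgroup.devices.allow = c 195:* rwm   # /dev/nvidia0, /dev/nvidiactl
lxc.cgroup.devices.allow = c 243:* rwm   # /dev/nvidia-uvm (major is dynamic; verify!)
lxc.cgroup.devices.allow = c 226:* rwm   # /dev/dri/card*, renderD128
lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry = /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
```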
Further, I’d recommend getting everything (encoding) working before messing with driver hacks and transcoder hacks (i.e., don’t complicate things until they actually work in a stock configuration).