Has anyone used a high-end graphics card in discrete mode and seen any advantages in a virtualized PMS box? My Plex server currently has 38 cores assigned (the Hyper-V 2016 host is running a single Xeon E5-2699 v4 processor, and hosts several other virtualized servers) and I have absolutely no issues transcoding (software) any types of files (to include HEVC). My average is about 8 concurrent users, mostly transcoding (and the CPU never breaks a sweat). Before I go drop more money I would like to know if anyone is doing this yet (using a virtualized high-end graphics card in discrete mode, and seeing advantages using hardware transcoding). Thanks!
Does the VM have direct and exclusive access to the hardware? If not, any efforts will be in vain.
Yes, that is what I meant by saying discrete mode...
...and this would be the card I would use (dedicating one GPU and 16GB of GPU RAM to PMS):
https://www.amd.com/en-us/products/graphics/workstation/radeon-pro/duo
Let me move this to the Windows forum. You'll get proper help from those there. I'm just a Linux guy.
No worries... this question could also relate to Linux as well, Chuck (discrete mode, aka SR-IOV, is a beautiful thing). It doesn't even have to be Microsoft's solution (for the hypervisor).
Linux won't give you that level of control over the devices unless you do a whole lot of kernel config and custom software. As soon as you do that, you push yourself into a non-supportable state (obvious reasons).
The fundamental problem you're going to face is in libva if you use Linux. Hardware transcoding is currently set up and implemented for a single-resource home user, not for something as massive as I believe you're thinking of.
Oh wow... just did some reading about that, and am very surprised that the Linux kernel doesn't natively support this feature (was just on the Red Hat message board). I'm very excited about GPU hardware transcoding, and with SR-IOV, the card is directly attached to the virtual machine (just like a physical box). Because, in the greater scheme of things, no average Joe buys a 4300 dollar processor to just run Plex... LOL ;0)
You are essentially asking if anyone has ever passed through the GPU of a virtualized high-end graphics card to a Hyper-V guest, correct?
With Windows 2016 as the host and Windows 2016 as the guest Plex should work with that card.
I played with a similar setup in a VM during hardware beta and it did work for me. However, nothing is guaranteed of course and it's not like this is going to be a standard setup that Plex will test for.
I'm assuming you are thinking of buying the card for other purposes and just want to make use of it as well for Plex. If not and you are planning to use it only for Plex, make sure you can return it just in case.
But it should work if the host is Windows 2016. ffmpeg (what our transcoder is based on) can do this as well in the VM in discrete mode.
However, your server is probably handling the load fine now with just the CPU so why add the GPU?
Carlo
"You are essentially asking if anyone has ever passed through the GPU of a virtualized high-end graphics card to a Hyper-V guest, correct?"
No, that is totally possible, and not just with video cards (like I stated before, I'm a big fan of SR-IOV). I'm wondering, since hardware transcoding is no longer a feature that is only offered by a beta version of Plex, if anyone has used this type of scenario successfully, and seen results (that can compete with, and hopefully are better than, software transcoding). I just lowered my electric bill by almost 150 bucks a month, because I was running a dedicated i7 6700 PMS box that was pegged at 100% all the time (after adding 5 additional friends to my Plex pool). Damn mini case burned your fingers if you touched it for more than a second. I consolidated my Hyper-V host and my PMS box (from two boxes) into a single server solution... however, it would be great, in the near future, if (whether virtual or physical) a PMS server utilizing hardware transcoding could be built based on a GPU... especially as UHD starts becoming the norm, and compression can be tamed using a graphics card vs CPU (similar to the bitcoin mining situation). My question is based on my specific environment, but since I would use a discrete "direct" connection to the hardware, it could be applied to a physical scenario as well...
"However, your server is probably handling the load fine now with just the CPU so why add the GPU?"
You're right... I have absolutely no problem with my current setup... but I am interested in offloading to a GPU vs CPU. I would get the card, and dedicate half of it to PMS (if it can be exploited). My curiosity is based on this: say, 4 years from now, when (hopefully) 8K is the norm, and there is a compression scheme more demanding than H.265, is everyone that wants to stream to 10+ users using Plex going to have to buy a 4000+ dollar processor, or a modern-day Intel NUC Skull Canyon (smaller than an average paperback book) built around a bad-ass GPU?
I've attached this month's data collection (from my Plex VM). I wish I had a screenshot from the month before, where the CPU table pretty much stayed at 100% for at least 18 hours a day.
...and I would like to ask another question (totally different topic, but just as important). Software side... any thoughts of offering the PMS backend based on a MySQL or SQL database? This product has so much major growth potential in the near future... and whether you are using a local (on-site) or cloud solution, being able to use a fast, indexed, optimized (and vendor supported) database would be stellar... think of the hooks that could be built, exploiting the application data tables and metadata! Plex could replace products like Adobe Lightroom and give birth to things like new online radio stations (for example) overnight... if it had a standardized database... coupled with GPU hardware transcoding (which could be deployed on thin clients built around a fast GPU). Plex has the potential, more so than any of their competitors, and I believe that they could (with the right resources) get into any market... from watching the morning news or last night's Simpsons on your OLED screen above the microwave in your kitchen, to running media in a 3rd grade classroom.
It's been asked numerous times before about having a real back end SQL database.
Best I can say is I doubt this will ever happen.
It's a real shame too, because it would allow us to scale our systems to handle transcoding so much more easily. You could run a couple of home computers with QSV or dedicated GPUs and be able to handle pretty much any load of family and friends. If you were into DVR you could set up a different computer for recording with real-time transcoding so all shows/movies are saved as MP4 with commercials cut.
This would give us a lot more flexibility for sure.
@johnnycole said:
Is everyone that wants to stream to 10+ users using Plex going to have to buy a 4000+ dollar processor, or a modern-day Intel NUC Skull Canyon (smaller than an average paperback book) built around a bad-ass GPU?
I wouldnât see why not.
I currently use a Skull Canyon with HW transcoding and it's more than capable of handling 10+ concurrent users for H.264 encoding. I will be looking at the Hades Canyon for support of H.265.
When I say I have an average of 8 concurrent users that's exactly what it sounds like... I get between 15-20+ concurrent users during peak times. Most of my TV sourced media is encoded in HEVC (H.265)... and there is plenty of media that is at 4K resolution (not to mention the DTS-HD and Atmos audio tracks). My previous standalone PMS box was running a desktop CPU version of the 6700K... which (even with Iris hardware transcoding) felt like it was going to catch on fire (temps hovered between 85-99 degrees). I merely want to know if anyone is running hardware transcoding in a virtual environment successfully, using discrete addressing, aka SR-IOV (and it is proven better than software decoding). Are we there yet?
- Been there the whole way.
- Yes, I have seen it done, but you will not be able to utilize more than one passthrough GPU with Plex.
- Not sure one GPU will scale to 15-20+ concurrent users the way software decoding can with the CPU you have.
- Software transcoding quality is currently deemed better than HW transcoding quality. Platform video acceleration libraries and hardware drivers are the variable determining factors here.
Thanks Achilles... best answer so far! I mentioned Bitcoin mining earlier in the thread, which can be compared to Plex hardware transcoding... and from the sounds of it, even though it is now a standard feature, it still has not been optimized to take full advantage of the GPU hardware (processor, RAM, extensions, etc). I guess I'm fishing... can someone from the Dev team tell me (us) where their bottleneck is? If we give PMS a top of the line graphics card, capable of rendering raster graphics at 8K+, why can't it do the same with a compressed flat file? I'm not trying to insult the Plex coders; however, it would be great to understand if there is a current limitation based on a codec or vendor extension (that's limiting the hardware transcoder). Please keep in mind, I'm not talking about embedded graphics in this scenario, but a beefy card with crunching potential (heavy emphasis regarding the Bitcoin mining comparison). I seriously do believe that 8K will be here in the near future (<5 years)... and even H.265 (which to me is almost as important as the invention of the original MP3 compression scheme) will be replaced by something even more complex. I would even offer some of my free time to champion this effort (if there is a limitation that is vendor based). Like I said... I'm kinda fishing (LOL) and would love some feedback... is the hardware transcoding using pure OpenCL and CUDA... are there plans in the future to utilize SLI and Crossfire... is Plex (as a company) working with Nvidia and AMD (game development shops do, and drivers are released to take advantage of code for certain games based on their rendering engine)? Thanks guys!
Honestly the bottleneck is in your files.
You're feeding it files that almost certainly have to be transcoded in order to be viewed by your friends. If you created a direct playable version in an MP4 container using H.264 with a default audio track of AAC stereo (plus additional audio tracks) you would save yourself from having to transcode a large portion of the time and the quality would be better as well.
I used to use dual Xeons with 12 cores each (24 cores or 48 virtual cores) as my Plex server. I started converting my media and now have everything in my sig in a direct playable format and haven't looked back. I now run my Plex server on a 1st gen i7 at 2.8GHz (with 8 virtual cores) and an older AMD R9 280x GPU and it works just fine.
I have 4K libs (mostly h.265) and also 1080 libs. Any movies in the 4K libs will also be encoded for 1080 as well. I only share the 4K lib to people who can direct play it and of course have 4K TVs. If I see them transcoding I stop sharing that library with them.
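For anyone who wants to try the conversion described above, here is a rough Python sketch that only builds (and does not run) an ffmpeg command line for the target named in this thread: MP4 container, H.264 video, AAC stereo audio. It assumes an ffmpeg build with libx264 is installed; the function name and file names are made up for illustration, quality settings are left at ffmpeg's defaults, and it keeps only the first audio track, so the "additional audio tracks" mentioned above would need extra `-map` options.

```python
from typing import List


def direct_play_command(source: str, target: str) -> List[str]:
    """Build an ffmpeg argument list for an H.264 / AAC-stereo MP4.

    Hypothetical helper: it only assembles the arguments, it never runs ffmpeg.
    """
    return [
        "ffmpeg",
        "-i", source,
        "-map", "0:v:0",            # first video stream of the input
        "-map", "0:a:0",            # first audio stream only (extra tracks not handled)
        "-c:v", "libx264",          # encode video as H.264
        "-c:a", "aac",              # encode audio as AAC
        "-ac", "2",                 # downmix audio to stereo
        "-movflags", "+faststart",  # put the MP4 index at the front of the file
        target,
    ]


# Example with made-up file names; prints the command instead of executing it.
print(" ".join(direct_play_command("movie.mkv", "movie.mp4")))
```

To actually run it you could hand the list to `subprocess.run(..., check=True)`, but test on a copy first since every library and client mix is different.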
Something to keep in mind about the way transcoding works in Plex in general. When you have a file that needs real-time transcoding it doesn't start a file transcoding and let it go until finished. It transcodes only enough to keep the buffer filled on the client side. So if your CPU was fast enough it's quite possible you could be streaming to 5 clients but only 1 transcode would be taking place at any time, since as soon as it's transcoded enough it shuts the transcode down and waits. When the next client needs more data it will transcode more, fill the buffer and wait again. Of course on slow machines it might have to transcode them all at the same time just to keep up. But on fast machines it can seem like the CPU is hardly doing anything and mostly waiting.
Enter GPU, same thing. So let's imagine you have a card that only supported 2 encodes at the same time. That doesn't mean you can only stream to 2 people using it. Since it transcodes, fills the buffer and stops, it's quite possible to support 6 to 8 streams while only being able to transcode 2 streams at the same time. Think about it. If you could only do 2 encodes at the same time but each was 10x real-time speed (total of 20x), in theory you could support 20 streams (let's call it 12 to 15 to be safe).
Don't get caught up in the math from above as it's made up just to explain how it could work. Every system is different as it depends on the hardware and the media it has to start with. But I just wanted to point out that X concurrent transcodes can support many more streams on fast hardware due to the pausing during encoding once you get ahead of what's needed.
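The back-of-the-envelope reasoning above can be written down as a tiny Python sketch. It assumes the idealized picture from the post: every stream needs an encoder for only 1/speed of the time, and switching between streams costs nothing. The function name and the headroom parameter are made up for illustration, and the numbers are the same invented ones as in the example, not measurements.

```python
def max_streams(encoder_slots: int, speed_multiple: int, headroom_pct: int = 100) -> int:
    """Estimate how many real-time streams a transcoder could keep fed.

    encoder_slots  -- simultaneous encodes the hardware allows (illustrative)
    speed_multiple -- how many times faster than real time each encode runs
    headroom_pct   -- share of the theoretical ceiling you want to plan for
    """
    if encoder_slots < 0 or speed_multiple < 0 or not 0 <= headroom_pct <= 100:
        raise ValueError("arguments out of range")
    # Each stream occupies a slot for 1/speed_multiple of the time, so in the
    # ideal case one slot can serve speed_multiple streams.
    theoretical = encoder_slots * speed_multiple
    # Integer math keeps the result exact; the fraction is rounded down.
    return theoretical * headroom_pct // 100


print(max_streams(2, 10))      # theoretical ceiling from the example: 20
print(max_streams(2, 10, 70))  # holding 30% back: 14
```

Holding back around 30% lands inside the "12 to 15 to be safe" range mentioned above; real hardware and real media will behave differently.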
Personally, after having been down the path of using enterprise equipment for Plex and having used lower end "home" equipment, I wouldn't go back to enterprise class as it's just not worth it $ wise.
I could easily build 3 or 4 cheap computers running as standalone Plex servers, each with its own dedicated consumer class GPU (or use QSV), pointed to the same media. I then could assign users to a particular Plex server to balance the load. Now I have redundant backup servers as well.
I too used to be roughly where you are now streaming wise. I used to share with a ton of people, from guys I worked with to guys I shoot with to friends and family. Then my dad had friends and my kids had friends. I also had a few people from here in the forums as well. But then people started sharing their accounts with other members of their family and I would see 3 or 4 streams coming from the same account on different IPs. These were people I didn't know, and I wasn't planning on being their "Netflix", so I sent out a warning to people and then started to remove people who were abusing my "goodness" in sharing with them. Over time I downsized who I share with and I'm back to close friends and family, and I couldn't be happier.
But anyway, even the 1st gen i7 would stream to a dozen people. That was before we had HW encoding. So knowing what I know now, I'd go for multiple smaller computers and segment users vs a big, expensive-to-upgrade computer for running Plex. E.g. you could easily build 4 nice i7 machines for what that GPU will cost. The i7s would have QSV and would support HW encoding and could easily handle 40+ users without an issue. Plus you would have duplicated hardware and redundancy.
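As a rough illustration of "assign users to a particular Plex server to balance the load", here is a small Python sketch of one way to plan that split by hand. Nothing here is a Plex feature or API: the user names, server names and peak-stream counts are invented, and the greedy rule (heaviest user first, onto the least loaded server) is just one reasonable assumption. You would still apply the result yourself by choosing which server to share with each person.

```python
from typing import Dict, List


def assign_users(peak_streams: Dict[str, int], servers: List[str]) -> Dict[str, List[str]]:
    """Greedy split: heaviest users first, each onto the currently lightest server."""
    if not servers:
        raise ValueError("need at least one server")
    load = {server: 0 for server in servers}
    plan: Dict[str, List[str]] = {server: [] for server in servers}
    # Sort by descending stream count, then by name so the result is repeatable.
    for user, streams in sorted(peak_streams.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(servers, key=lambda s: load[s])  # ties go to the earlier server
        plan[target].append(user)
        load[target] += streams
    return plan


# Invented example: expected peak concurrent streams per shared account.
users = {"dad": 3, "kids": 2, "coworker": 1, "friend": 1}
print(assign_users(users, ["plex-a", "plex-b"]))
```

With the invented numbers above, the two servers end up with 4 and 3 expected peak streams respectively.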
Just a thought,
Carlo
This is an interesting thread. It's been interesting to read along.
VMware introduced SR-IOV in vSphere 5.1 in 2012. It's 5+ year old tech. It may be new to some but we call it what it is, "PCI passthrough", and everyone does it. It's been built into publicly available VirtualBox for a long time.
Comparing Bitcoin to video isn't a valid comparison. Comparing gaming to video isn't valid either. One is based on pure flat matrix math, gaming is parallel 3D solid modeling, while video is a set of FFT sequences on ever-changing input data. Game objects are based on static models. While one can use OpenCL or CUDA for mathematics or game modeling (prime number proofs are an excellent example, and what I do as my hobby), which are easily broken into smaller independent task blocks, with the blocks reassembled upon completion, video is not the same. Video frames are intra- and interdependent, where the output of one is needed as input for the next on multiple axes (the FFTs). You're asking for a maturity level which does not yet exist in the domestic video world.
Your analogy using SLI/Crossfire, where game model objects are independently rendered 3D solid models and "dropped" into the composite video frame, is not applicable. The closest you could hope for in this video world is a multi-stage pipeline where video enters one end of the pipe, is processed as it traverses the pipe, and exits the pipe upon completion. The level of software sophistication needed to accomplish this is not readily available (free) to the general public. Yes, FFmpeg does support OpenCL/CUDA, but not at the level you are asking for, and probably won't for some time as FFmpeg is a volunteer effort.
Regarding 8K video: "5 years away" is an eternity. It's at least 2 generations of silicon away. 8K content is even further away as it would require 12K or 16K masters to be produced. Sure, it's possible to do 8K gaming or computer workstations, but Hollywood (which includes the consumer video player manufacturers) isn't even doing any 8K filming yet. They need to upgrade every step in that process, from cameras, to production equipment, to storage, and to the players consumers purchase. Having an 8K monitor / TV is quite arguably meaningless without the video to fully utilize it. Sure, one could do upscaling in the display but it's interpolated data, not original content data.
Comparing MP3 compression to H.265 isn't valid. One must distinguish between "more compression" and "better compression". WAV->MP3 falls into the "more compression" class. HEVC falls into the "better compression" class. MP3 made the audio softer. The intent of HEVC is to reduce the total data required to below that of H.264 while keeping those nice crisp edges, which it does nicely, but at the cost of more difficult decode/rendering.
Now to Plex:
- Speaking to Linux only: Hardware transcoding does not utilize OpenCL or Cuda directly. PMS accesses video acceleration through the VAAPI layer. What happens below that layer is invisible and immaterial to PMS.
- Plex doesn't publish roadmaps.
- It's not about coding existing technology. It's about creating new technology. Trying to predict when genius will strike can't be done. One cannot reliably predict those "Today I will be brilliant" days when the stars align and genius strikes.
- We are monitoring the progress of Intel's VAAPI 2.0.0 development. Engineering will be evaluating when it becomes a firm, final release. This is because the API and (most importantly) the ABI have changed (per Intel development themselves on 01.org). Upon release, there will be considerable NRE investment to up-rev to 2.0.
I hear ya... however, I would never "compromise" my media. It is actually about the math (at least to me). The resolution, encoding, number of sound channels, bitrates, etc... I have spent the last 30+ years building my entertainment system (paid more for my first large format TV than I did for my first car... both in the early 90's). Have you seen a classic movie remastered, going from <320 to 1080? It's beautiful. I don't believe in optimizing my media to work across different hardware platforms; I want the file as compressed as it can be while being as close to lossless as possible. I understand that I am a minority, and most of your movie/music/hi-fi fanatics use optical media vs disk storage. I've been collecting media from the phone lines since 1982... and still actually have my original first MP3 CD (the year the compression format was released), 169 songs... and most people back then thought that was impossible. Whether I'm watching a movie from my PMS server in my Man Cave with 16 speakers on a 4K screen (not gonna say how big, because it already feels like I'm kinda boasting), or watching 1300 miles away on my Mom's 5.1 Bose system on a 65 inch 1080p TV in her living room, I want the best experience. Not to mention, we have some really cool directors nowadays that are shooting great new films with really cool 4K RED cameras, and now let's throw in 3D (and double the resolution). I guess you could use that feature of Plex, aka Optimize (that I wouldn't dare to use). Seriously, no offense... not trying to be a snob! Lastly, I would like to add, because I have to expose my PMS to the internet, I wouldn't dare let Plex have ANY write privileges to any of my media (which sucks because I can't use a lot of the cool sync features). But hey, no code is perfect, the db structure is based off a giant tree and folder map... and since folks can basically hack anything nowadays, I would like to keep my media (and all my other data) safe and sound.
Full circle... just trying to be an IT fortune teller here... and I am interested in taking an expensive graphics card and being able to be a mini Netflix provider to my friends and family (while providing the highest resolutions and as many audio channels as the film provides). Using/exploiting the hardware is key... hats off to the Plex development team because at least they figured out how to offer multithreaded software transcoding...
Yes, it's all about the math. It sounds like you and I have been around this stuff since flyback transformers were the first thing you learned NOT to touch >:)
What I find most interesting is how long it took for this tech to make it to consumers. I had a stint building full fidelity flight trainers complete with video and graphic rendering (aka... early CGI).
I use Linux because I have that absolute protection afforded by the OS. PMS runs as an unprivileged user; files are set read-only to PMS. There is also a full mirror in case I mess up and a full offline backup set of everything.