Distributed Transcoding/Scaling

Obviously I am not suggesting trying to run one instance of the transcoder across 8 boxes; that would not really help and would just cause tons of overhead...

It wouldn't be that difficult, though, to split a single transcode job that's meant to be synced across multiple PCs, accelerating the time it takes to sync. Give each transcoding agent a specific, relatively large portion of the media to work on, and combine the results. Two equally equipped transcoding agents could complete the job in roughly half the time of a single agent, provided the network isn't saturated. Combining the results efficiently would require some thought, but it's doable.
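To sketch roughly what I mean (a hypothetical split using ffmpeg, which is what Plex's transcoder wraps; the filenames, bitrate, and chunk length below are made up, and in a real cluster each chunk would run on a different agent rather than in a loop on one box):

```python
# Rough sketch only: split one sync transcode into time ranges, then join.
# Assumes ffmpeg/ffprobe are on PATH; paths and settings are hypothetical.
import subprocess

SOURCE = "movie.mkv"      # hypothetical input
CHUNK_SECONDS = 600       # each agent gets a relatively large portion

def duration(path):
    out = subprocess.check_output(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "csv=p=0", path], text=True)
    return float(out)

chunks, start, total = [], 0.0, duration(SOURCE)
while start < total:
    name = f"chunk_{len(chunks):03d}.mkv"
    # An agent transcodes only its assigned time range.
    subprocess.check_call(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(CHUNK_SECONDS),
         "-i", SOURCE, "-c:v", "libx264", "-b:v", "4M", "-c:a", "aac", name])
    chunks.append(name)
    start += CHUNK_SECONDS

# Combine the results with the concat demuxer (all chunks share one codec).
with open("chunks.txt", "w") as f:
    f.writelines(f"file '{c}'\n" for c in chunks)
subprocess.check_call(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                       "-i", "chunks.txt", "-c", "copy", "synced.mkv"])
```

The fiddly part is making the chunk boundaries frame-exact so the pieces butt together cleanly -- that's the "requires some thought" bit.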

For streaming, it wouldn't make much sense to split a single job over multiple agents.  Any single agent would need to be able to transcode at least one stream in real time, and you really can't watch it faster than that.

You can get blade servers for very cheap on eBay with tons of i7-based Xeons... believe me, 8 blades totalling 16 hex-core Xeons will blow away pretty much any single box you can find.

If you have 8 boxes available, each running one instance of the transcoder, they will perform much better than one box running 8 instances of the transcoder... that's pretty simple scaling to me.

Obviously I am not suggesting trying to run one instance of the transcoder across 8 boxes; that would not really help and would just cause tons of overhead... this is more for scaling/capacity and for the folks running Plex on NAS devices.
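To illustrate, the dispatch side of that could be as dumb as sending each new stream to whichever box has the fewest active jobs (node names below are hypothetical):

```python
# Hypothetical sketch of the simple scaling described above: one transcoder
# instance per box, with new jobs sent to the least-loaded box.
active_jobs = {"box1": 0, "box2": 0, "box3": 0}  # made-up node names

def dispatch(job_id):
    node = min(active_jobs, key=active_jobs.get)  # least-loaded box wins
    active_jobs[node] += 1
    print(f"job {job_id} -> {node}")
    return node

def finished(node):
    active_jobs[node] -= 1

for job in range(5):
    dispatch(job)
```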

But at what cost?  Complaining that pre-transcoding takes too much storage (and could be separated across multiple boxes already) falls a bit flat when you suggest buying an array of blade servers. I cannot picture a situation in which additional storage would be more expensive than 8 blades and 128 cores.

Also, you must recall that the source file exists in one place.  Having 8 read threads at different places is a concurrency issue and will not result in the best performance.  

My problem is with creating what amounts to an edge case (many tuners writing new content to the server) while not seeing that the requirements for that case do not apply to most users. If I create 8 new pieces of content and expect to watch them immediately at any bitrate, I have to apply significant horsepower to that.

Even purely computational workloads are limited in scaling efficiency (70-80% is considered great). This is generally applied across x cores: having 4 cores might produce results 2.8 times (70% efficiency) faster than 1 core.

This does not take into account any IO.  When you realistically talk about a 10-15 GB source (conservatively) over a gigabit network with spinning disk drives at each end, you have to have significant memory to keep the pipeline moving efficiently.  Since there is still only one server during this time, all work still has to be sent back to the server to be transmitted to the remote client.  You can either make the segments so small that they stay in memory on the remote transcoder, or they have to go to disk.  The first option means higher overhead on the true server, because it creates more segments; the second introduces disk lag on top of network latency.
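To put numbers on that scaling claim (the figures below just restate the 70% example above):

```python
# Worked example of the scaling-efficiency arithmetic above.
cores = 4
efficiency = 0.70        # 70-80% scaling efficiency is considered great
speedup = cores * efficiency
print(f"{cores} cores at {efficiency:.0%} efficiency -> ~{speedup:.1f}x faster")
# -> 4 cores at 70% efficiency -> ~2.8x faster
```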

I agree that distributed transcoding is out there and can be beneficial, but there is a significant difference when doing this on the fly, on demand, for remote clients.  All the benefits seem a bit optimistic, and it would seem few have additional (modern) machines sitting around unused.  It would appear that the audience that would gain much from this would be very small.  I have a NAS installation of Plex.  If I felt I needed more real-time transcoding, I would replace the very power-efficient processor in it now with a much faster processor.

If someone wants to run down the theoretical benefits and attempt to make it work, more power to them.  But I believe this should be an alternative to the normal Plex installation and not part of the standard product.  I just don't see the overhead of the distributed transcoding as beneficial to any but a minuscule percentage of the install base.

But at what cost?  Complaining that pre-transcoding takes too much storage (and could be separated across multiple boxes already) falls a bit flat when you suggest buying an array of blade servers. I cannot picture a situation in which additional storage would be more expensive than 8 blades and 128 cores.

See my aforementioned reference to subtitles as just one example of where a transcode cluster could be very useful, and where pre-transcoding will not fill the need.  Sure it would be great for Plex to support native subtitles on all platforms, but that's not going to happen any time soon, and is actually far more complex to implement than distributed transcoding.

As for an "array of blade servers", that only speaks to the scalability of the approach.

Also, you must recall that the source file exists in one place.  Having 8 read threads at different places is a concurrency issue and will not result in the best performance. 

I think it has been stated multiple times that only one thread needs to access a single file source, since it's assumed any single node in the cluster would be able to handle transcoding in real time.  The only reason you'd distribute a single source over multiple nodes is if you wanted to transcode that single file faster, for a sync job (a transcode that will be synced rather than streamed).  An optional configuration.
 
If you're talking about bandwidth between the NAS, transcoding nodes, Plex Media server, and clients, then sure, it'll put a little more strain on that if it's not reconfigured -- but again, we're not talking about uncompressed video here.  Gigabit would be plenty for the average user.  There are certainly options for larger installations.
 
 

This does not take into account any IO.  When you realistically talk about a 10-15 GB source (conservatively) over a gigabit network with spinning disk drives at each end, you have to have significant memory to keep the pipeline moving efficiently.  Since there is still only one server during this time, all work still has to be sent back to the server to be transmitted to the remote client

 
This is not very different than many Plex installations today.  The PMS pulls from a separate NAS, transcodes, and delivers to a separate client.  In this case, we add one more step:  The transcoding node pulls from the NAS, transcodes, delivers to PMS, which in turn delivers to the client.
 
Again, this doesn't need to happen any faster than real time.   We can take 2 hours to transcode that 20GB source for a 2 hour movie, processing a small chunk at a time, as needed (after initial buffering), just as Plex's transcoder does it now.  Easily doable without saturating a gigabit network.
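A quick sanity check on that, using the same 20GB / 2-hour figures:

```python
# Back-of-envelope check: moving a 20GB source over 2 hours in real time.
source_gb = 20
runtime_s = 2 * 3600
avg_mbps = source_gb * 8 * 1000 / runtime_s   # GB -> gigabits -> megabits
print(f"average read rate ~ {avg_mbps:.0f} Mbps")  # ~22 Mbps, nowhere near gigabit
```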
 

It would appear that the audience that would gain much from this would be very small.  I have a NAS installation of Plex.

Are you saying you don't have a desktop or laptop sitting around that could act as transcoding agent, or simply that you wouldn't bother using it as a transcoding agent, even if the feature were available right now in Plex?

The audience is actually much larger than you think it is.

 

If someone wants to run down the theoretical benefits

You're over-thinking this one.  This is not a very complex task, and the benefits are obvious.  The most complex aspect is how to keep un-throttled sync transcodes from saturating the network -- but again, if your media isn't on the same machine as your Plex Server, that issue already exists for you today.

My media is on the same machine because my NAS (unRAID) runs my Plex installation.  I actually do not have an extra computer with more horsepower than the NAS because I specifically built the NAS to handle transcoding.

Now I see that you are really referencing one external transcoder in most instances.  This would produce no more contention for the source than normal, but it would not solve the contention, either.  The benefits of just moving transcoding off to another machine will still be throttled by the additional IO.  Even if you had avoided the network usage of separate NAS and PMS installations, you would now be adding the transcoded data flowing back to the PMS.  On low-quality networks this could still present some additional hurdles, because the faster transcoding machine could saturate the network.  That can be overcome by avoiding low-quality links between the PMS (possibly the NAS) and the transcoder.

With the separate NAS setup, this means you will have 2x the bitrate of the source file (once NAS to PMS, once PMS to transcoder) plus 2x the bitrate of the transcoded content (transcoder to PMS, PMS to router/remote client).  This will choke a 100Mbit network if it is a Blu-ray source.  On a 1000Mbit network you should be OK, as long as your wiring is up to the task.
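To put rough numbers on that accounting (the bitrates here are assumptions: a Blu-ray-class source around 40 Mbps and an ~8 Mbps transcoded stream):

```python
# Rough traffic accounting for the separate-NAS topology described above.
source_mbps = 40       # assumed Blu-ray-class source bitrate
transcode_mbps = 8     # assumed output stream bitrate
total = 2 * source_mbps + 2 * transcode_mbps  # NAS->PMS, PMS->transcoder,
                                              # transcoder->PMS, PMS->client
print(f"peak LAN load ~ {total} Mbps")  # 96 Mbps: chokes 100Mbit, fine on gigabit
```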

I completely agree that from the computational side, this is a no-brainer.  The problem comes in when you take into account the network and disk overhead.  Really good networking (quality cabling and 1000Mbit switching) and well-planned disk layouts (separating the source material from the area where Plex stores temporary transcode data) can overcome a lot of these hurdles, but it takes a bit of planning.

I work in high performance environments daily and have for 20+ years.  Often we find that the processing power is the easiest problem to solve, but IO gets very expensive.  That being said, I've beaten this dead horse enough, so I will refrain from further comment.  

But at what cost?  Complaining that pre-transcoding takes too much storage (and could be separated across multiple boxes already) falls a bit flat when you suggest buying an array of blade servers. I cannot picture a situation in which additional storage would be more expensive than 8 blades and 128 cores.

Storage is very expensive.

48x 2TB RE4s will cost over $9k... plus something to put them in.

A 4-blade Dell C6100 with 2x L5520s and 24GB DDR3 per blade costs about $1500.

And why do you think you need 48 drives when the discussion has been centered on mobile (low bitrate) clients?  Permanently storing a very good H.264 1080p transcode along with a low bitrate transcode takes less space than storing the original Blu-ray rip.

I have a 16TB unRAID array that cost less than $2000, including storage (ten 2TB drives at an average of about $90 per drive: 8 data, one parity, one cache; plus a low-power CPU, motherboard, power supply, very nice case, and an additional SATA controller).
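For anyone wanting the back-of-envelope math behind that claim (all sizes are assumptions, not measurements):

```python
# Assumed, illustrative sizes for one 2-hour movie.
bluray_rip_gb = 35                       # straight Blu-ray rip
good_1080p_gb = 8                        # very good H.264 1080p transcode
mobile_gb = 1.5 * 3600 * 2 / 8 / 1000    # 1.5 Mbps low-bitrate version
print(f"two transcodes ~ {good_1080p_gb + mobile_gb:.1f} GB "
      f"vs {bluray_rip_gb} GB for the rip")  # ~9.4 GB vs 35 GB
```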

A load balance option would be awesome. 

+1

And why do you think you need 48 drives when the discussion has been centered on mobile (low bitrate) clients?  Permanently storing a very good H.264 1080p transcode along with a low bitrate transcode takes less space than storing the original Blu-ray rip.

I have a 16TB unRAID array that cost less than $2000, including storage (ten 2TB drives at an average of about $90 per drive: 8 data, one parity, one cache; plus a low-power CPU, motherboard, power supply, very nice case, and an additional SATA controller).

because I like things in high bit-rate, but the ability to play on mobile clients is nice too...

I have 72 2TB drives in 6x 20TB RAID6s and 16x 1TB drives in a 14TB RAID6... it cost more than $2k, I assure you.

Here is my use case:

I have a home server (AMD Athlon 5350), which I use for several VirtualBox instances and as my NAS (with 4TB of storage). The hardware previously used for the home server (Intel Core 2 Duo E6400) has been switched in to be my gaming rig*. Everything in the house is connected via a Gigabit Ethernet network, and I have Plusnet Fibre, which gives 50-70 Mbps down and 15-25 Mbps up.

The new home server is good enough for 2-3 transcoding streams depending on the client, while the old server was capable of 1-2. Since we are a 2 person household this suits our needs quite nicely.

Various family members have seen my setup and asked me to replicate it with them, which has caused me a problem since there are times when multiple family members access my shared libraries.

Having the ability to switch on my old server and have it act as a transcoding node so I can now stably stream to 3-5 people would be really helpful to me.

Not all of us can afford to buy multiple TBs of storage with the latest i7, but we do have a number of old machines lying around.

*It took 6 months to put together the cash for the updated home server, and I expect it will be a year before I get something decent for gaming put together. Also, it’s good enough for Kerbal Space Program and Civilization V.

Keep in mind that my Plex server is running on a CPU that cost a grand total of $95 several years ago. 

Permanently storing a very good H.264 1080p transcode along with a low bitrate transcode takes less space than storing the original Blu-ray rip.

"Low bit rate" is a moving target and depends on which client is connecting from where.  1.5Mbps?  720Kbps? Lower?  Higher?  Can we increase quality using CABAC and/or B-frames for the given client?   If the goal is to avoid transcoding entirely, that means you're going to need a version for a great number a bit rates.

I'll also again point out that if we add subtitles to the mix, and don't want to permanently burn them in every transcode, it's almost impossible to eliminate transcoding while maintaining one of Plex's greatest features: broad client support.

 

And why do you think you need 48 drives when the discussion has been centered on mobile (low bitrate) clients?

Mobile, low bit rate clients are only one aspect of the discussion.  The larger point is that distributed transcoding fits many use cases.

+1 - one piece of software that does this, as an example, is MythTV. It has the option of slave backends. When the main backend server is out of resources (or hits some user-specified limit), it can send a WOL packet and then begin sending transcoding jobs to slaves. There is no limit to the number of slave backends you can use.
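For what it's worth, the WOL piece is trivial; a magic packet is just 6 bytes of 0xFF followed by the target's MAC repeated 16 times (the MAC below is a placeholder):

```python
import socket

def wake_on_lan(mac="aa:bb:cc:dd:ee:ff"):          # placeholder MAC
    """Broadcast a standard Wake-on-LAN magic packet."""
    payload = bytes.fromhex(mac.replace(":", ""))
    packet = b"\xff" * 6 + payload * 16            # magic packet format
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, ("255.255.255.255", 9))   # UDP port 9 ("discard")

wake_on_lan()  # wake the slave backend before sending it jobs
```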

+1

[edit]

One way to implement this could be to allow the server with the file to direct stream or upload to the server without it, and to upload small files such as subtitle files. Each server could maintain a cache of files that is refreshed periodically. If both servers have local copies of the same file, you should use the locally stored files to generate the stream, but then you'd have to find a way to sync play data.

The issue then would be handling hand-offs between servers. This is where it gets more complicated, particularly when you stream over the internet.  Locally, the servers could tell the client to switch to the stream coming from another server and handle the setup on the back end, which would add time to starting videos. This would also cause re-buffering issues if the servers decide to hand off mid-stream.

For the internet, you'd have to tell the clients to switch ports but control via the original port just to maintain the connection?  

Should we have to option to obfuscate the fact that we have multiple servers and present one coherent library?

You could do it the same way Exchange does it: you have front-end and back-end services, and you can run them on one or many servers to satisfy load requirements.
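Something like this toy front end, which just round-robins requests across back-end nodes (hosts and ports are made up; a real one would forward request headers, handle POSTs, failures, sticky sessions, etc.):

```python
# Toy front-end/back-end split: round-robin client requests across nodes.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = itertools.cycle(["http://10.0.0.11:32400",   # hypothetical nodes
                            "http://10.0.0.12:32400"])

class FrontEnd(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)
        # Fetch from the chosen back end and relay the response.
        with urllib.request.urlopen(backend + self.path) as resp:
            body = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type",
                             resp.headers.get("Content-Type", "application/octet-stream"))
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

HTTPServer(("", 8080), FrontEnd).serve_forever()
```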

Someone found this on /r/plex https://github.com/rxsegrxup/RunTellThat and now I see it in the developers forum: https://forums.plex.tv/topic/117967-beta-runtellthat-for-plex/?hl=runtellthat

Load balancer and proxy for Plex, but it needs some more code work to separate out the users instead of running them as one big super user and losing features like watch lists, etc.

It’d be great to hear from someone who’s got it working.

I’m struggling to find something to distribute watched status among shared users across servers, never mind transcoding.

Someone found this on /r/plex https://github.com/rxsegrxup/RunTellThat and now I see it in the developers forum: https://forums.plex.tv/topic/117967-beta-runtellthat-for-plex/?hl=runtellthat

Load balancer and proxy for Plex, but it needs some more code work to separate out the users instead of running them as one big super user and losing features like watch lists, etc.

wow very cool, but that last bit is pretty much a deal killer

+1

+1

But at what cost?  Complaining that pre-transcoding takes too much storage (and could be separated across multiple boxes already) falls a bit flat when you suggest buying an array of blade servers. I cannot picture a situation in which additional storage would be more expensive than 8 blades and 128 cores.

You know, I was thinking about this, and your scenario is definitely worth considering. At some point, throwing more CPUs at the problem stops making sense, but it would take a very, very large user load or a very rapid decline in the cost of storage (which is happening slowly but surely).

So maybe, in addition to developing distributed transcoding, it would be cool if Plex could support some sort of stacked pre-transcoded files (which I would imagine would be much easier from a development standpoint).

so say

Movie XYZ.1mbit.mkv

Movie XYZ.3mbit.mkv

Movie XYZ.10mbit.mkv

Movie XYZ.original.mkv

etc or something

This would prevent CPU transcoding entirely, and realistically the lower bitrate files should be pretty small.
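Picking from a stack like that would be simple too. Here's a rough sketch of how a server might choose the best variant under a client's bitrate cap, given the naming scheme above (the fall-back-to-transcode behavior is my assumption):

```python
# Pick the best pre-transcoded variant for a client's bitrate cap,
# using the stacked naming scheme suggested above.
import re

def pick_variant(files, client_max_mbit):
    variants = []
    for f in files:
        m = re.search(r"\.(\d+)mbit\.mkv$", f)
        if m:
            variants.append((int(m.group(1)), f))
    # Highest bitrate that still fits the client's cap, else None
    # (meaning: fall back to a CPU transcode of the original).
    fitting = [v for v in variants if v[0] <= client_max_mbit]
    return max(fitting)[1] if fitting else None

files = ["Movie XYZ.1mbit.mkv", "Movie XYZ.3mbit.mkv",
         "Movie XYZ.10mbit.mkv", "Movie XYZ.original.mkv"]
print(pick_variant(files, 4))   # -> Movie XYZ.3mbit.mkv
```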