Any news on GPU transcoding, especially Intel Quick Sync?

Hi there!

Is there any development towards using the GPU to transcode? Intel Quick Sync in particular would be great, as many users have an Intel CPU.

This would be a huge improvement for the media server. The low-power, Atom-based J1900 (10 W TDP) would become THE perfect home Plex server and NAS.

Greetings, oech!

 

 

No news.

To anyone from Plex: I don't want to be a bother, but I am wondering whether there is a reason that all of the transcoding is done on the CPU and there seems to be no visible effort to include GPUs or Quick Sync. From my understanding, it is almost always faster and better to do something like transcoding on "purpose-built hardware" than on general-purpose hardware (the CPU).

Are there some major hurdles that are keeping this from happening in Plex?

Yes. Only a small part of the transcoding process can be offloaded to a chip when using, for example, OpenCL, so the gain would be small. At the same time, each of the different "purpose-built chips" needs its own codebase: OpenCL for some platforms, QS for others. An enormous amount of work would have to be put in, and that work would only give a 13% boost at most.

You could focus on maybe 1 XXX architecture so people building a server just for plex could purchase GPUs that support XXX, instead of supporting none at all.

Just a thought.

Thanks for the reply but you made me very sad :(

If this is possible I would recommend going with Intel Quick Sync :D  (J1900 fan) but if you can only move 13% it wouldn't be worth it.

Going forward when you start on Version 3.0.0.0 or 4.0.0.0 of the software (full code re-write) maybe offloading transcoding could be a guiding factor.

Thanks

I really can not believe that it would be only a 13% performance increase.

Basically, converting a Blu-ray M2TS to a 1080p H.264 MP4 with AAC or AC3 is the same thing as transcoding for streaming. The only difference is that you save the result instead of streaming it directly.

Using the CPU only, a J1900 converts at 10-15 frames per second with 100% load the whole time.

Converting the same thing with Intel Quick Sync results in 70-90 FPS with only about 50% load.

For me this is 700% not 13%. And this with only half of the CPU load. Perhaps with an overclocked Intel i7 the increase is about 13% but then again the power saved when using GPU would be really nice to have.

So I really can not believe it is not worth the effort. 
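
For anyone who wants to check the arithmetic, here is a minimal sketch using only the FPS ranges reported above. The helper name `speedup_range` is made up for illustration.

```python
def speedup_range(cpu_fps, qsv_fps):
    """Return (worst, best) speedup factors for two (low, high) FPS ranges."""
    cpu_low, cpu_high = cpu_fps
    qsv_low, qsv_high = qsv_fps
    worst = qsv_low / cpu_high   # slowest Quick Sync run vs fastest CPU run
    best = qsv_high / cpu_low    # fastest Quick Sync run vs slowest CPU run
    return worst, best

# FPS ranges as reported for the J1900 in this post.
worst, best = speedup_range(cpu_fps=(10, 15), qsv_fps=(70, 90))
print(f"speedup: {worst:.1f}x to {best:.1f}x")
```

This comes out at roughly 4.7x to 9.0x, so a factor of about 7 sits inside the measured range.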

It sounds like your comparison might be flawed. It is more likely that you are comparing transcoding with remuxing. Remuxing one container into another and transcoding only a small part of its contents (the audio, for example) is much quicker than transcoding everything. As I understand it, M2TS is likely to contain H.264 already, so no transcoding is needed for the video stream of that container.
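
To illustrate the difference, here is a rough sketch that only builds the two kinds of ffmpeg command lines without running them. The file names are placeholders, and the exact options a real tool such as DVD Fab or Plex uses are not known here; `-c:v copy`, `-c:v libx264` and `-c:a aac` are standard ffmpeg options, though the available encoders depend on the build.

```python
def ffmpeg_args(src, dst, reencode_video):
    """Build an ffmpeg argument list (not executed here)."""
    args = ["ffmpeg", "-i", src]
    if reencode_video:
        # Full transcode: decode the video and re-encode it with x264.
        args += ["-c:v", "libx264"]
    else:
        # Remux: copy the existing video stream into the new container untouched.
        args += ["-c:v", "copy"]
    # Audio is re-encoded to AAC in both cases.
    args += ["-c:a", "aac", dst]
    return args

# Placeholder file names, for illustration only.
print(" ".join(ffmpeg_args("movie.m2ts", "movie.mp4", reencode_video=False)))
print(" ".join(ffmpeg_args("movie.m2ts", "movie.mp4", reencode_video=True)))
```

The first command leaves the video untouched and is mostly limited by disk speed; the second has to decode and re-encode every frame.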

One needs to remember that transcoding a file involves a large number of stages, and not all of them can be accelerated by the GPU. In fact, only 2 out of 7 (?) stages can be GPU accelerated, albeit one of them is the most intensive one.

From what I have been able to read up on, there is quite a lot of sales talk from the companies out there, but when it comes down to it, many of the solutions that exist today are quite bad. The ones that perform well do not give big gains (the 13% I was talking about), and the ones that are quick and energy efficient produce crap-quality images. "Frustration" seems to be the general feeling when people talk about GPU-assisted transcoding.

The use of graphics cards in transcoding tasks still requires the help of a CPU. Even when GPU decoding and encoding are both used, with Cyberlink for example, CPU core occupation is 100% for one core with the Radeons or the Intel HD 3000, 100% for two cores with the GeForces, and as many as four cores at 100% occupation with Arcsoft.

It seems like GPU assisted transcoding is still a bit off until we have something good in our hands. I do believe that it will come, and when it does it will change quite a lot in the consumer video business. But as I have understood it we are not there yet.

DISCLAIMER: I am far from an expert on this. I have based my view on googling and on things I have overheard some of the Plex devs mention over the years. It is quite possible that you, Oech, are a real expert in this area, and therefore I feel that arguing about this is not really worth it. My arguments are based on others' views, and I can't argue against someone who has a deeper understanding of this than I do (if that is the case).

No, it is definitely not flawed. I am using DVD Fab and there is an option for copying the video source or converting it. 

The video was transcoded, not remuxed. The 50% load is 2 cores at full load and 2 cores idle (1 core will be transcoding audio, the other assisting Intel QS).
With software transcoding, all 4 cores are at full load.
The enormous 700% gain is also owed to the low computing power of the J1900. It only has a PassMark score of 1943. And Quick Sync does not depend on the CPU that much, as it is the same encoding chip in all Intel HDs.

There are also reviews showing that Intel Quick Sync is far better and faster than Nvidia and AMD.

http://www.anandtech.com/show/4083/the-sandy-bridge-review-intel-core-i7-2600k-i5-2500k-core-i3-2100-tested/9

and

http://www.tomshardware.com/reviews/sandy-bridge-core-i7-2600k-core-i5-2500k,2833-5.html

These are the people you need to convince first, and be sure to note the licensing conditions which may make it tricky for any commercial product to use and inflexible in that usage.  Also worth noting are the OS restrictions which make it unlikely for Plex given they seem to aim for feature parity across all their supported platforms. (This has been mentioned by devs in other similar transcode offload threads)

Basically, I wouldn't be holding your breath to see this.

Sorry about the confusion earlier, Atrus was slightly confused regarding the differences between x264's OpenCL support and Intel's ISMD and QuickSync video, and asked me to drop in and explain the details.

OpenCL support in x264 is only able to offload the lookahead process to the GPU; lookahead isn't the largest part of H.264 encoding, but it's the easiest to parallelize, and thus the best-suited to being performed on the GPU. It never produces any quality degradation. Since lookahead isn't as important, it only gets about a 13% improvement (which Atrus mentioned), and this would be even smaller using the settings Plex Media Server generally uses (which are optimized for transcoder speed). This support probably isn't worth much to the end-user, but may end up usable (with no user interaction) in a future Plex Media Server version simply due to ffmpeg and x264 being updated and some minor additional tweaks, but I can't provide a confirmation or timeline on that.
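
As a rough illustration of why offloading only one stage gives a modest overall gain, here is a sketch of Amdahl's law. The fractions and the 10x stage speedup below are hypothetical numbers picked for the example, not measurements of x264 or Plex Media Server.

```python
def overall_speedup(offloaded_fraction, stage_speedup):
    """Amdahl's law: speedup of the whole job when one stage gets faster."""
    remaining = 1.0 - offloaded_fraction
    return 1.0 / (remaining + offloaded_fraction / stage_speedup)

# Hypothetical: the offloaded stage is 15% of the runtime and becomes 10x faster.
print(f"{overall_speedup(0.15, 10.0):.2f}x")
# Hypothetical: 90% of the runtime is offloaded and becomes 10x faster.
print(f"{overall_speedup(0.90, 10.0):.2f}x")
```

With these made-up inputs the first case gives about 1.16x and the second about 5.26x, which is the general shape of the gap between offloading a single stage and offloading nearly the whole pipeline.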

Intel SMD (Streaming Media Drivers) are hardware audio/video decode/encode and other processing systems often seen on NAS machines, and QuickSync Video is a similar system seen on newer mainstream processors (starting with Sandy Bridge, expanded on Ivy Bridge, and available on almost all Haswells). These 2 systems (which have entirely separate SDKs) allow, in the best case, for the entire transcoding process to be performed on dedicated decoding and encoding hardware on-die with the CPU. That can create a huge speed advantage, especially if the CPU is already loaded (e.g. with another transcode, or some other program the user is running) or was weak to begin with. However, they're limited to a fixed number of concurrent transcodes, and usually result in significantly lower quality than x264's output. Still, the process is worthwhile for many users' servers, so we may be looking into integrating these systems into Plex Media Server at some point in the future, but I also can't provide any confirmation or timeline. If I was Adam Savage, I'd say "Plausible".
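
One way to picture the fixed limit on concurrent transcodes is a scheduler that falls back to software once the hardware sessions are used up. This is purely a hypothetical sketch, not how Plex Media Server works; the class name and the limit of 2 are invented for the example.

```python
class TranscodeScheduler:
    """Hands out hardware sessions until a fixed limit is hit, then uses software."""

    def __init__(self, max_hw_sessions):
        self.max_hw_sessions = max_hw_sessions
        self.active_hw = 0

    def start(self):
        """Return which backend the new transcode should use."""
        if self.active_hw < self.max_hw_sessions:
            self.active_hw += 1
            return "hardware"
        return "software"

    def finish(self, backend):
        """Release a hardware session when a hardware transcode ends."""
        if backend == "hardware" and self.active_hw > 0:
            self.active_hw -= 1

scheduler = TranscodeScheduler(max_hw_sessions=2)  # the limit of 2 is made up
backends = [scheduler.start() for _ in range(3)]
print(backends)
```

The third request lands on the software path, and a hardware slot only frees up again once `finish("hardware")` is called.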

Lastly, Intel Clear Video and similar technologies allow most or all of the H.264 decoding process to be performed on a general-purpose GPU. This is more useful for reducing power usage during playback on laptops and similar, but could provide some (no idea how much) performance boost in some cases for the transcoder. To be honest, I haven't even begun to research whether or not it'd be feasible to use GPU decoding across our supported platforms in ffmpeg, so I don't even know if it's remotely probable as a future feature.

All of the above features are things that we could look into in more detail for the transcoder in the future, but our dev team is small, and we can't focus on the transcoder as much as we'd like to. However, the transcoder itself is a fork of the open-source project ffmpeg, to which anyone can submit feature requests or patches to implement said new features. If you'd like better support for hardware transcoding, upstream is always a good place to look.

A fork of ffmpeg has implemented quick sync support.

 

https://github.com/drocon11/ffmpeg-qsv

 

HOWEVER, like all before it, it depends on the Intel Media SDK as a build requirement, which Intel only provides for Windows and Linux (no OS X, no FreeBSD).
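
In practice that means a cross-platform server would need a guard along these lines. This is a minimal sketch based only on the platform list above; the function name is invented, and `platform.system()` is the standard Python call that reports the OS name.

```python
import platform

# Platforms the post above says the Intel Media SDK is available for.
SDK_PLATFORMS = {"Windows", "Linux"}

def quick_sync_buildable(system=None):
    """Return True if the given OS name is one the SDK is provided for."""
    if system is None:
        system = platform.system()  # e.g. "Windows", "Linux", "Darwin", "FreeBSD"
    return system in SDK_PLATFORMS

for name in ("Windows", "Linux", "Darwin", "FreeBSD"):
    print(name, quick_sync_buildable(name))
```

OS X reports itself as "Darwin", so it and FreeBSD fall through to the software path.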

Beat me to it ;)

There is no real debate about the speed increases to be had with Quick Sync, as they are pretty substantial. As for the quality debate, I think that when we are remote and need to start transcoding, quality takes a back seat anyway, as we are more concerned with bandwidth at that point.

Thoughts?

Has there been any movement on this? QNAP has implemented QuickSync in its own media streaming apps. It strikes me as odd that a NAS manufacturer is outdoing Plex here for streaming media. I appreciate that Plex's resources are limited, but as Roger mentioned above, this presents such a huge leap in performance for transcoded media. Perhaps it's simplistic of me to assume Plex can do this just because QNAP can, but media streaming isn't even QNAP's primary business and they've managed to do it.

http://www.anandtech.com/show/8192/qnap-tsx51-nas-series-intel-quick-sync-gets-its-killer-app/4

But QNAP only has to handle their own OS. Plex has to come up with a solution that works on Linux, Windows, and OS X servers. That's pretty much the number one hurdle.

As noted in another thread on the topic, there would only be very slight gains in the transcoding process, so it is really not worthwhile.  You would not see the gains in performance in Plex that you would see in handbrake or some other QuickSync-enabled application.  IIRC, at best you would see about a 13% improvement possible under best conditions.  You can search for the thread, and I think Atrus made the comments, but I am getting old and feeble of mind...

Partially true. But I did mess up one thing. For an accurate description, read this: https://forums.plex.tv/topic/112471-any-news-on-gpu-transcoding-especially-intel-quick-sync/?p=670646

The 13% increase may be for OpenCL. It isn't for QuickSync.

Exactly.

MediaBrowser is trying to implement QuickSync - it only relies on swapping out the ffmpeg.exe and a few bits on the server side to trigger it.

http://mediabrowser.tv/community/index.php?/topic/10723-gpu-transcoding/page-2
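
If you swap in a different ffmpeg binary, one way to check whether it has Quick Sync encoders is to look through the text printed by `ffmpeg -encoders`. The sketch below assumes the `_qsv` naming used by mainline ffmpeg (e.g. `h264_qsv`); the fork linked earlier or MediaBrowser's build may name things differently, and the sample text is abridged and illustrative rather than real output.

```python
def list_qsv_encoders(encoders_output):
    """Pick encoder names ending in '_qsv' out of `ffmpeg -encoders` text."""
    found = []
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: "<flags> <name> <description...>"
        if len(parts) >= 2 and parts[1].endswith("_qsv"):
            found.append(parts[1])
    return found

# Sample text shaped like `ffmpeg -hide_banner -encoders` output (abridged, illustrative).
sample = """\
 V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... h264_qsv             H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (Intel Quick Sync Video acceleration) (codec h264)
 A..... aac                  AAC (Advanced Audio Coding)
"""
print(list_qsv_encoders(sample))
```

An empty list would mean the binary was built without Quick Sync support, at least under the naming assumed here.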