Quicksync HW Accelerated Plex Transcoder crashing Ubuntu

Server Version#: 1.19.1.2645-ccb6eb67e
Player Version#: n/a (any client)
OS: Ubuntu 19.10 (upgraded today, still same issues as 18.04)

Every single time a transcode is fired up, the server crashes within a minute or so of playback. The entire OS completely shuts down abruptly and my syslog and Plex logs are left with remnants of an illegible (partially flushed) log message. I started noticing this behavior over the weekend, and when I looked back on Monday, I saw over 100 unclean reboots, all during waking hours when users were playing back transcoded content.

Troubleshooting steps I have tried:

  • Replace CPU (8500 --> 8500T – same issues)
  • Memtest86 (4+ hours - clean)
  • Prime95 (clean)
  • Upgrade Ubuntu from 18.04 to 19.10
  • Downgrade plex (to plexmediaserver_1.18.8.2527.deb)
  • Remove and clean install of Codecs folder
  • recursive chown plexmediaserver directory (just in case)
  • add plex to render group (even though /dev/dri is still 660 root video)

Log files attached (check Plex Media Server.1.log to see the transcode-induced crash and the bad flush to the log file). I disabled transcoding after that crash so my users could continue to direct play content, which may show in the newest log.

Plex Media Server Logs_2020-04-17_19-28-20.zip (4.5 MB)

Update: 12+ hours later of disabled HW transcoding and no crashes.

Same issue when the /dev/dri device is passed into a newly created Plex docker container for HW transcoding. The entire host crashes.

Is this in a VM?

The last time I saw this with any repeating frequency was with network adapter problems.

The specific case was when LRO (Large Receive Offload) was enabled.
Ubuntu had trouble handling the memory (and may have regressed since).

The result was a kernel panic.

Is this desired?

Apr 17, 2020 19:15:01.340 [0x7f1c0bfff700] DEBUG - Request: [172.22.1.191:49346 (WAN)] GET /:/timeline

This segment is RFC-1918 private network compliant and should therefore be LAN, not WAN.
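For anyone who wants to check an address quickly, here is a minimal sketch of an RFC 1918 membership test. The function name `is_rfc1918` is made up for illustration, and it assumes the input is already a valid dotted-quad IPv4 address. It only tests the three private ranges; it does not reproduce how Plex itself decides between LAN and WAN.

```shell
# is_rfc1918 ADDR: succeed if ADDR falls in one of the RFC 1918 private ranges.
# Hypothetical helper; assumes ADDR is a well-formed dotted-quad IPv4 address.
is_rfc1918() {
    case $1 in
        10.*)                                   return 0 ;;  # 10.0.0.0/8
        172.1[6-9].*|172.2[0-9].*|172.3[01].*)  return 0 ;;  # 172.16.0.0/12
        192.168.*)                              return 0 ;;  # 192.168.0.0/16
        *)                                      return 1 ;;
    esac
}

# The address from the log line above.
if is_rfc1918 172.22.1.191; then
    echo "172.22.1.191 is private (RFC 1918)"
else
    echo "172.22.1.191 is not in an RFC 1918 range"
fi
```

172.22.x.x sits inside 172.16.0.0/12, so the first branch is the one that prints.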

I’ll quickly describe the current infrastructure of Plex and other semi-associated apps:

Plex runs on baremetal install of Ubuntu 19.10 (HP EliteDesk 800 i5-8500T w/ QuickSync) with a single NIC on the motherboard with IP 172.22.1.192.

I run other associated apps inside of VMWare CentOS VM on a Windows 10 host (this has been a temporary setup until migration to Unraid). The most relevant apps include an nginx reverse proxy, ombi, radarr, sonarr. The Windows 10 host has IPs 172.22.1.197 and 172.22.1.198. The VM has IP 172.22.1.191.

Throughout all of the crashes, the reverse proxy has not been setup in the Network settings of Plex, so traffic was coming in directly to the Plex port exposed through the router. I was testing it on and off this weekend, but this problem has persisted before any Plex traffic was proxied through nginx, so I don’t think that app is particularly relevant to this issue at the moment. Some of the other API calls coming from 172.22.1.191 are probably from Ombi, sonarr, and radarr, so the amount of traffic should be light.

It looks like LRO is disabled on the baremetal Plex host though:

mike@Mr-Meeseeks:~$ ethtool -k eno1 | grep large-receive-offload
large-receive-offload: off [fixed]
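The same check can be run across every interface rather than just eno1. This is a rough sketch: `lro_state` is a hypothetical helper that parses `ethtool -k` output in the format shown above, and the loop assumes interfaces are listed under /sys/class/net, as they are on a typical Linux host.

```shell
# lro_state: read `ethtool -k` output on stdin and print the LRO state ("on" or "off").
# Hypothetical helper; assumes lines look like "large-receive-offload: off [fixed]".
lro_state() {
    awk -F': *' '$1 == "large-receive-offload" { split($2, f, " "); print f[1] }'
}

# Report LRO for each network interface (skipped entirely if ethtool is not installed).
if command -v ethtool >/dev/null 2>&1; then
    for path in /sys/class/net/*; do
        [ -e "$path" ] || continue
        iface=${path##*/}
        printf '%s: %s\n' "$iface" "$(ethtool -k "$iface" 2>/dev/null | lro_state)"
    done
fi
```

An empty value after the interface name means ethtool reported no LRO feature for that device.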

It sounds like there is another, as yet unknown, bug in the Intel iHD_drv_video driver, which Engineering will need to find and fix too.

Here is a simple test to confirm. If the iHD_drv_video driver (Intel’s new ASIC API for VAAPI) isn’t working for the -8500T, this is our best shot at a workaround for now.

  1. Stop Plex

  2. As root, edit /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml

  3. Before the closing /> place the following text (surrounded by spaces)

VaapiDriver="i965"

  4. Save the file
  5. Restart Plex
  6. Take for a test drive.
  7. If it doesn’t fail then we’ve confirmed where the problem is located.
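The manual edit above can also be scripted. This is only a sketch under a few assumptions: the function name `set_vaapi_driver` is made up, Preferences.xml is assumed to contain a single self-closing `<Preferences ... />` element, and Plex is assumed to be stopped before it runs. It keeps a `.bak` copy and writes back into the original file so ownership and permissions stay as they were.

```shell
# set_vaapi_driver FILE [DRIVER]: add or update the VaapiDriver attribute in FILE.
# Hypothetical helper; assumes FILE holds one self-closing <Preferences ... /> element.
set_vaapi_driver() {
    prefs=$1
    driver=${2:-i965}   # lower-case letter i, then nine six five

    [ -f "$prefs" ] || { echo "no such file: $prefs" >&2; return 1; }

    # Keep a backup, then rewrite the original in place from the backup.
    cp -p "$prefs" "$prefs.bak" || return 1

    if grep -q 'VaapiDriver="' "$prefs.bak"; then
        # Attribute already present: replace its value.
        sed "s/VaapiDriver=\"[^\"]*\"/VaapiDriver=\"$driver\"/" "$prefs.bak" > "$prefs"
    else
        # Attribute missing: place it before the closing />, surrounded by spaces.
        sed "s|[[:space:]]*/>| VaapiDriver=\"$driver\" />|" "$prefs.bak" > "$prefs"
    fi
}

# Example (as root, with Plex stopped):
#   set_vaapi_driver "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml"
```

If anything looks wrong afterwards, the `.bak` copy next to the file is the untouched original.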

I think you just hit the nail right on the head. I made the above change, force transcoded a 4K episode that would previously reliably fail right around 1 minute in, confirmed that it was HW transcoding, and 5+ minutes in without any crashes. I’ll keep an eye out over the next day or two to confirm these results… Any disadvantages with using i965 over the newer iHD_drv_video driver that you know of?

Originally I thought this was a hardware issue from overheating, power supply, or a defective component based upon the abrupt shutdowns and lack of meaningful log messages in all the obvious places. I’ve never seen a driver bring down an entire Linux OS so quickly and quietly, but I suppose there’s a first for everything.

The only known disadvantages of the i965 driver are:

  • Use on ApolloLake (J3xxx and J4xxx CPUs - Celerons with 600 series ASICS)
  • Videos with a sustained bitrate greater than 130 Mbps. (VERY few exceed this)

Over a week and well over 50 transcodes later and not a single crash! I really appreciate your help @ChuckPa.

mike@Mr-Meeseeks:~$ docker ps | egrep plex
375cc3368cd8        linuxserver/plex            "/init"             9 days ago          Up 8 days                                                              plex

mike@Mr-Meeseeks:~$ last reboot | head -5
reboot   system boot  5.3.0-46-generic Sun Apr 19 17:42   still running
reboot   system boot  5.3.0-46-generic Fri Apr 17 19:16   still running
reboot   system boot  5.3.0-46-generic Fri Apr 17 18:11   still running
reboot   system boot  5.3.0-46-generic Fri Apr 17 17:49   still running
reboot   system boot  5.3.0-46-generic Fri Apr 17 17:04   still running

Any ideas when Intel might fix this or is this something Plex would look into a workaround for using the newer Intel Media Driver?

Plex’s own transcoder team is working on the driver instead of waiting for Intel.
They’re upstreaming their work in hopes of stimulating some help on the other end.

I’m very happy to hear that the Plex team has taken on the burden of initiating a fix.

I updated my server today and it looks like the temporary workaround of adding VaapiDriver="i965" to the configuration was once again removed. I noticed this after the update, when the server started crashing again. I had a feeling this would be the case, but I just wanted to make anyone else who views this thread aware.

You have the text wrong for the driver.

Letter i (as in eye - lower case), Nine Six Five.

VaapiDriver="i965"

you have: VaapiDriver="1965" which is a One.

The name comes from the “Intel 965” chipset family.
Just like the i7 CPU family.

:slight_smile:

My mistake, I tried to write it from memory :slight_smile: The value was correct in the configuration file, but it did get wiped out of the configuration by the PMS update, just so others are aware.
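Since an update can drop the override and a one-character typo is easy to miss, a quick check after each PMS update may help. This is a minimal sketch; `vaapi_driver_value` and `check_vaapi_driver` are hypothetical helper names, and the path to Preferences.xml is passed in rather than assumed.

```shell
# vaapi_driver_value FILE: print the current VaapiDriver value from FILE, if any.
vaapi_driver_value() {
    sed -n 's/.*VaapiDriver="\([^"]*\)".*/\1/p' "$1"
}

# check_vaapi_driver FILE: succeed only if the override is present and spelled i965.
check_vaapi_driver() {
    value=$(vaapi_driver_value "$1")
    case $value in
        i965) echo "OK: VaapiDriver=\"i965\" is set" ;;
        "")   echo "MISSING: no VaapiDriver override in $1" >&2; return 1 ;;
        *)    echo "UNEXPECTED: VaapiDriver=\"$value\" (expected i965)" >&2; return 1 ;;
    esac
}

# Example:
#   check_vaapi_driver "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml"
```

A value of "1965" (with a digit one) lands in the UNEXPECTED branch, and a file that lost the attribute during an update lands in MISSING.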

How can I add this option in Windows?

If you have Preferences.xml, add it there.
If you don’t then it needs to go in the Registry, I guess???

I’m not a Windows user and can’t advise. (I don’t even have a VM of it)
