Transcoding crashes Linux

Server Version#: 1.16.2.1321
Player Version#: 3.108.2

So I’ve noticed that when Plex transcodes media for playback, it can crash the operating system on my server:

[simon@xanadu]$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

The problem started showing itself when I enabled creating optimised-for-mobile recordings. It doesn’t happen on every transcode (sometimes it can go 10 movies without a problem), but at some point the machine will hard-crash (no more SSH, the mouse pointer stops moving, etc.)

I’m not using hardware encoding, and it’s an AMD Threadripper 1950X with 64 GB of RAM. It’s not overheating (typically ~45 °C) and it has plenty (64 TB or so) of available disk space.

Happy to provide logs if you want, but I’m assuming that anything that takes down the OS isn’t going to allow time for data to be written to disk…

Any hints?

Please provide what you have in the Logs directory as a tar.gz (gzipped tarball)
There will be data there.
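
For reference, a minimal way to package them (a sketch, assuming the default Linux install location):

cd "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server"
sudo tar czf ~/plex.logs.tar.gz ./Logs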

PMS isn’t taking down the host per se, but it is exposing a problem in the installation.
Are you 100% certain it’s not overheating given the lack of HW transcoding support?
Spot-overheating can happen and trip one sensor while the displayed temperature isn’t obviously above the limit. The Threadripper can sit at 40-70 °C in normal use.

Also please remember: (https://community.amd.com/thread/230368)

I have learned that the Ryzen Master temperature is showing the reference temperature for all Ryzen processors. 57C for the CPU temperature on AIDA64 or others probably has the 27C offset for our CPU (1950X). So it is really 30C CPU diode temperature, which is compared to the limit of 68C. Right now AIDA64 is showing 59C for CPU and 32C for CPU Diode.
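
So a quick sanity check would be something like this (a sketch, assuming the k10temp output format shown further down; note the driver may already have removed the offset):

# Estimate the die temperature by subtracting the 1950X's 27 C Tctl
# offset from what sensors reports; treat the result as indicative only.
reported=$(sensors | awk '/^temp1/ {gsub(/[^0-9.]/, "", $2); print $2; exit}')
echo "reported: ${reported} C, minus offset: $(echo "${reported} - 27" | bc) C"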

If there were an illegal instruction, the processor would trap and kill the process.
If a system runtime library is damaged or corrupted, and the fault lands in kernel space, that could cause a panic.
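
If you want to try to catch a panic trace after the next crash, one option (a sketch, assuming CentOS 7’s systemd journal; a hard lock-up may die before anything is flushed):

# Make the journal persist across reboots, then check the previous
# boot's kernel messages after a crash:
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald
journalctl -b -1 -k | tail -n 50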

Please provide what you have in the Logs directory as a tar.gz (gzipped tarball)
There will be data there

Here you go: plex.logs.tar.gz

Are you 100% certain it’s not overheating…

I mean, I guess it’s possible, though it seems unlikely. I used the same machine for FPGA/ASIC place-and-route software before it became the media server; that pegged all the cores for days at a time and it was rock-solid. There’s no stress-test quite like running P&R for an ASIC…

Here’s what I see from ‘sensors’:

[simon@xanadu]$ sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +41.4°C  (high = +70.0°C)

k10temp-pci-00cb
Adapter: PCI adapter
temp1:        +41.4°C  (high = +70.0°C)

The high limit being set to 70 °C implies that this reading is the without-offset value.

The machine is mounted in a 3U case in the garage, with four (high-pressure and very loud) fans blowing through the case. On top of that, the CPU is water-cooled, and the radiator for the water loop is directly in the path of all that air.

Again, I guess it’s possible, but I rarely see the temp get above 50 °C, even on a hot (100 °F) day. I’ll set up a temperature-monitoring script to poll it every 30 seconds or so, then start up transcoding again. If I can, I’ll try that this evening.
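
Probably something like this (a sketch: it parses the k10temp lines above and logs into the ‘temp’ database / ‘xanadu’ table I dump below; credentials via ~/.my.cnf assumed):

#!/bin/bash
# Poll both k10temp sensors every 30 seconds and log them to MariaDB.
while true; do
    read -r cpu0 cpu1 < <(sensors | awk '/^temp1/ {gsub(/[^0-9.]/, "", $2); printf "%s ", $2}')
    mysql temp -e "INSERT INTO xanadu (cron, cpu0, cpu1) VALUES (NOW(), ${cpu0}, ${cpu1});"
    sleep 30
done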

If it’s a system library, is there any good way to try and track it down? Or is ‘re-install everything from scratch’ the only sure way? In the latter case, I’d appreciate some tips on how to migrate the current Plex state to a new machine instance - there’s already a lot of time invested in getting to where I am that I’d hate to throw away…
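
The only thing I can think of myself is RPM’s verify mode, which checks installed files against the package database - a rough check at best:

# Verify all installed packages: a '5' in the third column flags a
# checksum mismatch. Config files (marked 'c') change legitimately.
rpm -Va | grep '^..5' | grep -v ' c /'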

I did think about using hardware encoding to sidestep the problem, but this is an AMD machine running Linux, and I wasn’t sure whether putting my 980 Ti GPU in there would be supported - I’ve read conflicting reports of hardware encoding being supported/not supported under Linux…

Cheers
Simon

Thanks for the logs:

The failure appears to have occurred around 21:18:54 22-Jul-2019.

Jul 22, 2019 21:18:54.486 [0x7febab7fe700] DEBUG - Completed: [::ffff:127.0.0.1:35838] 206 PUT /video/:/transcode/session/7dz0ykdleie899j0ytgbib66/6d0da3dc-37d3-448f-963a-f155b8474176/progress/streamDetail?index=16&id=0&codec=dvd_subtitle&type=subtitle&language=rum (10 live) 0ms 256 bytes (pipelined: 22) (range: bytes=0-) 
Jul 22, 2019 21:18:54.491 [0x7febaa7fc700] DEBUG - Request: [::ffff:127.0.0.1:35838 (Loopback)] PUT /video/:/transcode/session/7dz0ykdleie899j0ytgbib66/6d0da3dc-37d3-448f-963a-f155b8474176/progress?duration=1406.000000 (10 live) Signed-in Token (plex@gornall.net)
Jul 22, 2019 21:18:54.491 [0x7febaaffd700] DEBUG - Completed: [::ffff:127.0.0.1:35838] 204 PUT /video/:/transcode/session/7dz0ykdleie899j0ytgbib66/6d0da3dc-37d3-448f-963a-f155b8474176/progress?duration=1406.000000 (10 live) 0ms 203 bytes (pipelined: 23) (range: bytes=0-) 
Jul 22, 2019 21:18:54.674 [0x7febabfff700] DEBUG - Activity: updated activity 29e6d33c-6114-4124-b5fa-2e1a86b51d8e - completed 12% - Generating video preview thumbnails
Jul 22, 2019 21:18:54.995 [0x7feb53fff700] DEBUG - Session 7dz0ykdleie899j0ytgbib66 (4) is unthrottling

There were 4 transcoder sessions running at the time.

As for running against the cores versus an ASIC: I have seen cases where the ASIC is fine because its kernel modules are up to date, but the cores can’t keep up due to kernel limitations. Such is the case with the new Intel 9xxx-family CPUs: CPU access to the ASIC changed, while the ASIC itself is fine.

Is this running in a VM with hypervisor hidden?

The failure appears to have occurred around 21:18:54 22-Jul-2019

Yep, that tallies with my recollection of the last time I went traipsing off to reset the server :slight_smile:

I’m not sure I’m following your ASIC comments - is that to do with hardware encoding? When I was talking about ASIC/FPGA, I meant I was placing very heavy loads on the CPUs for extended periods of time by running ‘Place and Route’ software. That has nothing to do with Plex per se; it was more a comment on how thermally stable the machine is.

So I have the temperature monitoring script running now:

MariaDB [temp]> select * from xanadu;
+----+---------------------+------+------+
| id | cron                | cpu0 | cpu1 |
+----+---------------------+------+------+
|  5 | 2019-07-23 09:28:01 | 30.6 | 30.6 |
|  6 | 2019-07-23 09:28:31 |   30 |   30 |
|  7 | 2019-07-23 09:29:01 | 29.8 | 29.8 |
|  8 | 2019-07-23 09:29:31 | 30.8 | 30.8 |
|  9 | 2019-07-23 09:30:01 | 30.5 | 30.5 |
| 10 | 2019-07-23 09:30:31 | 39.5 | 39.5 |
| 11 | 2019-07-23 09:31:01 |   42 |   42 |
| 12 | 2019-07-23 09:31:31 | 30.6 | 30.6 |
| 13 | 2019-07-23 09:32:01 | 30.5 | 30.5 |
| 14 | 2019-07-23 09:32:31 | 30.2 | 30.2 |
| 15 | 2019-07-23 09:33:01 | 30.1 | 30.1 |
| 16 | 2019-07-23 09:33:31 |   30 |   30 |
| 17 | 2019-07-23 09:34:01 |   30 |   30 |
| 18 | 2019-07-23 09:34:31 |   30 |   30 |
| 19 | 2019-07-23 09:35:01 | 29.8 | 29.8 |
| 20 | 2019-07-23 09:35:31 | 35.4 | 35.4 |
| 21 | 2019-07-23 09:36:01 | 30.1 | 30.1 |
| 22 | 2019-07-23 09:36:31 | 42.2 | 42.2 |
| 23 | 2019-07-23 09:37:01 | 42.5 | 42.5 |
| 24 | 2019-07-23 09:37:31 | 30.5 | 30.5 |
| 25 | 2019-07-23 09:38:01 | 30.5 | 30.5 |
| 26 | 2019-07-23 09:38:31 | 30.2 | 30.2 |
| 27 | 2019-07-23 09:39:01 | 30.2 | 30.2 |
| 28 | 2019-07-23 09:39:31 | 30.4 | 30.4 |
| 29 | 2019-07-23 09:40:01 | 30.5 | 30.5 |
| 30 | 2019-07-23 09:40:31 | 32.4 | 32.4 |
| 31 | 2019-07-23 09:41:01 | 30.5 | 30.5 |
| 32 | 2019-07-23 09:41:31 | 30.2 | 30.2 |
| 33 | 2019-07-23 09:42:01 | 30.2 | 30.2 |
| 34 | 2019-07-23 09:42:31 | 42.2 | 42.2 |
| 35 | 2019-07-23 09:43:01 | 42.8 | 42.8 |
| 36 | 2019-07-23 09:43:31 | 30.6 | 30.6 |
| 37 | 2019-07-23 09:44:01 | 30.5 | 30.5 |
| 38 | 2019-07-23 09:44:31 | 30.5 | 30.5 |
| 39 | 2019-07-23 09:45:01 | 30.4 | 30.4 |
| 40 | 2019-07-23 09:45:31 |   37 |   37 |
| 41 | 2019-07-23 09:46:01 | 30.8 | 30.8 |
| 42 | 2019-07-23 09:46:31 | 30.8 | 30.8 |
| 43 | 2019-07-23 09:47:01 | 30.5 | 30.5 |
| 44 | 2019-07-23 09:47:31 | 30.5 | 30.5 |
| 45 | 2019-07-23 09:48:01 | 30.4 | 30.4 |
| 46 | 2019-07-23 09:48:31 | 42.2 | 42.2 |
| 47 | 2019-07-23 09:49:02 | 42.5 | 42.5 |
| 48 | 2019-07-23 09:49:32 |   31 |   31 |
| 49 | 2019-07-23 09:50:01 | 30.8 | 30.8 |
| 50 | 2019-07-23 09:50:31 | 34.9 | 34.9 |
| 51 | 2019-07-23 09:51:01 |   31 |   31 |
| 52 | 2019-07-23 09:51:31 | 30.5 | 30.5 |
| 53 | 2019-07-23 09:52:01 | 30.5 | 30.5 |
| 54 | 2019-07-23 09:52:31 | 30.5 | 30.5 |
| 55 | 2019-07-23 09:53:01 | 30.5 | 30.5 |
| 56 | 2019-07-23 09:53:31 | 30.2 | 30.2 |
| 57 | 2019-07-23 09:54:01 | 30.2 | 30.2 |
| 58 | 2019-07-23 09:54:31 | 42.1 | 42.1 |
| 59 | 2019-07-23 09:55:01 | 42.6 | 42.6 |
| 60 | 2019-07-23 09:55:31 | 32.8 | 32.8 |
| 61 | 2019-07-23 09:56:01 | 31.2 | 31.2 |
| 62 | 2019-07-23 09:56:31 | 30.9 | 30.9 |
| 63 | 2019-07-23 09:57:01 | 30.8 | 30.8 |
| 64 | 2019-07-23 09:57:31 | 30.6 | 30.6 |
| 65 | 2019-07-23 09:58:01 | 30.5 | 30.5 |
| 66 | 2019-07-23 09:58:31 | 30.5 | 30.5 |
| 67 | 2019-07-23 09:59:01 | 30.8 | 30.8 |
| 68 | 2019-07-23 09:59:31 | 30.5 | 30.5 |
| 69 | 2019-07-23 10:00:01 | 30.5 | 30.5 |
| 70 | 2019-07-23 10:00:31 | 42.2 | 42.2 |
| 71 | 2019-07-23 10:01:01 | 43.2 | 43.2 |
| 72 | 2019-07-23 10:01:31 | 31.4 | 31.4 |
| 73 | 2019-07-23 10:02:01 | 30.9 | 30.9 |
| 74 | 2019-07-23 10:02:31 | 30.8 | 30.8 |
| 75 | 2019-07-23 10:03:01 | 30.8 | 30.8 |
+----+---------------------+------+------+
71 rows in set (0.000 sec)

I’ll see how that goes when I start a transcode this evening (I don’t want to do it now because I’m remote from the server and couldn’t reset it if it crashes).

As far as the VM question goes: nope, this is a plain-vanilla install on bare metal, no hypervisor.
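
For what it’s worth, ‘systemd-detect-virt’ agrees - on bare metal it prints ‘none’:

[simon@xanadu]$ systemd-detect-virt
none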

My ASIC reference is based on my experience with them over the years.

In the Plex context, I was obviously referring to the QSV ASIC. Since AMD uses Mesa, there is no ASIC access.

Where I am befuddled is how an application failure can trap and bring down the host.

This almost only ever happens when the application triggers a fault in a kernel driver (reference: the e1000e Ethernet adapter in VMware, where a bad Ubuntu driver interaction took down VMware).

My ASIC reference is based on my experience with them over the years.

In the Plex context, I was obviously referring to the QSV ASIC. Since AMD uses Mesa, there is no ASIC access.

Gotcha. I just had to go and look up what QSV meant :slight_smile: My ASIC stuff is my own designs, not third-party stuff :slight_smile:

I’m equally confused about the machine crashing. Up until now it’s been very reliable …

I do have another beefy machine I could install Plex onto. Is there a migration path for a Plex installation from one machine to another, so I don’t lose all the movies/TV shows I’ve already digitised?

[edit: never mind, found this - I can install a virgin CentOS onto the other machine and see where we go from there. Probably a weekend job, though]

I can offer you a much easier solution if you’re interested, one tailored to the Linux environment.

Let me know if interested.

Sure - I’m all ears :slight_smile:

Target/new system:

  1. Set up the media mount points exactly as on the source, prior to the PMS install.
  2. Install the same binary package as on the current server.
  3. Run it once.
  4. Stop it.
  5. Delete the “Library” directory in /var/lib/plexmediaserver.

Source/existing system:

  1. Stop PMS.
  2. cd /var/lib/plexmediaserver
  3. sudo tar cf /home/PlexLibrary.tar ./Library (assuming /home is viable to use)
  4. Transport that tar to the target’s /var/lib/plexmediaserver (one way shown below).
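
For step 4, scp is the simplest transport (a sketch - ‘new-host’ is a placeholder, and root is needed to drop the file into place):

scp /home/PlexLibrary.tar new-host:/tmp/
ssh new-host 'sudo mv /tmp/PlexLibrary.tar /var/lib/plexmediaserver/'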

Target/new system:

  1. cd /var/lib/plexmediaserver
  2. sudo tar xf PlexLibrary.tar
  3. sudo chown -R plex:plex ./Library

== Swoopy way ==

sudo sh
# Assumes new-host exports /var/lib/plexmediaserver via NFS:
mount new-host:/var/lib/plexmediaserver /mnt
cd /var/lib/plexmediaserver
# Stream the Library tree straight onto the new host, preserving layout:
tar cf - ./Library | (cd /mnt ; tar xf - )

Now sign in to the target/new system and complete the chown -R step in that UID/GID space.

The old host is now fully mirrored. Delete its Preferences.xml before using it again.
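
Preferences.xml lives under the server’s application-support tree (assuming the default install location):

sudo rm "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Preferences.xml"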

Excellent, thanks - will do this weekend :slight_smile:

Just to close the loop - this worked flawlessly :slight_smile: Thanks again :slight_smile:

Clear and swoopy enough? :smiling_imp:

I don’t even use Linux, but I know I can follow such a simple set of clear instructions.

Funny, sometimes I wish I did use Linux so I could come here and ask you how to fix it.

If you used Linux, and used it at this level, the way I demonstrate, you’d already know how to fix it innately! :rofl:

Clear and swoopy enough?

Swoopy is a new one on me, but it’s certainly clear enough :slight_smile:
