Transcoder errors on FreeBSD

Server Version#: 1.21.1.3830
Player Version#: Multiple

Starting around plex 1.21.0 I’ve started getting transcoder errors like: Conversion failed. The transcoder exited due to an error.

Corresponding log entries look like this:

Jan 09, 2021 13:11:07.434 [0x80b447e00] DEBUG - Completed: [127.0.0.1:36022] 200 GET /status/sessions (7 live) GZIP 0ms 4397 bytes (pipelined: 5)
Jan 09, 2021 13:11:07.435 [0x80b447e00] DEBUG - Completed: [127.0.0.1:42953] 200 GET /status/sessions (7 live) GZIP 0ms 4397 bytes (pipelined: 1)
Jan 09, 2021 13:11:09.880 [0x81604fa00] DEBUG - [CompanionProxy] player agsrorcm7n5guloupxta6jly was last refreshed 30 seconds ago
Jan 09, 2021 13:11:12.657 [0x8110de000] ERROR - [Transcoder] [tcp @ 0x803904f00] Connection to tcp://127.0.0.1:32400 failed: Operation timed out
Jan 09, 2021 13:11:12.658 [0x8110de000] ERROR - [Transcoder] [stream_segment,ssegment @ 0x803897e00] Failed to open segment list 'http://127.0.0.1:32400/video/:/transcode/session/nw5g7lrhv9bexddvnvsp6ujk/2bcaff91-055a-43b6-a766-d7b2535baf0a/seglist?X-Plex-Http-Pipeline=infinite'
Jan 09, 2021 13:11:12.659 [0x8110de000] ERROR - [Transcoder] av_interleaved_write_frame(): Operation timed out
Jan 09, 2021 13:11:12.760 [0x816050400] ERROR - [Transcoder] Error writing trailer of media-%05d.ts: Invalid argument
Jan 09, 2021 13:11:12.768 [0x80a53f200] DEBUG - Jobs: '/usr/local/share/plexmediaserver/Plex Transcoder' exit code for process 3008 is 1 (failure)
Jan 09, 2021 13:11:12.768 [0x81604ff00] DEBUG - Killing job.
Jan 09, 2021 13:11:12.768 [0x81604ff00] DEBUG - Signalling job ID 3008 with 9

Some observations:

  1. This happens almost on every media source (those are typically 1080p)
  2. I use 2 clients: PS4 and a Web based one on the same host the server is running (that is, http://localhost:32400). This happens waaaay more often on PS4 than on localhost, but it happens on localhost sometimes too.
  3. Dropping quality to something like 2-3 Mpbs 720p makes things much better, but doesn’t completely fix that.

PS4 is connected to home network via Wi-Fi, I think there are no issues here because I’d notice that in other app and clients, though I checked the router and don’t see any anomalies there.

I’ve also tried to point Transcoding temp directory to a partition on other HDD and it didn’t change anything. It’s probably not related though, because I guess from the logs it times out on some HTTP API operations?

Also, I wouldn’t say the system is very busy when this happens. E.g. the usual load during playback is:

last pid:  3369;  load averages:  1,30,  1,17,  0,99  up 0+07:30:16    13:52:33
87 processes:  1 running, 86 sleeping
CPU:  9,4% user,  0,2% nice,  1,5% system,  0,5% interrupt, 88,4% idle
Mem: 1501M Active, 13G Inact, 567M Laundry, 6971M Wired, 1572M Buf, 9671M Free
ARC: 4016M Total, 2171M MFU, 1827M MRU, 64K Anon, 15M Header, 2926K Other
     3861M Compressed, 3882M Uncompressed, 1,01:1 Ratio

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 1579 novel        55  52    0  4222M  1669M select   0  17:37 120,36% firefox
 1535 novel        82  24    0  4123M  1422M select   5  67:33  34,28% firefox
 3361 plex          7  24    0   181M   114M nanslp   1   0:04  11,08% Plex Transcoder

Any thoughts how to debug this?

PS Other similar threads mention it could be because of library scanning. I have neither automatic nor periodic library scanning enabled. Don’t have that many tcp connections either:

$ netstat -4 -n|wc -l            
     774
$ netstat -4 -n|grep TIME_W|wc -l
      18
$ 

What version of FreeBSD?

Do you have any connections in LAST_ACK state?

netstat -n4 | grep LAST_ACK

I’m trying to track down an issue that started happening with FreeBSD (and TrueNAS) maybe related to some networking changes that landed a month or two ago.

Do you see any change if you set sysctl -w net.inet.tcp.nolocaltimewait=1?

What version of FreeBSD?

FreeBSD host 13.0-CURRENT FreeBSD 13.0-CURRENT #25 r368626: Mon Dec 14 05:08:25 +04 2020 root@host:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

Do you have any connections in LAST_ACK state?

$ netstat -4n|grep LAST_ACK|wc -l
     415
$ netstat -4n|grep LAST_ACK|grep "127.0.0.1.32400"|wc -l
     396

Do you see any change if you set sysctl -w net.inet.tcp.nolocaltimewait=1 ?

Things seem better this way. Once in a while it gets stuck ‘Buffering’ for 15-30 seconds, but doesn’t crash.

Update: still fails, maybe just less often.

Connections being stuck in LAST_ACK is weird, especially local connections. :-/ It doesn’t look like a Plex-only issue; other programs get connections stuck in the same state.

I’m suspicious about some changes that landed on Nov 9.

[base] Revision 367530

And MFC’d to 12- on Nov 30.

https://svnweb.freebsd.org/base/stable/12/sys/netinet/tcp_stacks/rack.c?view=log

Reverting to a prior build addresses the issue for me.
But I haven’t done the work to confirm exactly what’s going on.

Do you have a kernel/boot environment from before the Nov 9 changes you could test? Or could you build one?

I’ve checked out the latest sources from git, reverted the commit you mention and rebuild the kernel. After a brief testing I had no crashes, and no ‘Buffering’ interrupts either.

So either this commit is indeed problematic, or something was fixed compared to the previous version from Dec 14 I was running.

It’s not hard to check though, just need to undo the revert and test again. I’ll probably do that later.

1 Like

Finally managed to complete testing. So I built exactly same kernel (same sources, same configuration, same build box) without a revert, I now I can see the same issues again. So it looks this commit is indeed causes this behavior.

Volts, do you happen to have an upstream bug report for this issue, or should I create one?

1 Like

I haven’t opened an upstream bug report. I was being lazy. :slight_smile: LMK if you do, I’d like to follow along.

I first noticed this on TrueNAS nightlies, and I haven’t gotten around around to reproducing it on FreeBSD directly. Thanks for doing so.

My thought is still that it’s unlikely to be a Plex issue. I saw other apps’ connections in the same unusual state.

There’s been more churn in that code recently, to work around missing timestamps. It makes me wonder if FreeBSD itself isn’t generating timestamps in some to->localhost scenarios.

[base] Revision 369132

[base] Revision 369133

What’s odd is that this seems like high-profile code.

I’ve asked the question on the PR for the original patch and got the following response: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250499#c6 (which basically mentions the same commits you found).
I’ve upgraded my -CURRENT setup, hopefully will be able to test Plex soon.

1 Like

I’m curious if the standalone patch resolves the issue.

If the new sysctl needs to be enabled, does that imply that FreeBSD is sometimes the “broken stack”?

I’ll get around to rebuilding my FreeBSD world eventually. I appreciate that you’re testing it. :slight_smile:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.