Permanent Transcoder error since update

Server Version#: 1.19.4.2935
Player Version#: 4.35.1

Hey guys,

I’m having a huge amount of transcoder failed errors and conversion failed errors.

I looked into the logs and found this

Jun 13, 2020 09:29:37.056 [3468] ERROR - [Transcoder] [tcp @ 03b45ec0] Connection to tcp://127.0.0.1:32400 failed: Error number -138 occurred
Jun 13, 2020 09:29:37.056 [3468] ERROR - [Transcoder] [dash @ 0300b280] Unable to open http://127.0.0.1:32400/video/:/transcode/session/fgv2sm42dgfqbar5otr8vsp4/cfa18cf2-5ef5-4670-9e54-93a9f95c4ed0/manifest?X-Plex-Http-Pipeline=infinite for writing: Error number -138 occurred
Jun 13, 2020 09:29:37.065 [21728] DEBUG - Jobs: 'C:\Program Files (x86)\Plex\Plex Media Server\Plex Transcoder.exe' exit code for process 3568 is 1 (failure)
Jun 13, 2020 09:29:37.065 [5804] DEBUG - Killing job.
Jun 13, 2020 09:29:37.065 [5804] DEBUG - Job was already killed, not killing again.
Jun 13, 2020 09:29:37.065 [5804] DEBUG - Stopping transcode session fgv2sm42dgfqbar5otr8vsp4
Jun 13, 2020 09:29:37.065 [5364] DEBUG - Streaming Resource: Terminated session 0xb60eea8:zpcf8eu3khkx1gzey6lbp9go with reason Conversion failed. The transcoder exited due to an error.

After investigating for a few hours, I stumbled upon a few similar topics where the culprit was the antivirus software. I tried everything. Turning it off entirely, set up exclusions for the Plex.exe’s and so on. I don’t know what has changed but it’s impossible to stream for me and my friends.

I’ve had these errors as well:

ERROR - [Transcoder] [dash @ 028ef3c0] Unable to open http://127.0.0.1:32400/video/:/transcode/session/xxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxx/manifest for writing
ERROR - [Transcoder] av_interleaved_write_frame(): Unknown error

And according to this guide (Windows: Transcoder failures during playback, live TV, or DVR recording | Plex Support) setting up exclusions can resolve the issue. I didn’t have Malwarebytes installed when the errors occurred, but now I even bought it in hopes it would resolve my issues with the transcoding when following this guide - to no avail.

Any help would be highly appreciated as my server is not more than a movie and TV show database at the moment :crying_cat_face:

Edit: Everything was perfectly fine until yesterday

Could it be related to a maintenance task? I’m trying to have VPT for all of my TV shows and episodes and I always started an “Analyze” task for the entire TV show library (~50k episodes)

So, after the task with analyzing the entire TV Show library has finished, there are no more transcocder errors (!) neither local nor remotely. Maybe there’s an issue on my end, but if it’s something on the PMS side, I will gladly provide logs to fix the issue or help in any way

This is the windows tcp dynamic port pool exhaustion issue - it manifests itself with tcp error event id 4227 in the windows system event log and a transcoder tcp error as here

Jun 13, 2020 09:29:37.056 [3468] ERROR - [Transcoder] [tcp @ 03b45ec0] Connection to tcp://127.0.0.1:32400 failed: Error number -138 occurred

There would also be curl errors in the Plex Media Scanner logs

The issue came about after an update to curl where curl started to use 2 extra tcp ports for every loopback request Plex Media Scanner makes - and with 2 requests for every movie or tv show season we would end up using 6 ports instead of 2 which would not be available again for reuse until TCP_WAIT_TIME which is a few minutes

There is an open issue on this with the server team and we have also an issue raised for curl

Hey there,

thank you very much for this information! I was almost going nuts with it the past days. Is there anything we can do in the meantime that might prevent transcoder errors? E.G. not scanning/analyzing/optimizing while many people stream?

Thank you very much again <3

how many movies do you have ? If more than 4000 you could split the library - only risk there would be if periodic scanning is enabled and the split libraries are scanned in sequence one after the other

I don’t have more than 4.000 movies. I have almost 1.500, far from 4k. Does it matter how many episodes I have? These are a lot more (~52k)

Let me have a look at debug server logs zip after a scan of a library that brings on the problem.

I had one example earlier of a music library still using the old scanner and that caused the error - old scanner / agent would show when editing the library in Plex Web and checking Advanced and then scanner and agent

I have not yet come across a TV library cause this but I think a high number of TV shows seasons could be a possibility

So relaunch Plex Media Server, wait 5 minutes and then scan a library at a time and check the windows event log (system log) for event id 4227 getting logged

When you see the 4227 download the logs zip

Hey again.

I wouldn’t like to restart my PMS just now, as I’ve put some effort in manual file analysis (I’ve manually analyzed ~1000 episodes which haven’t finished yet and I would like to avoid going through all of them again [as analyzing the entire TV show library seems to break the transcoder]), but I could already see a tremendous amount of these errors in the Windows system log (4227) over the past week(s).

Is there anything I can provide you with right now without having to restart the server?

Other than that I will gladly provide you with anything you need once the VPT generation has finished for the manually selected episodes

What is important is doing a test after knowing that we have the initial complete set of free ports - a reboot would be one way to know that we are starting with the full set of ports in the pool - circa 16000 tcp ports - and then i can see what scanning needs to happen to get the first 4227 after the reboot.

There are only 6 scanner log files that cycle round after filling up or a new scan run. So would need to reproduce the problem with no more than 6 library scans done. If you have many libraries and we scan all of them then some logs will get overwritten

Hey there again and good morning!

So there was indeed the error you implied (4227 tcpip), but not after scanning the TV show library, but after analyzing it. The scans of my 3 libraries went just fine, but after a short time of the analysis and a local stream running, I got the addressed transcoder error, which I didn’t have since I stopped to analyze all episodes on a TV show level, so there’s clearly a correlation between this.

I hope I did everything correctly as my 2 boys just woke up and well, things went pleasantly chaotic. Let me know if it helps you anything or whether you’d need more/more correct information and I’ll be happy to assist you assist us :smiley:

Kind regards!

Plex Media Server Logs_2020-06-14_07-15-15.zip (11.4 MB)

P.S.: I hope the events are in the logs. There was some time between downloading the logs and the analysis event due to le boys.

EDIT: Oh and the system log tcpip event (4227) was printed at exactly 06:56:29

F***! I just realized I didn’t have debug logging activated, nooooo! I’ll come back with a fresh debug log tonight, I’m sorry :sleepy:

Sorry for the late reply @sa2000, here are the logs, I hope you’ll find everything you wanted to look for.

I started the analysis of my TV show library at 16:10, got the tcpip error in the Windows System error logs at 16:12. Let me know if there’s anything else I can come up with and have a great day!

Plex Media Server Logs_2020-06-15_16-17-34.zip (9.6 MB)

Thanks - the scanner analysis logs have wrapped but it does not matter - i can see what is happening

It is an issue. With the curl change we are ending up using 9 tcp ports per episode instead of 3.

The logs show 20 files a second are being analyzed - so using up at least 180 ports a second - instead of 60.

I will see if there is any movement on the curl issue.

Without the curl issue and with the current speed it will be marginal as we would be using up about 14000 ports in 4 minutes - by this time some previously used ports should get returned to the pool (TIME_WAIT)

Would like to know the exact time the 4227 event was first flagged - at 16:12 so i can count the number of requests to that time.

Hey again.

Thank you for sharing your knowledge. The exact time stamp of the 4227 error was 16:12:29.

I’m afraid I don’t have the insights and understanding like you have, but I understand that there’s an issue with curl, so I hope there might be a fix soon.

Another thing that might be related to this is that I experience the analysis sometimes “stopping”, let me explain. As analyzing the entire TV show library doesn’t work atm for me (due to the error), I’m manually analyzing 300-1000 episodes on the episode level for days now. Sometimes, it would just stop analyzing these, even though I started the analysis task for these episodes hours ago.

Manually triggering analysis another time, would not start the VPT generation. These items are then just skipped (or broken?) and would only start generating VPT after restarting the PMS.

My question here is now: Is that related to these curl errors?

Thank you very much for helping in this regard :blush: Or let’s say helping raising awareness of the issue

It is possible. If you see 4227 events around the time then it could be.

Recovery from 4227s is probably best by a restart or even a reboot of windows. If there are no bugs in windows in this area, then leaving the system idle for 5 minutes would free up the pool - but it would be difficult to force apps to get to an idle state - so reboot is the easiest

if a restart allows you to re do the failed analysis and indexing then that would be good enough whilst we have this issue. If, however, items remain in permanent broken state as far thumbnailing and indexing is concerned then that would be worth pursuing

By 16:12:30 we only indexed circa 1500 episodes and there were circa 8000 requests in the 155 seconds elapsed period. Without the curl issue that would only be 8000 ports - but with the curl problem it needed about 24,000 ports.

You could make registry changes to reduce TIME_WAIT to say 60 seconds instead of windows default - but if making such changes, it is important to record and know how to undo the changes afterwards and revert to the windows system defaults

Oracle published an article on tweaking this for heavy workload - you could experiment with and be careful not to make big changes

See
https://docs.oracle.com/cd/E26180_01/Search.94/ATGSearchAdmin/html/s1207adjustingtcpsettingsforheavyload01.html

(This article was written before Microsoft changed the size of the dynamic pool - it should now be about 16000 and not 4000 as the article states)

Last change by Microsoft

Run these commands in a windows command line, They should show a range from 49152 for 16384 ports

netsh int ipv4 show dynamicport tcp
netsh int ipv6 show dynamicport tcp

Microsoft document

Wow thank you sa for a great workaround until there’s a fix!

I will try increasing the TIME_WAIT via the registry and also try this command ( netsh int ipv4 set dynamicport tcp start=1025 num=64511) to increase the portrange, is that correct?

Thank you anyways, I will figure out how to do it and notate everything I did to make sure I can easily revert it :slight_smile: Thanks a ton!

EDIT: So I already adjusted the Registry and added the two DWORDS (TIME_WAIT and Port Range set to 32768) and ran the command “netsh int ipv4 set dynamicport tcp start=1025 num=64511”. That’s all I need to do, am I right?

As I have plenty of users streaming right now I will do a reboot later the night (Because of the Registry) and let you know whether this resolved the issue, but I’m so darn curious that I’d love to check out right now! But I won’t turn off the streams for 8 users, so… patience :smiley:

You mean reduce it. The default is 120 seconds. We need to reduce it to 60 seconds.

If the key does not exist it would be at the default

I was not suggesting changing this. One change at a time and initially just the time wait.

Increasing the dynamic pool size would result in lower start for the range and may clash with reserved port numbers for applications

But before making a change to the dynamic pool, it is important that you noted down what it was set to before so you can reverse your changes

These two commands would have given you the range

netsh int ipv4 show dynamicport tcp
netsh int ipv6 show dynamicport tcp

Hey sa!

I see. I already wanted to ask why I wouldn’t be able to let these settings as I adjusted them and what the disadvanteges or possible side effects could be.

Can’t I just leave the settings now as they are? Regardless whether PMS will have a fix or not? The analysis is now running perfectly fine without any transcoder errors.

The default range was exactly what you suggested (49152 to 16384 ports) as I checked with your suggested netsh cmd. And you’re absolutely right - I reduced the wait time of course, and not increasing it.

And I just noticed, that the analysis task itself is completing, but still not generating VPT for the items that don’t have VPT yet :frowning_face:

EDIT: Nevermind me, rebooting the computer (which I should’ve done way earlier anyways) (temporarily ?) fixed it and VPT generation on a library level analysis started just fine and is generating as well!

Another EDIT: No, it doesn’t, unfortunately. It goes through episodes that don’t have VPT but doesn’t start generating them in the next step. Noooo :unamused: