Live TV drops out, spurious "weak signal" message

Server Version#: 1.19.4.2902
Player Version#: 6.6.0.6414-653e9c44c (Roku)

Accessing live TV remotely constantly gives us grief. If you're lucky, you'll get solid playback for a good while, but eventually, without fail, you will hit two or three quick stretches of buffering, followed by a message that the channel can't be played and that this 'may be due to a weak signal'. This seems like nonsense–there is nothing wrong with the signal; the cable comes from an ONT and is then split into two HDHomeRun PRIMEs. We have symmetrical Gigabit from FiOS on the server side and 200Mbps cable on the client side, so bandwidth should not be an issue–indeed, I've observed this happen even when I transcode down to very low-bitrate SD.

Note that I have no problem watching recordings or MKVs or streaming any other type of media–only live TV, at any bitrate and in any format (so even without transcoding from MPEG-2/4). The server is running an i7-4790S with 16GiB of RAM, so system resources shouldn't be an issue either, and indeed don't seem to be–there are almost never more than one or two people streaming at once, and this often happens when just one person is streaming. While I sometimes have very high RAM usage (75-80%), shutting down the services responsible did not solve the problem.

The OS is Win7 x64 (this used to be a WMC box, which was the main reason I stuck with 7–I may change this in the future). Reviewing my network topology, I found that IGMP Snooping was turned off on one of the switches between the server and my tuners, so I turned this on, but it doesn’t seem to have made a difference.

The clients are Roku Premiere+s and Ultras and a recent Fire TV Stick (bought near the end of last year)–all of them have this issue.

One more thing–I am reverse proxying behind Cloudflare and nginx, without which the peering is so bad that Plex is essentially unusable, not just for live TV. But why this would negatively affect live TV specifically escapes me.
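
For anyone following along, one quick sanity check of the proxied path is to hit Plex's unauthenticated identity endpoint through Cloudflare/nginx and confirm it answers promptly; the hostname below is hypothetical, and any machine with curl installed will do:

:: hypothetical hostname; run from any box that has curl installed
curl -sS https://plex.example.com/identity
:: a healthy path returns a small <MediaContainer> XML blob almost immediately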

A curious thing I noticed: even after the error message appears, the server reports roughly the same upload usage as when something is actually playing, even though nothing is…

Otherwise, logs are attached here:
Plex Media Server Logs_2020-06-04_22-39-52.zip (1.9 MB)

I didn’t have debug logging on for the last log. Turned it on, promptly experienced the same issue (around 1:27am tonight, literally a minute ago).

Plex Media Server Logs_2020-06-05_01-27-25.zip (1.8 MB)

I have experienced this when using an ad blocker and a VPN, at moments when ads would or could play… it will buffer past the trigger points.

The ad blocking could be at the software level, such as in the browser, or at the system level, such as a modified hosts file or a VPN. It could also be at the firewall level or even the ISP level.

I get the message when I use my work guest Wi-Fi, or even when I used the work PC over Ethernet… any watchdog at the ISP level could mess with those ads and you would get that experience.

You have many server issues, of which jumbo frames and TCP port exhaustion are the most likely. You can turn off jumbo frames; the port exhaustion is being worked on.

Live TV is like a torture test for your server and network: the stream has to travel from the tuner to the server and back out to the client. Many things can delay a packet, and that delay then shows up as a 'weak signal' error.

Thanks for chiming in guys.

The ad blocking could be at the software level, such as in the browser, or at the system level, such as a modified hosts file or a VPN.

There is no adblock software running on either the server or the clients, and I am not using a VPN.

It could also be at the firewall level or even the ISP level.

Not sure what in my firewall could be causing this, and I sincerely hope it isn’t something any of the ISPs involved (or like, Cloudflare) are responsible for, as then there’d literally be no hope. I also don’t understand why something like that would specifically affect only live TV and no other kind of video streaming. I have no idea how they would be able to discriminate on that basis, especially given that everything is encrypted.

You have many server issues, of which jumbo frames and TCP port exhaustion are the most likely.

I have never enabled jumbo frames anywhere–not on my main router/gateway, not on any of the switches in the path, and not on the server's NIC. Do the logs indicate that jumbo frames are nonetheless being used? I'm also not sure what TCP port exhaustion is–is it something that can be mitigated via some router or switch setting?

Live TV is like a torture test for your server and network etc

Would physically moving the server closer to the tuners and router/gateway be helpful? This would eliminate two different switches in the path. There are also other applications running on the same server–could they be relevant? How badly could they be affecting this?

Logs have many references to:

Line 76: Jun 02, 2020 17:05:06.009 [4268] ERROR - MyPlex: mapping failed due to the network being configured for jumbo frames

You would have to find the source of this, disable it and grab new logs before finding any additional problems.
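
A couple of quick checks from the Windows side can help narrow it down; the tuner address below is just a placeholder, and both commands are standard Windows tools:

:: list the MTU of every interface on the server; anything above 1500 suggests jumbo frames
netsh interface ipv4 show subinterfaces

:: ping the tuner (placeholder address) with the don't-fragment bit set and a 1472-byte payload
:: (1472 + 28 bytes of ICMP/IP headers = a standard 1500-byte frame)
ping -f -l 1472 192.168.1.50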

There are also other applications running on the same server–could they be relevant? How badly could they be affecting this?

Could be a trademark of the Haswell processor. I run Debian Linux without a GUI; even though the GUI only adds about 10% load, it is enough to cause issues with Live TV. There are also Slow DB messages, which point to I/O problems with the database.

Jun 02, 2020 18:24:44.670 [7364] WARN - SLOW QUERY: It took 218.401398 ms to retrieve 50 items.

You would have to find the source of this, disable it and grab new logs before finding any additional problems.

OK, so I double-checked. The gateway (Asus RT-AC68U/freshtomato 2020-02) is good:

(screenshot: gw0_no-jumbo)

The switch my tuners are connected to (D-Link DGS-1100-08) is good:

(screenshot: sw0_no-jumbo)

First 10G switch (Mikrotik CRS305/RouterOS 6.47) is good:

(screenshot: crs305a_no-jumbo)

Second 10G switch (Mikrotik CSS326/SwOS 2.11), in the basement next to my server and a bunch of other things, is inconclusive: it has no jumbo frame or MTU setting. However, I found this tidbit:

It seems strange to me that this would result in multiple regular frames being aggregated into one jumbo frame, since such behavior would wreak havoc with any number of devices. Nobody else seems to have reported such behavior with this switch, and it didn't come up in the one review I've found (though I did just ask in the comments there).

Otherwise, while looking through the CSS326's settings, I found that flow control was on, and there were some Rx and Tx pauses on the main upstream 10G port (512 Rx pauses and 6,114 Tx pauses over about 100 days of uptime), though none on my server's port. Flow control doesn't appear to be enabled on any other switch, so I'm wondering if shutting it off, at least on that port, could help.

Finally, the server’s NIC (Intel I217-V) is good:

(screenshot: I217-V_no-jumbo)

For good measure, I double-checked my lubuntu VirtualBox VM on the same machine, which is what I use for email and web serving and proxying, and its MTU settings were nominal at 1500:

$ ifconfig | grep -i mtu
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536

Could be a trademark of the Haswell processor

…I mean, this would be terrible if true. This is an i7-4790S, which is going to be better binned than other Haswell chips, and it has a good boost clock to boot (~3.5GHz). Of course it's essentially obsolete now, especially with Zen 3 coming out soon, but I'd have thought that getting decent live TV needn't require, you know, a ridiculous HEDT chip or anything :confused:

There are also Slow DB messages which point to I/O problems with the database.

Hmm. This is running on a SATA III SSD (Intel 545s 512GB). As far as I can tell, the Plex db is under User/AppData/Local/Plex Media Server/Metadata, not on any of the NAS disks (HGST NAS and WD Reds fwiw, though they’d still be orders of magnitude slower than an SSD).

Anyway, thanks for the feedback, and I welcome any further comments you or anybody else might have. I'll see whether shutting off flow control on that uplink port makes any difference. If not, I'll be at this location tomorrow and will be able to temporarily cut the CSS326 out of the equation to see if that helps.

Check the database location to see that you are getting backups every 3 days.

"%LOCALAPPDATA%\Plex Media Server\Plug-in Support\Databases"

For good measure, do an OPTIMIZE DATABASE and CLEAN BUNDLES.
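
If you happen to have a sqlite3 binary around, a quick integrity check against a copy of the database (with Plex Media Server stopped) can also rule out corruption; this is plain SQLite, not a Plex tool, and it assumes sqlite3.exe is on your PATH:

:: run from the folder holding a COPY of the library database, with the server stopped
sqlite3 "com.plexapp.plugins.library.db" "PRAGMA integrity_check;"
:: a single "ok" means the database structure is intact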

Other people have 10G gear, so I would think that if that were the issue, more of them would be seeing it.

I only meant that the Haswells have slow memory.

(Watching again later last night, it quit entirely just once, but instead of the message about a weak signal, there was one about a recording failing…even though I wasn't recording anything :confused: )
Plex Media Server Logs_2020-06-08_23-11-28.zip (6.9 MB)

Hi again tiebierius, thanks again for chiming in.

Check the database location to see that you are getting backups every 3 days.

This seems to be happening.

For good measure, do an OPTIMIZE DATABASE and CLEAN BUNDLES.

Will do!

Other people have 10G gear, so I would think that if that were the issue, more of them would be seeing it.

Right, I didn't mean to suggest that 10G is generally a problem–more that maybe this particular switch, with no configurable jumbo/maximum frame size, might be causing the jumbo frame message to show up. Otherwise it seems to appear for no real reason, because everything else in the path is configured for the nominal 1500-byte frame size. (Oh, there is a second VLAN on here, but the main VLAN, where the server and tuner live, is untagged, and their ports are not members of the second VLAN.) Unfortunately I wasn't able to go and swap it out/bypass it today, but I will try next week.

I only meant that the Haswells have slow memory.

Interesting! I was under the impression that RAM speed was not terribly relevant for anything but very high-performance, niche applications. Looks like I've been misinformed…

FWIW, moving the server so that it plugs straight into the same switch the tuners are on (eliminating the Mikrotiks from the equation entirely) did not fix this issue. I will try to reproduce it and post a new log tonight or tomorrow.

Following your post here, I checked your last set of logs, and you have the symptoms of the TCP port exhaustion issue:

Jun 08, 2020 23:09:19.099 [16512] ERROR - [Transcoder] [tcp @ 02efde80] Connection to tcp://127.0.0.1:32400 failed: Error number -138 occurred

Jun 08, 2020 23:09:19.144 [12108] ERROR - [Transcoder] [tcp @ 02d5b780] Connection to tcp://127.0.0.1:32400 failed: Error number -138 occurred

If you open your Windows event logs with Event Viewer and filter the "System" event log for event ID 4227, you can see how often it is happening.

If you filter on 4227 and 6005, that will show reboots and TCP failures due to running out of dynamic ports.
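
If you prefer the command line, roughly the same filter can be run with wevtutil; the event IDs are the same ones mentioned above:

:: last 10 System events with ID 4227 (ephemeral TCP port exhaustion), newest first
wevtutil qe System "/q:*[System[(EventID=4227)]]" /c:10 /rd:true /f:text

:: include 6005 (event log service start, i.e. a reboot) to line failures up against restarts
wevtutil qe System "/q:*[System[(EventID=4227 or EventID=6005)]]" /c:20 /rd:true /f:text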

The safest option after getting a 4227 is to reboot Windows.

Once you run out of TCP ports, things start to fail.

Plex Media Server scanning large movie libraries (more than 4,000 items) or analyzing TV Show libraries with thousands of episodes has been seen to result in this. It is a curl bug that came in when we upgraded curl in Plex Media Server. A fix has already been produced, and we are waiting for the new version of curl to be available.
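
In the meantime, a generic way to keep an eye on the symptom (not a fix for the curl bug itself, and the numbers below are just example values) is to watch the ephemeral port range and the TIME_WAIT count from an elevated command prompt:

:: show the current dynamic (ephemeral) TCP port range; the Windows 7 default is start=49152, num=16384
netsh int ipv4 show dynamicport tcp

:: count sockets sitting in TIME_WAIT; a number in the thousands means the range is being eaten up
netstat -ano | find /c "TIME_WAIT"

:: optionally enlarge the range as a stopgap (example values only; keep the range above Plex's port 32400)
netsh int ipv4 set dynamicport tcp start=32768 num=32768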

Thanks for the detailed explanation! I'll be sure to check Event Viewer for these issues, and I guess I will start rebooting when this starts happening. In the meantime, it sounds like other mitigations might be 'don't have the entire library scanned again when changes are detected' (it is rescanning everything when it detects a change right now) and 'if you periodically scan your giant library, do it less often' (right now I have it set up to scan every 6 hours). Thanks again!
Ah, one more question–there is a good chance I'll build a new server or migrate this one to a different OS soon, either BSD-based (a FreeNAS jail/plugin) or Linux-based (probably an Ubuntu flavor in a VM on the FreeNAS box). Do these also suffer from this curl bug, or is it specific to the Windows build/how curl interacts with the Windows TCP stack?

Well, I've tried the mitigations I thought of above, and rebooted for good measure. Unfortunately, the weak-signal error happened again anyway. There is no evidence in the Windows logs since the reboot of error 4227 (6005 yes, but that seems to signal only that the event log service started). There is also no evidence of any complaints about TCP exhaustion in the Plex logs, nor about jumbo frames either. (I am looking in Plex Media Server.log–presumably that is where I should be looking, correct?)
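
(In case it helps anyone checking the same thing, a findstr sweep along these lines will search all of the current logs for both signatures at once; it assumes Plex runs under the logged-in account, so %LOCALAPPDATA% points at the right place.)

:: search every current Plex log for the jumbo-frame and port-exhaustion messages
findstr /i /c:"jumbo frames" /c:"Error number -138" "%LOCALAPPDATA%\Plex Media Server\Logs\*.log"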

But plenty of this (I was watching an h264 channel at the time):

Jun 20, 2020 22:35:47.485 [6296] ERROR - [Transcoder] [h264 @ 020c07c0] SPS unavailable in decode_picture_timing
Jun 20, 2020 22:35:47.485 [6296] ERROR - [Transcoder] [h264 @ 020c07c0] non-existing PPS 2 referenced
Jun 20, 2020 22:35:47.486 [6296] ERROR - [Transcoder] [h264 @ 020c07c0] SPS unavailable in decode_picture_timing
Jun 20, 2020 22:35:47.486 [6296] ERROR - [Transcoder] [h264 @ 020c07c0] non-existing PPS 2 referenced
Jun 20, 2020 22:35:47.486 [6296] ERROR - [Transcoder] [h264 @ 020c07c0] decode_slice_header error
Jun 20, 2020 22:35:47.486 [6296] ERROR - [Transcoder] [h264 @ 020c07c0] no frame!

And later (in a separate log, taken after watching an mpeg2 channel):

Jun 20, 2020 23:12:22.253 [4908] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.
Jun 20, 2020 23:12:22.276 [4908] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.
Jun 20, 2020 23:12:22.298 [7480] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.
Jun 20, 2020 23:12:22.322 [11052] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.
Jun 20, 2020 23:12:22.360 [11052] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.
Jun 20, 2020 23:12:22.396 [3932] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.
Jun 20, 2020 23:12:22.425 [3932] DEBUG - Transcoder segment range: 0 - 467 (467)
Jun 20, 2020 23:12:22.427 [3932] DEBUG - Transcoder segment range: 0 - 468 (467)
Jun 20, 2020 23:12:22.453 [3932] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.
Jun 20, 2020 23:12:22.477 [3932] DEBUG - Transcoder segment range: 163 - 466 (466)
Jun 20, 2020 23:12:22.477 [3932] DEBUG - Pruning segments older than 164, view offset is 465.015022, min was 163, max is 466, last returned is 465.015022
Jun 20, 2020 23:12:22.477 [3932] DEBUG - Pruning segment 163
Jun 20, 2020 23:12:22.487 [5560] DEBUG - Transcoder segment range: 164 - 467 (466)
Jun 20, 2020 23:12:22.544 [5560] ERROR - [Transcoder] [mpeg2video @ 021507c0] Invalid frame dimensions 0x0.

Though I'm not sure whether these correlate with the times I got the buffering followed by the weak-signal message.

There's also a strange permissions issue, but it only shows up after the session was 'whacked':

Jun 20, 2020 23:28:45.629 [10472] ERROR - Transcoder: Failed to delete session directory (boost::filesystem::remove: Access is denied: "C:\transcode\Transcode\Sessions\plex-transcode-0c08e136-b8e1-4590-8228-78c66c89e36d\media-00541.ts")
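
(If this turns out to be a real ACL problem rather than the transcoder still holding the file open, something like the following at least shows who has what on that folder; the path is the one from the error above.)

:: dump the ACLs on the transcode session folder referenced in the error
icacls "C:\transcode\Transcode\Sessions"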

…And it happened again while I was writing this post, at 12:06am my time (GMT-4). No errors around that time :frowning:

Plex Media Server Logs_2020-06-21_00-07-01.zip (7.8 MB)
Plex Media Server Logs_2020-06-20_22-45-00.zip (7.3 MB)
Plex Media Server Logs_2020-06-20_23-34-49.zip (7.6 MB)
