I’ve had this memory problem for weeks, if not months, and have been continuously updating Plex Media Server by following this thread. Around 3am the server’s memory starts filling up until all of it is consumed. I used to have 32GB but have now added another 32, yet by 3:30 or so PMS has used essentially all 64GB.
Currently running Version 1.21.0.3616 on Ubuntu 20.04, but I have tried many previous beta and release versions to no avail.
I don’t have DLNA enabled.
I don’t have hw transcoding enabled.
I’m not using Docker.
A couple of days ago I enabled the LogMemoryUse setting and have the log files here:
The effect of this problem is that my server becomes unavailable from 3am until well past 5am; the OOM killer fires several times during that window. Unfortunately this server is used for many other 24x7 operations, and Plex is having a major impact on its uptime.
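(For reference, the kills show up clearly in the kernel log; this is roughly how I check for them, with the times here just as an example:)

```
# Look for OOM-killer activity in the kernel log during the problem window.
journalctl -k --since "03:00" --until "05:30" | grep -iE 'out of memory|oom'
```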
I may have a different OOM problem from what’s being discussed. In the log files I submitted, file ‘Plex Media Server.2.log’ clearly shows some sort of problem:
Edit 202011260545: I just noticed that I had previously disabled full debugging and verbose logging. I re-enabled those yesterday. Here’s an updated log including this morning’s OOM.
This is very interesting and very different from the gradual memory-use increases seen in other examples before; here there are big jumps. I have referred this to our development team. Would it be possible to get a zipped download of your Plex Media Server database sent to me privately? It might be too big to attach, in which case a link to an uploaded zip would be needed.
I’d just like to say that my OOM is clearly different from the OP’s, and I don’t want to detract from his issue. Perhaps this should be a separate bug report?
The development team have looked at this but could not see why it is happening, and it has not been possible to reproduce.
You are right that this is different from the other reports, and I will move the posts to a separate forum topic.
I am thinking maybe there are repeated large network packets coming in and an exception path where memory is not released, but I would have expected some additional log lines.
If this is still happening, could you, in addition to the memory use logging, also do a network packet capture and have that available (sent zipped by private message), so we can see whether the memory use increases coincide with the received network packets?
When providing new diagnostics, please run on the latest available version of Plex Media Server (beta if one is available).
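Something along these lines would do — a rough sketch only, assuming Plex’s default port of 32400 and that a rotating capture is acceptable; adjust the interface, port, and paths for your setup:

```
# Capture traffic to/from Plex's default port (32400), rotating the file
# hourly and stopping after 24 files so a full day (including the 3am
# window) is covered.
sudo tcpdump -i any -s 0 port 32400 \
  -w /var/tmp/plex-%Y%m%d-%H%M%S.pcap -G 3600 -W 24
```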
I couldn’t continue with the situation, so I deleted my database, cache, etc. and started fresh. It hasn’t happened since.
One noteworthy thing: there were several episode and movie files with creation dates in the future (2097, Feb 21 2021, etc.) that I only noticed after the rebuild, because they sat at the front of the “Recently Added” queue and didn’t disappear as new content was added. I touched these files to reset their timestamps and then did the Plex Dance to re-add them.
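If anyone wants to check for the same thing, something like this would find and fix them (a sketch only; /path/to/media is a placeholder, and it assumes the bad dates are the files’ modification times):

```
# List anything whose modification time is in the future...
find /path/to/media -type f -newermt now -print
# ...then reset those timestamps to now before doing the Plex Dance.
find /path/to/media -type f -newermt now -exec touch {} +
```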
I can’t say for sure that it was these files causing the problem because I fixed them shortly after doing the reinstall, but currently I’m not experiencing the OOM issues.
On the other hand, it may take some time for the problem to reappear if it’s related to database fragmentation, cache corruption, or some other issue that creeps in over time. I’ll keep an eye on it and report back if and when the problem re-emerges.
Here are my logs. I’ve also included a log of what processes were running every second during an incident (systemctl-watch.log). Please let me know if there’s any other information that would be helpful. My next step is to script up something to watch all the files opened by all Plex processes and see if there’s anything common to every crash. I can also create a database copy if you would like, too.
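What I have in mind is roughly this (just a sketch; the log file name and the process-name pattern are placeholders):

```
# Once a second, record every file held open by any Plex process, plus a
# process snapshot, so crashes can be correlated with what was open.
while sleep 1; do
  date >> plex-watch.log
  pids=$(pgrep -d, -f 'Plex')
  [ -n "$pids" ] && lsof -p "$pids" >> plex-watch.log 2>/dev/null
  ps -eo pid,ppid,rss,vsz,etime,cmd | grep -i '[p]lex' >> plex-watch.log
done
```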
PM sent. There are several logs in there from before, during, and after a crash. When it happens, it’s frequently not up for 30 minutes, long enough to get logs, before it goes out of memory again.
It seems to be behaving this morning. I’m going to try to get it to crash differently, instead of via the OOM killer, by setting up some resource limits, and see if maybe I can get a core dump out of it so there’s a snapshot of what’s going on while it’s going nuts. It seems to happen more often late at night, but not necessarily during the maintenance window, so I’ll keep watching.
Thanks, I’ll keep monitoring and see if I can figure anything out on my end. It’s been up for hours now today after I made a change to the systemd service file; I’m not sure if that did something or if it just hasn’t triggered yet and won’t until the maintenance window tonight.
The change was adding the Limit* lines to my systemd override.conf; a rough sketch of the result is below. Previously I just had MemoryMax/High/Limit, which use cgroups to make the kernel kill the process sooner (before it has a chance to try to take up all 128GB of RAM on the machine), so that it restarts quicker and doesn’t cause other interruptions.

The new LimitRSS/DATA/CORE lines set the older Unix-style resource limits (seen in a shell as ulimit -m and friends; see man ulimit), which I was hoping would produce a different kind of crash when allocation fails as it approaches the 10GB limit, and maybe let me capture a core dump during that event. It could be that some third-party library bundled with it is doing something annoying and only leaks when no limit is set. (This wouldn’t be the first time I’ve seen that happen, including in my own software.) If I can’t reproduce it for a few days with the Limit*= lines, I’ll take them off and see if it recurs; if it doesn’t, then I’ve got a workaround for the problem that others can use while we try to figure out the root cause.
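A sketch of roughly what that drop-in looks like; the 10GB ceiling is the one mentioned above, while the other values, paths, and the plexmediaserver unit name are illustrative assumptions, so adjust them for your install:

```
# Hypothetical reconstruction of the override described above.
sudo mkdir -p /etc/systemd/system/plexmediaserver.service.d
sudo tee /etc/systemd/system/plexmediaserver.service.d/override.conf <<'EOF'
[Service]
# cgroup limits: have the kernel throttle/kill the service well before it
# can exhaust the whole machine's RAM
MemoryHigh=8G
MemoryMax=10G
MemoryLimit=10G
# classic rlimits (ulimit -d / -m / -c): make allocations fail inside the
# process as it nears the ceiling, and allow a core dump to be written
LimitRSS=10G
LimitDATA=10G
LimitCORE=infinity
EOF
sudo systemctl daemon-reload
sudo systemctl restart plexmediaserver
```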
It’s been several days now and it’s been completely stable, without issues during multiple maintenance windows and while playing video for several hours in a row. I’ll disable the resource limits tonight and see if the issue recurs. If it does, then this looks like a viable workaround at least, and probably points at where to look for the underlying bug.
For some reason it took longer to occur, but it finally failed again tonight with this issue. It certainly seems to be something in the background processes causing it. I haven’t been able to predict it well enough to figure out whether it’s a specific file of mine that’s doing it or something else going wrong. I’ll keep the resource limits off and see if the problem recurs a few more times like it did before. If it continues to happen tonight, I’ll turn the limits on again and see if it stops failing again.
Two weeks later and it’s basically the same: without the ulimits set it will eventually fail; with them set it works perfectly fine. Beats me why, but it’s an easy workaround that seems to be holding up fine.