[REQUEST] Move from SQLite to MySQL or Postgres

So I had been reading this thread for a while, and thankfully @Volts brought some reason to it. After the last two comments, there are still a few more tradeoffs that deserve mention in changing database types. The simple fact is that what is proposed in this thread is nothing short of a massive undertaking, when there are many more pressing issues at hand.

Anyone who has worked on multiple types of database knows that there isn’t a single standard SQL dialect in practice. Each database implements a SQL-like syntax, but there are large variations between them. So it’s not as simple as substituting one database type for another and having all the SQL statements just work. Furthermore, each database has its own quirks that must be dealt with, and even differing column types. So this leaves a very large list of code segments that would need to be rewritten, or at the least extensively tested, should the type of database be changed.
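As a toy illustration of that dialect drift (hypothetical table, not Plex’s actual schema): SQLite accepts `INSERT OR REPLACE`, while PostgreSQL rejects that spelling and expects `INSERT … ON CONFLICT … DO UPDATE` for the same upsert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO media VALUES (1, 'old title')")

# SQLite-specific upsert syntax; PostgreSQL would reject this and
# wants INSERT ... ON CONFLICT (id) DO UPDATE SET ... instead.
conn.execute("INSERT OR REPLACE INTO media VALUES (1, 'new title')")
row = conn.execute("SELECT title FROM media WHERE id = 1").fetchone()
print(row[0])  # new title
```

Multiply that by every statement in the codebase and the scale of the rewrite becomes clear.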

Let’s Assume We Were to Switch Database Types:
To switch database types, we would need a mechanism to bundle the database server or library in PMS itself and run it. Some have mentioned MySQL which does have a commercially licensed embedded version by everyone’s favorite company in the industry. In my personal opinion, this aspect alone kills any hope of this being a good idea.

Others have mentioned PostgreSQL, which is a very nice database to use. I’ve used it extensively at $lastJob and I particularly like its JSONB column type for storing more free-form data. The problem is it doesn’t have an embedded version, so it would require a fork/exec mechanism to run, and there are certain platforms where this isn’t allowed. Keep in mind that the server also runs in a smaller form on mobile clients, and iOS doesn’t allow multiple processes. Then there’s also the communication between the server and the database. Ideally this should be a Unix socket rather than TCP, but not all platforms readily allow this. So this seems to kill the idea of switching to PostgreSQL.

In SQLite, we currently have a few custom column types which are derived from code in PMS. In particular, searches are conducted using ICU, with table columns optimized for this. Other database types could potentially lose this ability, which would be a significant loss.
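For a rough sketch of the mechanism (using Python’s `sqlite3` module rather than the actual PMS code, with a made-up `app_icu` collation name and a simple case-folding comparator standing in for real ICU calls):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for an ICU-backed comparison: SQLite lets the application
# register any collation function under a name of its choosing.
def fold_compare(a, b):
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)

conn.create_collation("app_icu", fold_compare)
conn.execute("CREATE TABLE titles (name TEXT COLLATE app_icu)")
conn.executemany("INSERT INTO titles VALUES (?)",
                 [("Zoo",), ("apple",), ("Banana",)])

rows = [r[0] for r in conn.execute("SELECT name FROM titles ORDER BY name")]
print(rows)  # ['apple', 'Banana', 'Zoo']
```

A server-based RDBMS has its own collation machinery, but it would not automatically honor collations defined in application code like this.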

Let’s Assume We Were to Support Multiple Database Types:
Above I described the problems with switching to these database types, so what about allowing other types of database in addition to SQLite? The problem here is that it restricts what is allowed to be done within the SQL statements: no feature that doesn’t exist across all database types may be used. So not only do you not gain any additional database features, you lose ones you currently have that are not present in the others. Additionally, an abstraction layer needs to be in place such that the vast majority of the code doesn’t touch anything SQL, and that layer must allow every kind of query currently in use, including complex joins and sub-selects.
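A minimal sketch of what such a layer might look like (hypothetical names throughout, and only the SQLite backend filled in): callers never write SQL, and each backend supplies its own dialect-specific statements:

```python
import sqlite3

class Database:
    """Toy abstraction layer: application code calls methods, never SQL."""

    # Per-backend statement registry; only the sqlite entries are real here.
    STATEMENTS = {
        "sqlite": {
            "create": "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)",
            "insert": "INSERT INTO items (name) VALUES (?)",
            "by_name": "SELECT id, name FROM items WHERE name = ?",
        },
        # "postgres": {...}  # would need SERIAL, %s placeholders, etc.
    }

    def __init__(self, backend="sqlite"):
        self.sql = self.STATEMENTS[backend]
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(self.sql["create"])

    def add_item(self, name):
        self.conn.execute(self.sql["insert"], (name,))

    def find(self, name):
        return self.conn.execute(self.sql["by_name"], (name,)).fetchone()

db = Database()
db.add_item("pilot.mkv")
print(db.find("pilot.mkv"))  # (1, 'pilot.mkv')
```

Even this toy shows the cost: every query shape the application uses has to exist, correctly, once per backend.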

But if we were to assume that those challenges were overcome, the question remains: who would actually use a different one? How many would care, much less understand the advantages/pitfalls of doing so? It seems like only a very small segment of the user base would go down this route.

So with this, the question remains: Are any of these routes worth doing? That’s the question anyone with limited resources must ask and with the above I don’t see that being a yes right now.

5 Likes

Frankly speaking, I can understand where this is coming from, though I wouldn’t label it a conspiracy theory. More like a general frustration. From a user’s position, it’s rather frustrating if there is an official forum where you can engage with a company, and sometimes it feels like there is mostly user-to-user exchange. To be fair, I feel like Plex is listening to their users. Some feature requests are implemented really fast. But some threads are left completely alone; even after years, there are no official responses. I know it’s hard. And being a developer myself, it’s even harder to commit to something that might be held against you at a later point :slight_smile:

This is where there is a lack of clear communication. I am not advocating for a complete switch from SQLite to any other DB. I’m advocating for supporting multiple databases and, during setup, asking the user. If the user selects the embedded database, then Plex initializes the SQLite DB, and off to the races. However, if the user selects MariaDB/Postgres, then Plex prompts for the address of their database server, username, password, and database to use. Plex then connects to that database using the credentials and creates the required tables, etc. Using the embedded version of MySQL is not a solution.

ISO/IEC 9075? Correct me if I am wrong, but there is a standard SQL language that DBMSs follow, and where there are deviations from the norm, they publicize the differences.

For example, I know and understand where MariaDB/MySQL doesn’t support or implement the standard, or where it differs, and try to avoid using the db-specific query. And even when one forgets, our development MySQL server is run with `--transaction-isolation=SERIALIZABLE --sql-mode=ANSI` to catch any differences.

Really, I don’t think anyone who has commented on this thread is asking Plex to use the embedded version of any database. In fact, the OP stated:

I’m now going to reply, but out of order, to another member’s post.

Again, we are not asking that Plex use any embedded version of a DB, but rather that it give the user the option of which DBMS to use. But let’s check something out.

I know mariadb/mysql and postgres can be installed on:
Windows, Linux, FreeBSD, macOS, ASUSTOR, unRAID, QNAP, Synology (MariaDB) – I believe Postgres is installed on Synology but is locked by default – TerraMaster (MariaDB), and FreeNAS. I admit I didn’t check every NAS available, as even though some NAS devices can run Plex, should they? Regardless, since we are not asking for other DBMSs to be embedded, if the required libraries for a particular OS are not available for a specific DBMS, that DB option would simply not be accessible during install.

As do we, and when we have to use something that is not in the ANSI SQL, we code for that.

Let’s speak of some of the problems with SQLite.

  • Concurrency. While you can have multiple processes querying a SQLite DB, only one process can make changes at a time; thus, if multiple processes are trying to do, say, an INSERT or UPDATE, all the other processes must wait.
  • Security. YOU said ‘Security’, but you must remember that with RDBMSs you can control data access with finer precision, which also allows finer-grained locking and better concurrency.
  • Writes. Since SQLite only lets one write operation take place at a time, this limits throughput, especially if the application requires a lot of write operations or there are multiple concurrent writers.
  • DB Size. I know that SQLite can “technically” support a DB up to 140 TB in size, but even SQLite suggests that a DB approaching 1 TB be placed on an RDBMS. I understand that Plex’s DBs are nowhere near that size, but the number of people who have had database problems, the backdoor fixes, the numerous “DB is busy, waiting 200 seconds” messages, etc. show that there’s a problem. One should not have to worry about their DB getting corrupted, and ultimately the user isn’t doing anything wrong.
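The single-writer behavior in the first and third bullets is easy to demonstrate with Python’s `sqlite3` module (a throwaway on-disk database, nothing Plex-specific): with the busy timeout set to zero, a second writer fails immediately rather than queueing:

```python
import os
import sqlite3
import tempfile

# Two connections to the same on-disk database. timeout=0 means "fail
# immediately" instead of waiting for the write lock; isolation_level=None
# gives us manual transaction control.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
writer.execute("CREATE TABLE t (x INTEGER)")

writer.execute("BEGIN IMMEDIATE")      # first writer takes the write lock
writer.execute("INSERT INTO t VALUES (1)")

second = sqlite3.connect(path, timeout=0, isolation_level=None)
err = None
try:
    second.execute("BEGIN IMMEDIATE")  # second writer cannot proceed
except sqlite3.OperationalError as e:
    err = str(e)
print(err)  # database is locked

writer.execute("COMMIT")               # lock released; the next writer may go
```

In practice writers wait rather than fail (Plex presumably sets a busy timeout, hence the “DB is busy, waiting” messages), but only one can ever make progress at a time.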

I could go on and on.

Let’s go to Appropriate Uses For SQLite, under the section Situations Where A Client/Server RDBMS May Work Better

  • "A good rule of thumb is to avoid using SQLite in situations where the same database will be accessed directly (without an intervening application server) and simultaneously from many computers over a network."
  • "SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution." – Our primary Plex server supports 30 users constantly. It is spread out over 6 homes, all on a cul-de-sac, all connected to each other via fiber we had run ourselves. All traffic is local. We use Plex extensively for all media watching, even television programming recorded from a system with a CableCARD and local TV via an HDHomeRun; the recordings from Kodi’s PVR are served back via Plex.

I could go on, but at the end of the day it doesn’t matter. It will never happen. Even with a lot of users wanting the option and having the knowledge to run a db server, giving us the option will never happen. And as @elan said,

I wonder, since the statement was “There are almost certainly decent optimizations we can do”, why these optimizations have not been done. I guess all the problems people have with their SQLite DBs becoming corrupt, locked indefinitely, etc. have prompted Plex to have a dedicated support article for Repairing a Corrupted Database, which in and of itself is funny because it states “Though rare, it is possible for your main Plex Media Server database to become corrupted.” It isn’t rare; it happens often, and people complain about it. But I can see why they claim it’s rare: since they have a support article about it, when it happens to a user and they search, they come across the fix and do not have to report it. I wonder how many views that particular support article has.

I’ve stated it earlier, but another indication that this will never happen, and proof that Plex doesn’t actually read the entire post, is the nonsense about using the embedded MySQL database, since the original post clearly stated external. And there is no licensing needed for any software to connect to another DBMS and use it as a backend.

I would like to point out something.

Fortunately, Plex is written in C/C++, and SQLAlchemy is for Python. If going the route of using a 3rd-party API for DB encapsulation, then SQLAPI++ is a direction. There’s a cost associated with it (a one-time cost, depending on the size of the development team: $598 for up to 10 developers, $897 for unlimited developers), but the good thing is there are no distribution royalties.

I understand there is no easy way to accomplish this. But give users the option and see how many actually choose the route of using MySQL/Postgres before just writing it off. There can even be a warning that Plex will not provide any database support for users electing to use any database system outside of SQLite. But all we want is the option.

3 Likes

I would love this to be natively inside Plex, but this will never happen.

A few years ago, I wrote an implementation of libsqlite3.so that would parse and rewrite SQLite3 queries to route them to PostgreSQL. It worked for a time, but it needed a lot of maintenance.
I spent nights troubleshooting some Plex segfaults.

From what I saw, Plex uses libsoci to write their SQL queries. This lib isn’t an ORM, it’s just a connector, so Plex’s SQL queries are written in plain SQLite3. Handling multiple databases would need a complete re-implementation of their SQL queries, since SQL queries are specific to the database engine you use. For example, Plex uses a HINT keyword to speed up some queries, but this keyword isn’t available in PostgreSQL.

At the beginning I naively thought I could patch SQL queries with regexes. This turned out to be a pretty stupid approach, and when the regexes started accumulating I had to try something new.
I ended up forking a SQLite lexer and patching it to fully support SQLite, parse the whole SQL query, and rebuild it for PostgreSQL. It worked most of the time, but some problems appeared.

The main problems were:

  • Plex uses the SQLite FTS4 module for full-text search. This module creates virtual tables to represent the metadata tables; I had to hard-replace these virtual tables with the real ones.
  • Plex uses a spatial module (I don’t remember exactly which) to handle photo coordinates. It could be implemented via PostGIS, but I never did.
  • From the point you choose to use this lib, there is no way back: if Plex decides to update and the lexer doesn’t handle a query, your Plex will segfault and will not run until the lib is fixed. There is no way to convert your PostgreSQL database back to SQLite3.

In the end, I was able to boot and run Plex with basic features (movie & TV show playing, scanning) without crashes on a PostgreSQL database. Even migrations from the first Plex database schema were working. You still had to convert an existing SQLite3 database with PGLoader to bootstrap the server.

I ended up giving up the project because I was scared it would break in the future and would require a lot of maintenance. Also, since there is no way back, it would require me to rescan my entire media library to rebuild a SQLite3 database in case of failure.

I could open-source the project if someone is interested, but I don’t have the time/motivation to maintain it. Also, this project is pretty tough to compile and set up.

4 Likes

Which is way harder than just a switch.

As @gbooker02 pointed out, we take advantage of many SQLite features (e.g. FTS4) which aren’t part of any ISO spec.

Again, you’re asking for something much more difficult: “support ALL the databases” (including embedded).

(emphasis mine) Pretty sure the media server is an intervening application server.

There’s not a lot of writing going on. Even for N simultaneous sessions, there’s like one write per session per 15-30 seconds to update view position. We’re a very very read-heavy application.

Priorities. The vast majority of people don’t have databases anywhere near the size at which there could be issues.

1 Like

I would love a switch or support for multiple database systems.
I have 39,600 items in my movie library and it’s crashing constantly.
I tried optimizing, and even removed it completely and re-indexed everything… the problems came back.

Or at least give us a way to make the SQLite database more stable when handling large databases.

Just adding an advanced option for setting the cache_size should go a long way towards solving most stability issues with large databases. Currently, the default is a measly 2 MB.
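For reference, the setting itself is a one-line, per-connection pragma (shown with Python’s `sqlite3` module against a throwaway in-memory database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A negative cache_size sets the limit in KiB rather than in pages;
# -65536 asks for roughly 64 MiB. The value is per connection and
# reverts to the compiled-in default for every new connection.
conn.execute("PRAGMA cache_size = -65536")
size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(size)  # -65536
```

So exposing it as an advanced option would mostly be a matter of issuing this pragma on each connection Plex opens.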

2 Likes

Personally, I think it’s very stable.

And that said, it’s a huge amount of work, so if this is going to happen, then I do hope Plex prioritizes it as low as possible, since there are other matters that IMHO need more attention.

But regardless, you have a point with your additional mentions; overall, though, no vote from me. I want Plex to focus on other matters, like SSL etc.

Actually, this can’t be that much work if they abstracted the db connectivity from the rest of the code, by using a db class for example. And the advantage of being able to use other DBs would make Plex much more usable with large libraries. Also, no more problems with data or corrupted DBs.

Everyone, you’ve pretty much been given the definitive answers on this topic and it’s not going to change anytime soon.

I thought about making another large reply but then I realized that most of the new comments are from those who didn’t actually read the complete posts by the employees/ninjas. So I just don’t see the point.

1 Like

Is allowing us to configure the cache_size (or at least increasing the default from the current 2 MB to something reasonable) unfeasible too for some reason? I have tried to go through the thread but there doesn’t seem to be any reason given against that yet.

2 Likes

Do you have specific evidence that it would help anything?

The size is per connection, and we keep a number of connections open.

1 Like

I cannot give you evidence from Plex since I have no way to make the change. But I have been using sqlite databases frequently over the years and this has always given a significant performance benefit.

In Calibre, for example, increasing the cache_size from the default 5 MB to even 128 MB gave almost a 10x decrease in loading times (from almost 2 minutes to less than 10 seconds).

Another media streaming software has also recently added this as a config option, and the response from its users has been overwhelmingly positive w.r.t. browsing speeds and loading times, even for those with smaller databases, to say nothing of the larger ones (multi-GB) where the difference was even more significant.

Even if you do have multiple connections, unless they are short-lived and fetched and returned from a pool, it should still be a net-positive improvement by far.

2 Likes

Do you have a good way to benchmark via Plex itself?

In my experience the impact of SQLite cache_size is influenced by the application’s IO profile and how good the OS and underlying filesystem are at vfs caching. If the OS is very competent then increasing the SQLite cache might not help.

I’ve been binary-patching Plex Media Server to raise cache_size from 2000 to 9999, the max value when using “simple” binary patching. It may be placebo, but it does seem to help. I admit that I haven’t benchmarked it, and I have more memory than sense.

I also rebuild my DB with a larger page_size. SQLite changed the default from 1024 to 4096 years ago, but I think Plex’s database template predates that. This is a big win on any filesystem with larger blocks. This also raises the maximum amount of cached data (page_size * cache_size).
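The rebuild itself is straightforward (sketched with Python’s `sqlite3` module on a throwaway database): page_size only takes effect when the file is rewritten, so you set the pragma and then VACUUM:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "rebuild.db")
conn = sqlite3.connect(path, isolation_level=None)  # VACUUM needs autocommit
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

# The new page size applies only once every page is rewritten, which is
# exactly what VACUUM does.
conn.execute("PRAGMA page_size = 32768")
conn.execute("VACUUM")
size = conn.execute("PRAGMA page_size").fetchone()[0]
print(size)  # 32768
```

Existing data survives the rebuild; only the on-disk layout changes.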

Mozilla had a discussion of this, with measurements; they went to a 32k page_size:

https://bugzilla.mozilla.org/show_bug.cgi?id=416330


@SwiftPanda16, when you requested includeGuids, you shared some benchmarks via the API. Would you be willing to test with these changes? And/or share how you performed those tests?

4 Likes

We use a pool size of 20 connections, so if you raise the cache size from 2 MB to e.g. 4 MB, that raises memory usage by 40 MB, which is not insubstantial.

Would love to hear of any specific benchmarking, but our general view is that the OS’ filesystem cache would probably be more important than an app-level page cache (especially one which isn’t shared per connection).

3 Likes

It depends on the hardware where Plex is running. I have 64 GB RAM on my server with the current usage (with multiple processes running) (15 GB) being less than a fourth of that. I would very much like having the option to allow Plex to use the remaining memory as a cache if it would allow it to be more stable and faster in general.

Also, I would be fine if it would actually use the OS filesystem cache more effectively but even that doesn’t seem to be happening right now. Checking the file with vmtouch (vmtouch(8): Virtual Memory Toucher - Linux man page) shows that the current RSS percentage of the file is less than 20% whereas the other media streaming software, with a 256M cache_size, is ~55% and calibre, with a 128M cache_size, is ~70%.

I have tried using vmtouch or just a plain cat to read the whole DB into /dev/null so it’s cached but the moment Plex starts, all that cache seems to be dumped anyway.

And considering that the cache_size value is just a maximum, it should make no difference in regular usage if, as you suspect, the bottleneck is somewhere else.

I can try to give the API script a try later but since the change is not very significant, I doubt the results would be noticeable enough to consider it anything other than placebo.

All I did was pull mass data through the API. I don’t know if database modifications would make a noticeable difference.

I hope the Plex devs at least make the database work with big libraries. My movies server crashes about once per day now…
I had to split Plex into 2 servers, one for TV series and one for movies. The movies one freezes (40,000 items).

2 Likes

I suppose the cache_size increase won’t be considered then? Anything specific that we can provide that will help change your opinion about it?

4 Likes

A benchmark which shows a measurable difference for real-world loads.

4 Likes